mirror of
https://github.com/stellarshenson/stellars-jupyterhub-ds.git
synced 2026-03-08 06:00:29 +00:00
feat: add project documentation, feature plan, and version management
- Add .claude/CLAUDE.md with comprehensive architecture documentation
- Add .claude/JOURNAL.md for tracking substantive work
- Add FEATURE_PLAN.md for Reset Home Volume and Restart Server features
- Add project.env with version tracking (1.0.0_jh-4.x)
- Update Makefile with increment_version and tag targets
- Implement auto-versioning on build and dual-tag push workflow
178
.claude/CLAUDE.md
Normal file
@@ -0,0 +1,178 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Multi-user JupyterHub 4 deployment platform with data science stack, GPU support (auto-detection), and NativeAuthenticator. The platform spawns isolated JupyterLab environments per user using DockerSpawner, backed by the `stellars/stellars-jupyterlab-ds` image.

**Architecture**: Docker Compose orchestrates three main services:
- **Traefik**: Reverse proxy handling TLS termination and routing (ports 80, 443, 8080)
- **JupyterHub**: Central hub managing user authentication and spawning user containers
- **Watchtower**: Automatic image updates (daily at midnight)

User containers are dynamically spawned into the `jupyterhub_network` with per-user persistent volumes for home, workspace, and cache directories.

## Common Development Commands

### Building and Deployment

```bash
# Build the JupyterHub container
make build

# Build with verbose output
make build_verbose

# Build using script directly
./scripts/build.sh

# Pull latest image from DockerHub
make pull

# Push image to DockerHub
make push
```

### Starting and Stopping

```bash
# Start platform (detached mode, respects compose_override.yml if present)
./start.sh

# Start with docker compose directly
docker compose --env-file .env -f compose.yml up --no-recreate --no-build -d

# Start with override file
docker compose --env-file .env -f compose.yml -f compose_override.yml up --no-recreate --no-build -d

# Stop and clean up
make clean
```

### Accessing Services

- JupyterHub: `https://localhost/jupyterhub`
- Traefik Dashboard: `http://localhost:8080/dashboard`
- First-time setup: Self-register as `admin` user (auto-authorized)

## Configuration Architecture

### Primary Configuration: `config/jupyterhub_config.py`

This Python configuration file controls all JupyterHub behavior:

**Environment Variables** (set in compose.yml or compose_override.yml):
- `JUPYTERHUB_ADMIN`: Admin username (default: `admin`)
- `DOCKER_NOTEBOOK_IMAGE`: JupyterLab image to spawn (default: `stellars/stellars-jupyterlab-ds:latest`)
- `DOCKER_NETWORK_NAME`: Network for spawned containers (default: `jupyterhub_network`)
- `JUPYTERHUB_BASE_URL`: URL prefix (default: `/jupyterhub`)
- `ENABLE_GPU_SUPPORT`: GPU mode - `0` (disabled), `1` (enabled), `2` (auto-detect)
- `ENABLE_JUPYTERHUB_SSL`: Direct SSL config - `0` (disabled), `1` (enabled)
- `ENABLE_SERVICE_MLFLOW`: Enable MLflow tracking (`0`/`1`)
- `ENABLE_SERVICE_GLANCES`: Enable resource monitor (`0`/`1`)
- `ENABLE_SERVICE_TENSORBOARD`: Enable TensorBoard (`0`/`1`)
- `NVIDIA_AUTODETECT_IMAGE`: Image for GPU detection (default: `nvidia/cuda:12.9.1-base-ubuntu24.04`)

**GPU Auto-Detection**: When `ENABLE_GPU_SUPPORT=2`, the platform attempts to run `nvidia-smi` in a CUDA container. If successful, GPU support is enabled for all spawned user containers via `device_requests`.
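
The three `ENABLE_GPU_SUPPORT` modes can be read as a small decision function. The sketch below is an illustration under assumptions, not the code in `jupyterhub_config.py`: the names `resolve_gpu_support` and `nvidia_smi_probe` are hypothetical, and the probe shells out to the same `docker run ... nvidia-smi` command listed under Troubleshooting.

```python
import subprocess


def resolve_gpu_support(mode, probe):
    """Return True when spawned containers should request GPUs.

    mode:  0 = disabled, 1 = enabled, 2 = auto-detect.
    probe: zero-argument callable, True if `nvidia-smi` ran successfully.
    (Both names are hypothetical, chosen for this sketch.)
    """
    if mode == 1:
        return True
    if mode == 2:
        try:
            return bool(probe())
        except Exception:
            # Any probe failure is treated as "no GPU available".
            return False
    return False


def nvidia_smi_probe(image="nvidia/cuda:12.9.1-base-ubuntu24.04"):
    """Run `nvidia-smi` in a throwaway CUDA container via the docker CLI."""
    cmd = ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=120)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Passing the probe in as a callable keeps the mode logic testable on machines without Docker or a GPU.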

**User Container Configuration**:
- Spawned containers use `DockerSpawner` with per-user volumes
- Default working directory: `/home/lab/workspace`
- Container name pattern: `jupyterlab-{username}`
- Persistent volumes:
  - `jupyterlab-{username}_home`: `/home`
  - `jupyterlab-{username}_workspace`: `/home/lab/workspace`
  - `jupyterlab-{username}_cache`: `/home/lab/.cache`
  - `jupyterhub_shared`: `/mnt/shared` (shared across all users)
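
As a quick illustration of the naming convention above, the following sketch derives the container name and volume mounts for one user. The helper `user_container_layout` is hypothetical and exists only for this example; the actual mapping lives in the DockerSpawner settings of `jupyterhub_config.py`.

```python
def user_container_layout(username):
    """Return (container_name, {volume_name: mount_path}) for one user.

    Hypothetical helper that mirrors the naming pattern documented above.
    """
    container_name = f"jupyterlab-{username}"
    volumes = {
        f"{container_name}_home": "/home",
        f"{container_name}_workspace": "/home/lab/workspace",
        f"{container_name}_cache": "/home/lab/.cache",
        # Shared volume: same name for every user.
        "jupyterhub_shared": "/mnt/shared",
    }
    return container_name, volumes


name, volumes = user_container_layout("alice")
print(name)
for volume, path in volumes.items():
    print(f"{volume} -> {path}")
```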

### Override Pattern: `compose_override.yml`

Create this file to customize deployment without modifying tracked files:

```yaml
services:
  jupyterhub:
    volumes:
      - ./config/jupyterhub_config_override.py:/srv/jupyterhub/jupyterhub_config.py:ro
    environment:
      - ENABLE_GPU_SUPPORT=1
```

**IMPORTANT**: `compose_override.yml` contains deployment-specific credentials (CIFS passwords, etc.) and should never be committed.

### TLS Certificates

Certificates are auto-generated at startup by the `/mkcert.sh` script and stored in the `jupyterhub_certs` volume. Traefik reads certificates from the `/mnt/certs/certs.yml` configuration file.

## Docker Image Build Process

**Dockerfile**: `services/jupyterhub/Dockerfile.jupyterhub`

Build stages:
1. Base image: `jupyterhub/jupyterhub:latest`
2. Install system packages from `conf/apt-packages.yml` using `yq` parser
3. Copy startup scripts from `conf/bin/` (executable permissions set to 755)
4. Install Python packages: `docker`, `dockerspawner`, `jupyterhub-nativeauthenticator`
5. Copy certificate templates from `templates/certs/`
6. Entrypoint: `/start-platform.sh`

**Platform Initialization**: `/start-platform.sh` executes the scripts in the `/start-platform.d/` directory sequentially before launching JupyterHub.
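
The sequential execution described above can be pictured with a short Python sketch. The real entrypoint is a shell script whose contents are not shown here, so the ordering (lexical by file name) and the fail-fast behavior below are assumptions, and both function names are hypothetical.

```python
import os
import subprocess


def startup_scripts(directory):
    """Return the executable regular files in `directory`, sorted by name.

    Lexical ordering is an assumption of this sketch.
    """
    paths = (os.path.join(directory, name) for name in sorted(os.listdir(directory)))
    return [p for p in paths if os.path.isfile(p) and os.access(p, os.X_OK)]


def run_startup_scripts(directory):
    """Run each script in turn and stop at the first failure (assumed policy)."""
    for script in startup_scripts(directory):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run([script], check=True)
```

Under these assumptions, prefixing script names with numbers (for example `10-...`, `20-...`) would make the execution order explicit.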

## Authentication

**NativeAuthenticator** configuration in `jupyterhub_config.py`:
- Self-registration enabled (`c.NativeAuthenticator.enable_signup = True`)
- Open signup disabled (`c.NativeAuthenticator.open_signup = False`)
- All registered users allowed to login (`c.Authenticator.allow_all = True`)
- Admin users defined by `JUPYTERHUB_ADMIN` environment variable
- Admin panel access: `https://localhost/jupyterhub/hub/home`

## Networking and Volumes

**Networks**:
- `jupyterhub_network`: Bridge network connecting hub, Traefik, and spawned user containers

**Volumes**:
- `jupyterhub_data`: Persistent database (`jupyterhub.sqlite`) and cookie secrets
- `jupyterhub_certs`: TLS certificates shared with Traefik
- `jupyterhub_shared`: Shared storage across all user environments (can be mounted as CIFS)
- Per-user volumes: Created dynamically on first spawn

## CIFS/NAS Integration

To mount network storage in user containers, override the `jupyterhub_shared` volume in `compose_override.yml`:

```yaml
volumes:
  jupyterhub_shared:
    driver: local
    name: jupyterhub_shared
    driver_opts:
      type: cifs
      device: //nas_ip/share_name
      o: username=xxx,password=yyy,uid=1000,gid=1000
```

User containers will access this at `/mnt/shared`.

## Troubleshooting

**GPU not detected**:
- Verify NVIDIA Docker runtime: `docker run --rm --gpus all nvidia/cuda:12.9.1-base-ubuntu24.04 nvidia-smi`
- Check `NVIDIA_AUTODETECT_IMAGE` matches your CUDA version
- Manually enable with `ENABLE_GPU_SUPPORT=1`

**Container spawn failures**:
- Check Docker socket permissions: `/var/run/docker.sock` must be accessible
- Verify network exists: `docker network inspect jupyterhub_network`
- Review logs: `docker logs <container-name>`

**Authentication issues**:
- Admin user must match `JUPYTERHUB_ADMIN` environment variable
- Database persisted in `jupyterhub_data` volume - may need reset if corrupted
- Cookie secret persisted in `/data/jupyterhub_cookie_secret`

## Related Projects

User environments spawned from: https://github.com/stellarshenson/stellars-jupyterlab-ds
@@ -6,3 +6,12 @@ This journal tracks substantive work on documents, diagrams, and documentation c

1. **Task - Add Docker badges**: added Docker pulls and GitHub stars badges to README.md<br>
    **Result**: README now displays Docker pulls badge (stellars/stellars-jupyterhub-ds), Docker image size badge, and GitHub stars badge

2. **Task - Project initialization and documentation**: Analyzed codebase and created comprehensive project documentation<br>
    **Result**: Created `.claude/CLAUDE.md` with detailed architecture overview, configuration patterns, common commands, GPU auto-detection logic, volume management, authentication setup, and troubleshooting guide for future Claude Code instances

3. **Task - Feature planning for user controls**: Designed two self-service features for JupyterHub user control panel<br>
    **Result**: Created `FEATURE_PLAN.md` documenting Reset Home Volume and Restart Server features with implementation details, API handlers, UI templates, JavaScript integration, security considerations, edge cases, testing plans, and rollout strategy

4. **Task - Version management implementation**: Added version tracking and tagging system matching stellars-jupyterlab-ds pattern<br>
    **Result**: Created `project.env` with project metadata and version 1.0.0_jh-4.x, updated `Makefile` with increment_version and tag targets, auto-increment on build, dual-tag push (latest and versioned), leveraging existing Docker socket access for both planned features

654
FEATURE_PLAN.md
Normal file
@@ -0,0 +1,654 @@
# Feature Plan: User Control Panel Enhancements

## Overview

Enhance JupyterHub user control panel with two self-service features:
1. **Reset Home Volume**: Allow users to reset their home directory volume when server is stopped
2. **Restart Server**: Provide one-click server restart functionality

Both features include confirmation dialogs and proper permission enforcement.

## Feature Scope

### Feature 1: Reset Home Volume

**Access Control**:
- Users can reset their own home volume
- Admins can reset any user's home volume

**Volume Scope**:
- Only `jupyterlab-{username}_home` volume
- Does NOT affect workspace (`jupyterlab-{username}_workspace`) or cache (`jupyterlab-{username}_cache`) volumes

**UI Location**:
- User control panel (accessible to both user and admin)
- Button visible only when server is stopped

### Feature 2: Restart Server

**Access Control**:
- Users can restart their own server
- Admins can restart any user's server

**Functionality**:
- Uses Docker's native container restart (preserves container, does NOT recreate)
- Performs graceful restart with configurable timeout
- Maintains all volumes, network connections, and container configuration
- Equivalent to "Restart" button in Docker Desktop

**UI Location**:
- User control panel (accessible to both user and admin)
- Button visible only when server is running

**Technical Approach**:
- Direct Docker API call: `container.restart(timeout=10)`
- Does NOT use JupyterHub's `stop()` and `spawn()` methods (which would recreate container)
- Container ID remains the same after restart

## Technical Requirements

### Prerequisites
- User's JupyterLab server must be stopped (for reset volume)
- Volume `jupyterlab-{username}_home` must exist (for reset volume)
- **Docker socket accessible at `/var/run/docker.sock`** (already configured in `compose.yml` line 54 with read-write access)
- Docker Python SDK available (already installed in `Dockerfile.jupyterhub`)

### Existing Infrastructure Leveraged
Both features utilize infrastructure already in place:
- **Docker Socket**: Mounted at `/var/run/docker.sock:rw` for DockerSpawner; reused here for volume management and container restart
- **Docker Python SDK**: Already installed via `pip install docker` in the JupyterHub image
- **Container Naming Pattern**: Follows existing convention `jupyterlab-{username}` from `jupyterhub_config.py` line 112
- **Volume Naming Pattern**: Follows existing convention `jupyterlab-{username}_home` from `jupyterhub_config.py` line 116

### Permission Model
- **User access**: Can only reset their own home volume
- **Admin access**: Can reset any user's home volume
- Implemented via custom decorator: `@admin_or_self`
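
The rule behind `@admin_or_self` reduces to a single check. The sketch below is a plain-Python illustration, not JupyterHub's implementation; the function name `may_act_on` and its parameters are hypothetical.

```python
def may_act_on(requester_name, requester_is_admin, target_username):
    """Return True if the requester is an admin or is the target user.

    Hypothetical helper illustrating the admin-or-self rule.
    """
    return bool(requester_is_admin) or requester_name == target_username


# A regular user may act on their own account only; an admin on any account.
print(may_act_on("alice", False, "alice"))
print(may_act_on("alice", False, "bob"))
print(may_act_on("root", True, "bob"))
```

A decorator built on this check would look up the current user in the handler and reject the request with a 403 when the function returns False.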

## Implementation Steps

### 1. Create Custom API Handler

**File**: `services/jupyterhub/conf/bin/volume_handler.py` (or inline in `config/jupyterhub_config.py`)

**Purpose**: Handle volume reset requests via REST API

**Endpoint**: `DELETE /hub/api/users/{username}/reset-home-volume`

**Logic**:
```python
import docker
from tornado import web

from jupyterhub.handlers import BaseHandler
# Import path as given in this plan; confirm that the installed JupyterHub
# release actually provides `admin_or_self` here before relying on it.
from jupyterhub.utils import admin_or_self


class ResetHomeVolumeHandler(BaseHandler):
    """Delete a stopped user's home volume so it is recreated on next spawn."""

    @admin_or_self
    async def delete(self, username):
        # Errors are raised as HTTPError: Tornado's send_error() accepts the
        # status code positionally but not a message.

        # 1. Verify user exists
        user = self.find_user(username)
        if not user:
            raise web.HTTPError(404, "User not found")

        # 2. Check server is stopped
        spawner = user.spawner
        if spawner.active:
            raise web.HTTPError(400, "Server must be stopped before resetting volume")

        # 3. Connect to Docker
        docker_client = docker.DockerClient(base_url='unix://var/run/docker.sock')

        # 4. Verify volume exists
        volume_name = f'jupyterlab-{username}_home'
        try:
            volume = docker_client.volumes.get(volume_name)
        except docker.errors.NotFound:
            raise web.HTTPError(404, f"Volume {volume_name} not found")

        # 5. Remove volume
        try:
            volume.remove()
        except docker.errors.APIError as e:
            raise web.HTTPError(500, f"Failed to remove volume: {e}")

        self.set_status(200)
        self.finish({"message": f"Volume {volume_name} successfully reset"})
```

**Error Handling**:
- 404: User not found or volume doesn't exist
- 400: Server still running
- 500: Docker API error (volume in use, permission denied)
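
The order of the checks determines which status code a caller sees. A minimal sketch of that decision order, separated from Docker and JupyterHub so it can be tested in isolation (the function `reset_precheck` is hypothetical and not part of the plan's handler):

```python
def reset_precheck(user_exists, server_active, volume_exists):
    """Return (status_code, message) for a reset request.

    Mirrors the handler's check order: user, then server state, then volume.
    A 500 can only arise later, from the Docker removal call itself.
    """
    if not user_exists:
        return 404, "User not found"
    if server_active:
        return 400, "Server must be stopped before resetting volume"
    if not volume_exists:
        return 404, "Volume not found"
    return 200, "OK"


print(reset_precheck(True, True, True))
print(reset_precheck(True, False, False))
```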

### 2. Register API Handler

**File**: `config/jupyterhub_config.py`

Add handler registration:
```python
from volume_handler import ResetHomeVolumeHandler

c.JupyterHub.extra_handlers = [
    (r'/api/users/([^/]+)/reset-home-volume', ResetHomeVolumeHandler),
]
```

### 3. Extend User Control Panel Template

**File**: `services/jupyterhub/templates/home.html` (override default template)

**Template Structure**:
- Extend JupyterHub's base `home.html` template
- Add "Reset Home Volume" button in server controls section
- Button states:
  - Enabled: Server stopped AND volume exists
  - Disabled: Server running OR volume doesn't exist
  - Tooltip explaining current state

**Button HTML**:
```html
{% if not user.server %}
<button id="reset-home-volume-btn"
        class="btn btn-danger btn-sm"
        data-username="{{ user.name }}"
        data-toggle="modal"
        data-target="#reset-volume-modal">
  <i class="fa fa-trash"></i> Reset Home Volume
</button>
{% endif %}
```

### 4. Create Confirmation Modal

**File**: `services/jupyterhub/templates/home.html` (inline modal)

**Modal Content**:
```html
<div class="modal fade" id="reset-volume-modal" tabindex="-1" role="dialog">
  <div class="modal-dialog" role="document">
    <div class="modal-content">
      <div class="modal-header">
        <h5 class="modal-title">Reset Home Volume</h5>
        <button type="button" class="close" data-dismiss="modal">×</button>
      </div>
      <div class="modal-body">
        <div class="alert alert-danger">
          <strong>Warning:</strong> This action cannot be undone!
        </div>
        <p>This will permanently delete all files in your home directory:</p>
        <code id="volume-name-display">jupyterlab-{username}_home</code>
        <p class="mt-3">Your workspace and cache volumes will NOT be affected.</p>
        <p><strong>Are you sure you want to continue?</strong></p>
      </div>
      <div class="modal-footer">
        <button type="button" class="btn btn-secondary" data-dismiss="modal">Cancel</button>
        <button type="button" class="btn btn-danger" id="confirm-reset-btn">
          Yes, Reset Home Volume
        </button>
      </div>
    </div>
  </div>
</div>
```

### 5. Implement Client-Side JavaScript

**File**: `services/jupyterhub/templates/home.html` (inline script)

**Functionality**:
- Check server status and volume existence on page load
- Enable/disable reset button based on state
- Handle modal confirmation
- Make API call to reset endpoint
- Display success/error notifications

**JavaScript Logic**:
```javascript
<script>
$(document).ready(function() {
  const username = "{{ user.name }}";

  // Update volume name in modal
  $('#volume-name-display').text(`jupyterlab-${username}_home`);

  // Confirm reset handler
  $('#confirm-reset-btn').on('click', function() {
    // Build the URL from the hub base URL so that a non-root
    // JUPYTERHUB_BASE_URL (default `/jupyterhub`) is honored.
    // Assumes `window.jhdata.base_url` ends with a slash, e.g. `/jupyterhub/hub/`.
    const apiUrl = `${window.jhdata.base_url}api/users/${username}/reset-home-volume`;

    $.ajax({
      url: apiUrl,
      type: 'DELETE',
      headers: {
        'Authorization': 'token ' + window.jhdata.api_token
      },
      success: function(response) {
        $('#reset-volume-modal').modal('hide');
        alert('Home volume successfully reset. Your home directory will be recreated on next server start.');
        location.reload();
      },
      error: function(xhr) {
        $('#reset-volume-modal').modal('hide');
        const errorMsg = xhr.responseJSON?.message || 'Failed to reset volume';
        alert(`Error: ${errorMsg}`);
      }
    });
  });
});
</script>
```

### 6. Update Docker Configuration

**No changes required**:
- Docker Python SDK already installed in `Dockerfile.jupyterhub`
- Docker socket already mounted in `compose.yml` (line 54)
- Existing Docker client code in `jupyterhub_config.py` can be referenced

---

## Feature 2: Restart Server Implementation

### 1. Create Restart Server API Handler

**File**: `config/jupyterhub_config.py` (inline with volume handler)

**Purpose**: Handle server restart requests via REST API

**Endpoint**: `POST /hub/api/users/{username}/restart-server`

**Logic**:
```python
import docker
from tornado import web

from jupyterhub.handlers import BaseHandler
# Import path as given in this plan; confirm that the installed JupyterHub
# release actually provides `admin_or_self` here before relying on it.
from jupyterhub.utils import admin_or_self


class RestartServerHandler(BaseHandler):
    """Restart a user's running container in place (no recreate)."""

    @admin_or_self
    async def post(self, username):
        # Errors are raised as HTTPError: Tornado's send_error() accepts the
        # status code positionally but not a message.

        # 1. Verify user exists
        user = self.find_user(username)
        if not user:
            raise web.HTTPError(404, "User not found")

        # 2. Check server is running
        spawner = user.spawner
        if not spawner.active:
            raise web.HTTPError(400, "Server is not running")

        # 3. Derive container name from the naming convention
        container_name = f'jupyterlab-{username}'

        # 4. Connect to Docker and restart container
        docker_client = docker.DockerClient(base_url='unix://var/run/docker.sock')

        try:
            # Get the container
            container = docker_client.containers.get(container_name)

            # Restart the container (graceful restart with 10s timeout)
            container.restart(timeout=10)
        except docker.errors.NotFound:
            raise web.HTTPError(404, f"Container {container_name} not found")
        except docker.errors.APIError as e:
            raise web.HTTPError(500, f"Failed to restart container: {e}")

        self.set_status(200)
        self.finish({"message": f"Container {container_name} successfully restarted"})
```

**Error Handling**:
- 404: User not found or container doesn't exist
- 400: Server not running (spawner not active)
- 500: Docker API error during restart

### 2. Register Restart Handler

**File**: `config/jupyterhub_config.py`

Update handler registration:
```python
from volume_handler import ResetHomeVolumeHandler, RestartServerHandler

c.JupyterHub.extra_handlers = [
    (r'/api/users/([^/]+)/reset-home-volume', ResetHomeVolumeHandler),
    (r'/api/users/([^/]+)/restart-server', RestartServerHandler),
]
```

### 3. Add Restart Button to Template

**File**: `services/jupyterhub/templates/home.html`

**Button HTML** (add next to existing server controls):
```html
{% if user.server %}
<button id="restart-server-btn"
        class="btn btn-warning btn-sm"
        data-username="{{ user.name }}"
        data-toggle="modal"
        data-target="#restart-server-modal">
  <i class="fa fa-refresh"></i> Restart Server
</button>
{% endif %}
```

### 4. Create Restart Confirmation Modal

**File**: `services/jupyterhub/templates/home.html`

**Modal Content**:
```html
<div class="modal fade" id="restart-server-modal" tabindex="-1" role="dialog">
  <div class="modal-dialog" role="document">
    <div class="modal-content">
      <div class="modal-header">
        <h5 class="modal-title">Restart Server</h5>
        <button type="button" class="close" data-dismiss="modal">×</button>
      </div>
      <div class="modal-body">
        <div class="alert alert-warning">
          <strong>Notice:</strong> Your server will be temporarily unavailable during restart.
        </div>
        <p>This will restart your JupyterLab container using Docker's native restart:</p>
        <ul>
          <li>Gracefully stops the container</li>
          <li>Restarts the same container (does not recreate)</li>
          <li>Preserves all volumes and configuration</li>
        </ul>
        <p class="mt-3"><strong>Any unsaved work in notebooks will be lost.</strong></p>
        <p class="mt-2">Your files on disk are safe and will remain intact.</p>
        <p>Are you sure you want to restart?</p>
      </div>
      <div class="modal-footer">
        <button type="button" class="btn btn-secondary" data-dismiss="modal">Cancel</button>
        <button type="button" class="btn btn-warning" id="confirm-restart-btn">
          Yes, Restart Server
        </button>
      </div>
    </div>
  </div>
</div>
```

### 5. Implement Restart JavaScript

**File**: `services/jupyterhub/templates/home.html` (add to existing script)

**JavaScript Logic**:
```javascript
// Restart server handler
$('#confirm-restart-btn').on('click', function() {
  const username = "{{ user.name }}";
  // Build URLs from the page's base URLs so that a non-root
  // JUPYTERHUB_BASE_URL (default `/jupyterhub`) is honored.
  // Assumes `window.jhdata.base_url` and `window.jhdata.prefix` end with a slash.
  const apiUrl = `${window.jhdata.base_url}api/users/${username}/restart-server`;

  // Disable button and show loading state
  $('#confirm-restart-btn').prop('disabled', true).text('Restarting...');

  $.ajax({
    url: apiUrl,
    type: 'POST',
    headers: {
      'Authorization': 'token ' + window.jhdata.api_token
    },
    success: function(response) {
      $('#restart-server-modal').modal('hide');
      alert('Server successfully restarted. Redirecting to your server...');
      // Redirect to user's server
      window.location.href = `${window.jhdata.prefix}user/${username}/lab`;
    },
    error: function(xhr) {
      $('#restart-server-modal').modal('hide');
      const errorMsg = xhr.responseJSON?.message || 'Failed to restart server';
      alert(`Error: ${errorMsg}`);
      $('#confirm-restart-btn').prop('disabled', false).text('Yes, Restart Server');
    }
  });
});
```

### 6. Enhanced Status Polling (Optional)

**File**: `services/jupyterhub/templates/home.html`

Add polling to detect when restart completes:
```javascript
function pollServerStatus(username) {
  const interval = setInterval(function() {
    $.ajax({
      url: `${window.jhdata.base_url}api/users/${username}`,
      type: 'GET',
      headers: {
        'Authorization': 'token ' + window.jhdata.api_token
      },
      success: function(data) {
        // The user model reports per-server state under `servers`;
        // the default server is keyed by the empty string.
        const server = data.servers && data.servers[''];
        if (server && server.ready) {
          clearInterval(interval);
          window.location.href = `${window.jhdata.prefix}user/${username}/lab`;
        }
      }
    });
  }, 2000); // Poll every 2 seconds

  // Timeout after 60 seconds
  setTimeout(function() {
    clearInterval(interval);
  }, 60000);
}
```

## Files to Create/Modify

### New Files
- `services/jupyterhub/templates/home.html` - Custom user control panel template with both features

### Modified Files
- `config/jupyterhub_config.py` - Register API handlers, add volume reset and restart server handler classes
- `services/jupyterhub/Dockerfile.jupyterhub` - No changes needed (Docker SDK already installed)

### Optional Separate Files
- `services/jupyterhub/conf/bin/volume_handler.py` - API handler logic for both features (can be inline in config instead)

## Testing Plan

### Unit Tests

#### Reset Home Volume Tests
1. **API Handler Tests**:
   - Test permission enforcement (user can only reset own volume)
   - Test admin can reset any user's volume
   - Test rejection when server is running
   - Test volume not found error handling
   - Test Docker API error handling

2. **Volume Operations Tests**:
   - Create test volume
   - Verify volume exists check
   - Verify volume removal
   - Test volume in use scenario

#### Restart Server Tests
1. **API Handler Tests**:
   - Test permission enforcement (user can only restart own server)
   - Test admin can restart any user's server
   - Test rejection when server is not running
   - Test stop operation failure handling
   - Test start operation failure handling

2. **Server Operations Tests**:
   - Verify server status check (running/stopped)
   - Test graceful shutdown
   - Test server restart sequence
   - Test concurrent restart requests

### Integration Tests

#### Reset Home Volume Tests
1. **UI Flow Tests**:
   - Button appears only when server stopped
   - Modal displays correct volume name
   - Confirmation triggers API call
   - Success notification displays
   - Error handling for failed API calls

2. **End-to-End Tests**:
   - User stops server
   - User clicks reset button
   - User confirms in modal
   - Volume is removed
   - User starts server (new volume created)
   - Verify clean home directory

#### Restart Server Tests
1. **UI Flow Tests**:
   - Button appears only when server running
   - Modal displays proper warning
   - Confirmation triggers API call
   - Loading state during restart
   - Redirect to server after restart
   - Error handling for failed restart

2. **End-to-End Tests**:
   - User has running server
   - User clicks restart button
   - User confirms in modal
   - Server stops gracefully
   - Server starts automatically
   - User redirected to new server instance
   - Verify server is functional after restart

#### Combined Features Tests
1. **Button State Management**:
   - Reset button visible when server stopped
   - Restart button visible when server running
   - Both buttons never visible simultaneously
   - Button states update after operations

2. **Workflow Tests**:
   - Restart server -> works normally
   - Stop server -> Reset volume -> Start server -> verify clean home
   - Reset volume -> Start server -> Restart server -> verify functionality

## Security Considerations

### Reset Home Volume
1. **Permission Validation**: Always verify user has permission to reset volume (own volume or admin)
2. **Server State Check**: Prevent volume deletion while container is running
3. **Volume Ownership**: Validate volume name matches expected pattern `jupyterlab-{username}_home`
4. **Docker Socket Access**: Limit Docker operations to volume management only
5. **Input Validation**: Sanitize username parameter to prevent injection attacks
6. **Audit Logging**: Log all volume reset operations with username and timestamp
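
The volume ownership and input validation items can be combined into one helper that refuses to build a volume name from an unexpected username. This is a sketch under an assumption: the allowed character set below is illustrative, not the authenticator's actual rule, and `home_volume_name` is a hypothetical name.

```python
import re

# Assumption for this sketch: usernames start with a lowercase letter or digit
# and otherwise contain only lowercase letters, digits, '.', '_' and '-'.
_SAFE_USERNAME = re.compile(r"[a-z0-9][a-z0-9._-]{0,63}")


def home_volume_name(username):
    """Return the home volume name, or raise ValueError for unsafe input."""
    if not _SAFE_USERNAME.fullmatch(username):
        raise ValueError(f"unsafe username: {username!r}")
    return f"jupyterlab-{username}_home"


print(home_volume_name("alice"))
```

Validating before string interpolation means a crafted username cannot make the handler address a volume outside the `jupyterlab-{username}_home` pattern.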

### Restart Server
1. **Permission Validation**: Verify user can only restart own server (or is admin)
2. **State Validation**: Ensure server is actually running before attempting restart
3. **Resource Limits**: Prevent restart request flooding (rate limiting)
4. **Graceful Shutdown**: Allow proper cleanup before forced termination
5. **Session Integrity**: Invalidate old server tokens after restart
6. **Audit Logging**: Log all restart operations with username, timestamp, and outcome

### Both Features
1. **CSRF Protection**: All API endpoints must validate CSRF tokens
2. **Authentication**: Require valid JupyterHub session token
3. **Authorization**: Implement `@admin_or_self` decorator consistently
4. **Rate Limiting**: Prevent abuse through repeated operations
5. **Error Disclosure**: Don't expose internal system details in error messages
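
One way to approach the rate limiting item is a per-user sliding window kept in the hub process. The class below is a minimal sketch, not part of the plan: `SlidingWindowLimiter` is a hypothetical name, the limits are placeholders, and state is lost when the hub restarts.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `period_seconds` for each key."""

    def __init__(self, max_calls, period_seconds, clock=time.monotonic):
        self._max_calls = max_calls
        self._period = period_seconds
        self._clock = clock  # injectable so tests can control time
        self._calls = defaultdict(deque)

    def allow(self, key):
        now = self._clock()
        calls = self._calls[key]
        # Drop timestamps that have left the window.
        while calls and now - calls[0] >= self._period:
            calls.popleft()
        if len(calls) >= self._max_calls:
            return False
        calls.append(now)
        return True
```

A handler could call `limiter.allow(username)` before touching Docker and answer with HTTP 429 when it returns False.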

## Edge Cases

### Reset Home Volume
1. **Volume doesn't exist**: Display informative error, don't fail silently
2. **Server starting/stopping**: Disable button during transition states
3. **Volume in use by orphaned container**: Attempt force removal or display cleanup instructions
4. **Multiple concurrent reset requests**: Implement request locking/queuing
5. **Admin resetting admin's volume**: Require additional confirmation
6. **Network errors during API call**: Display retry option
7. **Volume has active snapshots/backups**: Check for dependencies before removal

### Restart Server
1. **Server not responding**: Implement timeout and force stop if graceful shutdown fails
2. **Restart during server startup**: Queue restart request until server is fully running
3. **Container stuck in stopping state**: Detect and handle orphaned containers
4. **Multiple concurrent restart requests**: Prevent duplicate restarts with request locking
5. **Restart fails to start**: Display error and provide manual start option
6. **User opens multiple tabs**: Synchronize state across browser tabs
7. **Network interruption during restart**: Handle client-side timeout gracefully
|
||||
|
||||
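Both edge-case lists above call for request locking so that concurrent reset or restart requests for the same user do not overlap. A minimal sketch, assuming the handlers run as coroutines in a single event loop, is shown below. `run_exclusive` and the module-level `_locks` mapping are hypothetical names; the sketch rejects a duplicate request rather than queuing it, which is only one of the two behaviours the plan mentions.

```python
import asyncio
from collections import defaultdict

# One lock per username, created lazily on first use (hypothetical module state).
_locks = defaultdict(asyncio.Lock)


async def run_exclusive(username, operation):
    """Run `operation()` unless another operation for this user is in flight.

    Returns (True, result) when the operation ran and (False, None) when it
    was rejected as a duplicate.
    """
    lock = _locks[username]
    if lock.locked():
        # No await between this check and acquiring the lock, so within one
        # event loop a second caller cannot slip in between.
        return False, None
    async with lock:
        return True, await operation()
```

Rejecting duplicates keeps the behaviour predictable for a user who double-clicks the button; a queuing variant would simply await the lock instead of checking `locked()` first.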
### Combined Features
1. **Rapid operation switching**: User stops -> resets -> starts -> restarts quickly
2. **Session expires during operation**: Re-authenticate and resume or show clear error
3. **Hub restart during user operation**: Handle hub unavailability gracefully
4. **Docker daemon unavailable**: Detect and display system-level error message

## Future Enhancements

### Reset Home Volume
1. **Backup before reset**: Create automatic backup to `jupyterhub_shared` before deletion
2. **Selective reset**: Allow resetting workspace or cache volumes individually
3. **Reset all volumes**: Single action to reset home, workspace, and cache
4. **Volume size display**: Show current volume size before reset
5. **Reset history**: Log of volume reset operations per user
6. **Scheduled resets**: Allow users to schedule periodic volume resets
7. **Template volumes**: Pre-populate new volumes with template files
8. **Email notification**: Send confirmation email after volume reset

### Restart Server
1. **Scheduled restarts**: Allow users to schedule regular server restarts
2. **Restart with options**: Choose specific image version or resource limits
3. **Pre-restart save**: Automatically save all open notebooks before restart
4. **Restart notifications**: WebSocket-based real-time status updates
5. **Restart analytics**: Track restart frequency and success rates per user
6. **Soft restart**: Restart JupyterLab without container restart (when possible)
7. **Batch restart**: Admin can restart multiple user servers simultaneously
8. **Auto-restart on failure**: Automatically restart server if it crashes

### Combined Features
1. **Workflow presets**: "Clean slate" button that resets volume and restarts server
2. **Operation queue**: Queue multiple operations (stop, reset, restart) in sequence
3. **Health checks**: Automatic server health monitoring with auto-restart option
4. **Resource optimization**: Suggest restart when server uses excessive resources

## Dependencies

- **JupyterHub**: 4.x (current base image: `jupyterhub/jupyterhub:latest`)
- **Docker Python SDK**: Already installed via pip
- **NativeAuthenticator**: Already configured for user management
- **Bootstrap**: Available in JupyterHub default templates for modal styling
- **jQuery**: Available in JupyterHub default templates for AJAX calls

## Rollout Plan

1. **Development**: Implement on local environment
   - Feature 1: Reset Home Volume (priority: high)
   - Feature 2: Restart Server (priority: medium)
2. **Testing**: Verify all test cases pass for both features
3. **Documentation**: Update README.md and `.claude/CLAUDE.md` with feature descriptions
4. **Deployment**: Build new Docker image with both features
5. **User Communication**: Notify users of new self-service capabilities
6. **Monitoring**: Track usage and error rates for both features during first week
7. **Iteration**: Gather user feedback and implement improvements

## Implementation Priority

### Phase 1: Core Features
1. Reset Home Volume API handler and basic UI
2. Restart Server API handler and basic UI
3. Both confirmation modals

### Phase 2: Enhanced UX
1. Status polling for restart operation
2. Better error messages and user feedback
3. Loading states and progress indicators

### Phase 3: Polish
1. Audit logging for both operations
2. Rate limiting implementation
3. Edge case handling
4. Accessibility improvements

## Summary

This feature plan adds two essential self-service capabilities to JupyterHub:

**Reset Home Volume** lets users start over by removing their home directory volume while their server is stopped. This is useful for recovering from a corrupted environment or returning to a clean slate. The operation uses the Docker API to remove the `jupyterlab-{username}_home` volume after confirming that the server is stopped.

**Restart Server** provides a one-click way to restart a running JupyterLab container using Docker's native restart functionality. Unlike JupyterHub's stop/spawn cycle (which recreates containers), it calls `container.restart()` to preserve the container's identity, volumes, and configuration. This helps users recover quickly from server issues or apply certain configuration changes without losing their environment setup.

Both features enforce security through permission validation, give clear user feedback through confirmation modals, and integrate into the existing JupyterHub user control panel.
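The two operations summarized above can be sketched against the Docker SDK for Python. This is an illustration under stated assumptions, not the plan's handler code: the function names are hypothetical, `client` is assumed to be a Docker SDK client (for example the object returned by `docker.from_env()`), and the container name is passed in explicitly because the plan only specifies the volume naming pattern.

```python
def home_volume_name(username):
    """Volume naming pattern taken from the plan: jupyterlab-{username}_home."""
    return f"jupyterlab-{username}_home"


def reset_home_volume(client, username, server_running):
    """Remove the user's home volume, refusing while the server is running.

    `client` is assumed to expose `volumes.get(name)` as the Docker SDK does;
    a missing volume surfaces as the SDK's own NotFound error.
    """
    if server_running:
        raise RuntimeError("stop the server before resetting the home volume")
    name = home_volume_name(username)
    client.volumes.get(name).remove()
    return name


def restart_server(client, container_name, timeout=10):
    """Restart the container in place; `timeout` is the grace period in
    seconds before Docker kills the container."""
    client.containers.get(container_name).restart(timeout=timeout)
```

Passing the client in, rather than creating it inside the functions, keeps the sketch testable with a fake client and leaves connection handling (including the "Docker daemon unavailable" edge case) to the caller.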
41
Makefile
@@ -4,14 +4,34 @@
 # GLOBALS #
 #################################################################################
 .DEFAULT_GOAL := help
-.PHONY: help build push start clean
+.PHONY: help build push start clean increment_version tag

+# Include project configuration
+include project.env
+
+# Use VERSION from project.env as TAG (strip quotes)
+TAG := $(subst ",,$(VERSION))
+
 #################################################################################
 # COMMANDS #
 #################################################################################

+## increment patch version in project.env
+increment_version:
+	@echo "Incrementing patch version..."
+	@awk -F= '/^VERSION=/ { \
+		gsub(/"/, "", $$2); \
+		match($$2, /^([0-9]+\.[0-9]+\.)([0-9]+)(_.*$$)/, parts); \
+		new_patch = parts[2] + 1; \
+		new_version = parts[1] new_patch parts[3]; \
+		print "VERSION=\"" new_version "\""; \
+		print "Version updated: " $$2 " -> " new_version > "/dev/stderr"; \
+		next; \
+	} \
+	{ print }' project.env > project.env.tmp && mv project.env.tmp project.env
+
 ## build docker containers
-build:
+build: increment_version
 	@cd ./scripts && ./build.sh

 ## build docker containers and output logs
@@ -23,12 +43,23 @@ pull:
 	docker pull stellars/stellars-jupyterhub-ds:latest

 ## push docker containers to repo
-push:
+push: tag
 	docker push stellars/stellars-jupyterhub-ds:latest
+	docker push stellars/stellars-jupyterhub-ds:$(TAG)

-## start jupyterlab (fg)
+tag:
+	@if git tag -l | grep -q "^$(TAG)$$"; then \
+		echo "Git tag $(TAG) already exists, skipping tagging"; \
+	else \
+		echo "Creating git tag: $(TAG)"; \
+		git tag $(TAG); \
+		echo "Creating docker tag: $(TAG)"; \
+		docker tag stellars/stellars-jupyterhub-ds:latest stellars/stellars-jupyterhub-ds:$(TAG); \
+	fi
+
+## start jupyterhub (fg)
 start:
-	@cd ./bin && ./start.sh
+	@./start.sh

 ## clean orphaned containers
 clean:
13
project.env
Normal file
@@ -0,0 +1,13 @@
# Project Configuration
PROJECT_NAME="stellars-jupyterhub-ds"
PROJECT_DESCRIPTION="Multi-user JupyterHub 4 deployment platform with data science stack, GPU auto-detection, NativeAuthenticator, and isolated per-user environments spawned via DockerSpawner"

# Version
VERSION="2.11.35_cuda-12.9.1_jh-5.4.2"
VERSION_COMMENT="Jupyterhub with GPU auto-detection, NativeAuthenticator, and DockerSpawner configuration and new build system"

# Author
AUTHOR_NAME="Konrad Jelen"
AUTHOR_ALIAS="Stellars Henson"
AUTHOR_EMAIL="konrad.jelen+github@gmail.com"
AUTHOR_LINKEDIN="https://www.linkedin.com/in/konradjelen/"
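The `increment_version` target in the Makefile bumps the patch number of `VERSION` while preserving the `_suffix` part. The sketch below restates that rule in Python to make the expected result explicit. It is an illustration, not part of the commit: `bump_patch` is a hypothetical name, and the regular expression mirrors the one used in the awk script. Note that the three-argument form of `match()` used there is a gawk extension, so the Makefile target assumes `awk` resolves to gawk.

```python
import re

# major.minor. | patch | _suffix  (same three groups as the awk pattern)
_VERSION_RE = re.compile(r"^(\d+\.\d+\.)(\d+)(_.*)$")


def bump_patch(version):
    """Return `version` with its patch component incremented by one.

    Surrounding double quotes, as stored in project.env, are stripped first.
    A version without a `_suffix` does not match the pattern; this sketch
    raises instead of producing a value.
    """
    match = _VERSION_RE.match(version.strip('"'))
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    prefix, patch, suffix = match.groups()
    return f"{prefix}{int(patch) + 1}{suffix}"


print(bump_patch('"2.11.35_cuda-12.9.1_jh-5.4.2"'))  # 2.11.36_cuda-12.9.1_jh-5.4.2
```

Because `build` depends on `increment_version`, every `make build` advances the patch number in this way before the image is built.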