mirror of https://github.com/stellarshenson/stellars-jupyterhub-ds.git synced 2026-03-08 06:00:29 +00:00

stellarshenson 4b2fc084bf fix: use separate SQLite database for activity monitor

- ActivityMonitor now uses /data/activity_samples.sqlite instead of
  JupyterHub's main database to avoid SQLite locking conflicts
- Fixes "database is locked" errors that prevented login when both
  JupyterHub and ActivityMonitor wrote simultaneously
- Added "Last Active" column to activity table showing relative time

2026-01-20 19:04:39 +01:00

Claude Code Journal

This journal tracks substantive work on documents, diagrams, and documentation content.

Task - Add Docker badges: Added Docker pulls and GitHub stars badges to README.md
Result: README now displays Docker pulls badge (stellars/stellars-jupyterhub-ds), Docker image size badge, and GitHub stars badge
Task - Project initialization and documentation: Analyzed codebase and created comprehensive project documentation
Result: Created .claude/CLAUDE.md with detailed architecture overview, configuration patterns, common commands, GPU auto-detection logic, volume management, authentication setup, and troubleshooting guide for future Claude Code instances
Task - Feature planning for user controls: Designed two self-service features for JupyterHub user control panel
Result: Created FEATURE_PLAN.md documenting Reset Home Volume and Restart Server features with implementation details, API handlers, UI templates, JavaScript integration, security considerations, edge cases, testing plans, and rollout strategy
Task - Version management implementation: Added version tracking and tagging system matching stellars-jupyterlab-ds pattern
Result: Created project.env with project metadata and version 1.0.0_jh-4.x, updated Makefile with increment_version and tag targets, auto-increment on build, dual-tag push (latest and versioned), leveraging existing Docker socket access for both planned features
Task - Implement user self-service features: Implemented Reset Home Volume and Restart Server features from FEATURE_PLAN.md
Result: Created custom API handlers in services/jupyterhub/conf/bin/custom_handlers.py with ResetHomeVolumeHandler and RestartServerHandler classes, created custom home.html template with buttons and confirmation modals, registered handlers in jupyterhub_config.py with @admin_or_self permissions, updated Dockerfile to copy templates and handlers, added feature documentation to .claude/CLAUDE.md - both features use Docker API directly via /var/run/docker.sock for volume management and container restart operations
Task - Enhance and fix self-service features: Evolved volume management from single home volume to multi-volume selection, fixed Bootstrap 5 compatibility, added visual enhancements
Result: Transformed ResetHomeVolumeHandler into ManageVolumesHandler supporting selective reset of home/workspace/cache volumes via checkboxes in UI, fixed template inheritance to properly extend JupyterHub's default home.html (resolving 404 errors), updated to Bootstrap 5 modal API (data-bs-toggle, data-bs-target, btn-close), wrapped JavaScript in RequireJS callback for proper module loading, added Font Awesome icons (fa-rotate for restart, fa-database for volumes), implemented automatic page refresh after Stop Server/Manage Volumes/Restart Server actions, updated API endpoint to /api/users/{username}/manage-volumes accepting JSON body with volume array, backend now processes multiple volumes and returns detailed success/failure response, bumped version to 3.0.12 reflecting major feature enhancement
Task - Document self-service features in README: Updated README with features section and screenshots demonstrating new self-service capabilities
Result: Added comprehensive Features section with bullet points covering GPU auto-detection, user self-service, isolated environments, native authentication, shared storage, and production-ready setup, created Self-Service Volume Management subsection with three screenshots (restart server button, manage volumes button, volume selection modal) and one-sentence descriptions for each, positioned visual documentation prominently after feature list to demonstrate user-facing functionality
Task - Production readiness and CI/CD setup: Implemented visual enhancements, GitHub Actions workflow, architecture documentation, and resolved critical production issues
Result: Added Font Awesome icons to all control buttons (fa-stop, fa-play, fa-rotate, fa-database), implemented MutationObserver for auto page refresh after server stop with icon re-injection before refresh, created GitHub Actions CI/CD workflow for Dockerfile validation with hadolint, pinned JupyterHub base image to version 5.4.2 (resolved DL3007 warning), reorganized README with mermaid architecture diagram showing Traefik -> Hub -> Spawner -> User containers flow with transparent background for dark mode compatibility, removed Claude co-authoring from entire git history (95 commits rewritten), fixed critical ModuleNotFoundError by adding /srv/jupyterhub to sys.path in config, built jupyterhub_config.py into Docker image by default (changed build context to project root, image now self-contained and works out-of-box), added pull_policy: build to prevent Docker Compose from pulling image after local build, created release tags STABLE_3.0.23 and RELEASE_3.0.23, version progression 3.0.20 -> 3.0.23 reflecting stability and production readiness improvements
Task - Privileged user docker.sock access: Implemented group-based access control for Docker socket mounting in user containers
Result: Created built-in protected group system with docker-privileged as single source of truth in config/jupyterhub_config.py::BUILTIN_GROUPS, implemented pre_spawn_hook to check group membership and conditionally mount /var/run/docker.sock with rw permissions, created startup script 02_ensure_groups.py that reads BUILTIN_GROUPS from config and creates missing groups before JupyterHub starts (follows DRY principle), added runtime protection in pre_spawn_hook to recreate groups if deleted, group cannot be permanently removed - auto-recreates on startup and before every container spawn, updated documentation in README and CLAUDE.md with security warnings about root-level host access, added instructions for admins to manage group membership through JupyterHub admin panel at /hub/admin, created four slash commands for common workflows: /build (verbose image build), /start (background compose up with log monitoring), /stop (make clean with orphan removal), /publish (journal update, rich commit, git push), bumped version to 3.1.2 reflecting new privileged access control feature
Task - Notification broadcast system: Implemented admin-only notification broadcast to all active JupyterLab servers with jupyterlab_notifications_extension integration
Result: Created BroadcastNotificationHandler and NotificationsPageHandler in custom_handlers.py for admin-only access at /hub/notifications, implemented concurrent notification delivery using asyncio.gather() with 5-second timeout per server, added temporary API token generation (5-minute expiry) for server authentication, created notifications.html template with message input (140 char limit), type selector (default/info/success/warning/error/in-progress), auto-close toggle, and live character counter, dynamically constructs endpoint URLs using spawner.server.base_url for correct routing, sends notifications to /jupyterlab-notifications-extension/ingest with proper payload format (type field not variant, actions array with Dismiss button), implemented comprehensive error handling for connection timeouts, auth failures, missing extensions with user-friendly messages, added one-line logging per server showing username, message preview, notification type, and outcome (SUCCESS/FAILED/ERROR), registered handlers in jupyterhub_config.py extra_handlers, updated CLAUDE.md and README.md with detailed feature documentation, added navigation link on home page for admin access, resolved Docker build cache issues requiring --no-cache flag to properly include new handlers, bumped version to 3.2.0 reflecting major notification broadcast feature
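The concurrent delivery described above can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: `send_notification` is a hypothetical stand-in for the HTTP POST to each server's ingest endpoint, and the simulated delays exist only to exercise the timeout path.

```python
import asyncio


async def send_notification(username, delay):
    # Hypothetical stand-in for the real HTTP request to one user's server;
    # `delay` simulates how long that server takes to answer.
    await asyncio.sleep(delay)
    return "SUCCESS"


async def send_with_timeout(username, delay, timeout):
    # Each server gets its own timeout, so one slow server cannot block the rest.
    try:
        status = await asyncio.wait_for(send_notification(username, delay), timeout)
    except asyncio.TimeoutError:
        status = "FAILED"
    return {"username": username, "status": status}


async def broadcast(servers, timeout=5.0):
    # asyncio.gather runs all deliveries concurrently and keeps input order.
    tasks = [send_with_timeout(name, delay, timeout) for name, delay in servers.items()]
    return await asyncio.gather(*tasks)


def run_broadcast(servers, timeout=5.0):
    return asyncio.run(broadcast(servers, timeout))
```

With a per-server timeout the total wall time stays close to the timeout value rather than growing with the number of unresponsive servers.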
Task - Documentation and screenshots: Created minimal documentation for notification system, UI customization, and Docker socket permissions following super-minimal modus primaris style
Result: Created doc/notifications.md (35 lines - key technical facts, handler implementation, dependencies, error handling), doc/ui-template-customization.md (55 lines - JavaScript patterns, Bootstrap 5 syntax, CSRF protection, build process), doc/docker-socket-permissions.md (66 lines - pre-spawn hook implementation, built-in group system, security implications), updated README.md User Interface section replacing screenshot-restart-server.png with screenshot-home.png showing complete user control panel and adding screenshot-send-notification.png displaying admin notification broadcast interface, refreshed .claude/CLAUDE.md with accurate notification system technical details (140 char limit, all 6 notification types, temporary token generation with 5-minute expiry, correct endpoint URL pattern with base_url interpolation, hyphenated extension path, Dismiss button feature, HTTP 500 error handling, one-line logging details), all documentation simplified to a "glimpse of implementation": essential bullet points and code snippets without lengthy narrative
Task - Configuration-agnostic volume management with optional descriptions: Refactored volume management to dynamically read from configuration and added optional volume descriptions
Result: Modified jupyterhub_config.py to define DOCKER_SPAWNER_VOLUMES as module-level constant, created get_user_volume_suffixes() function extracting volume suffixes matching jupyterlab-{username}_ pattern, added USER_VOLUME_SUFFIXES calculated from DOCKER_SPAWNER_VOLUMES (importable by handlers), protected all c.* assignments with 'if c is not None:' guards to allow module import without NameError, added optional VOLUME_DESCRIPTIONS dict mapping volume suffixes to user-friendly descriptions, exposed volume_descriptions via c.JupyterHub.template_vars, updated home.html template to use Jinja2 loop generating checkboxes dynamically from user_volume_suffixes with conditional description display, modified ManageVolumesHandler to validate against USER_VOLUME_SUFFIXES from config instead of hardcoded set, simplified README.md following modus primaris (features first, simplified notification description, simplified architecture diagram labels), updated documentation in .claude/CLAUDE.md (Manage Volumes section with configuration example, simplified Restart Server section) and doc/ui-template-customization.md (added template variables section), UI now fully agnostic working correctly if volumes renamed/added/removed in configuration without template or handler changes
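A minimal sketch of the suffix extraction, assuming DOCKER_SPAWNER_VOLUMES is a dict keyed by volume name template. The dict contents and the `validate_requested` helper are illustrative assumptions, not the project's actual definitions.

```python
# Assumed shape of the module-level constant: volume name template -> mount point.
DOCKER_SPAWNER_VOLUMES = {
    "jupyterlab-{username}_home": "/home",
    "jupyterlab-{username}_workspace": "/home/lab/workspace",
    "jupyterlab-{username}_cache": "/home/lab/.cache",
    "jupyterhub_shared": "/mnt/shared",
}

USER_VOLUME_PREFIX = "jupyterlab-{username}_"


def get_user_volume_suffixes(volumes):
    # Keep only per-user volumes and return the part after the prefix.
    return [
        name[len(USER_VOLUME_PREFIX):]
        for name in volumes
        if name.startswith(USER_VOLUME_PREFIX)
    ]


USER_VOLUME_SUFFIXES = get_user_volume_suffixes(DOCKER_SPAWNER_VOLUMES)


def validate_requested(requested):
    # Hypothetical helper: reject anything not derived from the configuration,
    # which keeps the shared volume out of reach of a per-user reset.
    return [suffix for suffix in requested if suffix in USER_VOLUME_SUFFIXES]
```

Because the suffix list is derived rather than hardcoded, renaming or adding a per-user volume in the configuration changes what the handler accepts without touching the handler.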
Task - Release v3.2.11 preparation and documentation refinement: Tagged release, prepared delta notes, simplified documentation, corrected security warnings
Result: Created git tag RELEASE_3.2.11 with annotations covering configuration-agnostic volume management, notification broadcast, privileged access control, created RELEASE.md delta release notes documenting major features (configuration-agnostic volume management with optional descriptions, admin notification broadcast with 6 types, privileged user docker.sock access control), technical improvements (module-level constants, dynamic handlers/templates), documentation updates (modus primaris style, new screenshots), version history (v3.2.11 ← v3.2.0 ← v3.1.2 ← v3.0.23 ← v3.0.14), upgrade notes with backward compatibility, simplified doc/docker-socket-permissions.md from 66 to 19 lines removing verbose explanations/use cases/auditing commands, updated project.env with RELEASE_TAG and RELEASE_DATE metadata, corrected docker.sock security warnings changing "host system" to "Docker host" (docker.sock grants Docker daemon access not physical host), applied HTML alert styling (alert-block alert-warning) to security warnings in README.md for better visibility, updated security warnings in README.md (Requirements and Privileged Access sections), doc/docker-socket-permissions.md, and .claude/CLAUDE.md
Task - Configuration flow and UI workflow diagrams: Created mermaid diagrams illustrating settings interaction and user self-service workflows
Result: Added Configuration Flow section to README after Architecture section showing how environment variables in compose.yml (ADMIN, NOTEBOOK_IMAGE, NETWORK, GPU, SSL, MLFLOW, GLANCES, TENSORBOARD) flow through jupyterhub_config.py (AUTH, SPAWN, VOLS, HOOK) to spawned user containers (LAB, MLFLOW, GLANCES, TENSORBOARD, GPUACCESS), three-layer graph with amber/green/blue color scheme indicating configuration source, processing layer, and runtime services, removed fill colors from shapes (stroke-only styling), restructured ENABLE_SERVICE_* as horizontal subgraph showing MLflow/Glances/TensorBoard as examples passed to Lab env, expanded CONFIG section to include DOCKER_SPAWNER_VOLUMES, VOLUME_DESCRIPTIONS, BUILTIN_GROUPS, pre_spawn_hook, extra_handlers (ManageVolumes/RestartServer/Notifications), template_paths, DOCKER_NOTEBOOK_DIR, added JUPYTERHUB_BASE_URL, TF_CPP_MIN_LOG_LEVEL, NVIDIA_AUTODETECT_IMAGE to ENV section, consolidated RUNTIME section showing Services controlled by ENABLE_SERVICE_* env vars. Created separate GPU Auto-Detection section with flowchart diagram showing decision logic (ENABLE_GPU_SUPPORT values 0/1/2), auto-detect mechanism spawning temporary nvidia/cuda container running nvidia-smi, success/failure paths setting NVIDIA_DETECTED flag, cleanup of test container jupyterhub_nvidia_autodetect, and final application of device_requests to spawned containers. Created User Self-Service Workflow diagram showing home page UI states (server running vs stopped), custom buttons (Restart Server with fa-rotate icon, Manage Volumes with fa-database icon), modal interface with checkboxes and optional descriptions, API handlers (RestartServerHandler, ManageVolumesHandler) with @admin_or_self permissions, Docker operations (container.restart, spawner.stop, volume.remove, spawner.start), AJAX requests (POST for restart, DELETE for volume management with JSON payload), auto page refresh via MutationObserver after successful operations. Created Volume Architecture diagram showing multi-user deployment with two example users (user1, user2) each having three dedicated volumes (home, workspace, cache) mounted at /home, /home/lab/workspace, /home/lab/.cache respectively, single jupyterhub_shared volume represented as shared resource mounted at /mnt/shared accessible from all user containers, improved arrow alignment by repositioning MSHARED node between HOST and CONTAINER subgraphs with solid arrow from VOLSHARED emphasizing direct mount relationship and dashed arrows to containers showing accessibility, added note explaining users can selectively reset personal volumes (home/workspace/cache) through Manage Volumes when server stopped but shared volume protected from individual user reset. Converted HTML alert divs to GitHub-style WARNING blocks (two instances in README)
Task - Update watchtower image: Replaced deprecated watchtower image with maintained fork
Result: Changed watchtower image from containrrr/watchtower:latest to nickfedor/watchtower:latest in compose.yml, new image is actively maintained and compatible with latest Docker versions, bumped version to 3.3.1
Task - Cleanup startup scripts: Removed obsolete nvidia-smi script and renumbered ensure_groups
Result: Deleted 01_nvidia-smi.sh (GPU detection now uses separate nvidia/cuda container spawned by jupyterhub_config.py), renamed 02_ensure_groups.py to 01_ensure_groups.py for sequential ordering, bumped version to 3.3.2
Task - Fix Watchtower refresh frequency: Investigated and fixed Watchtower scheduling issues
Result: Removed unsupported --no-startup flag (caused crash). Fixed cron expression from 5-field 0 0 * * * to 6-field 0 0 0 * * * because Watchtower's schedule format includes a leading seconds field; the 5-field form was running hourly instead of daily at midnight UTC
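The difference between the two expressions is easier to see when each value is labelled with its field. The sketch below assumes the short expression was read left-aligned with the missing trailing field defaulting to `*`, which is consistent with the hourly behaviour reported above; it is an illustration in Python, not Watchtower's parser.

```python
# Field order of a six-field cron expression with a leading seconds field.
FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def read_left_aligned(expression):
    # Assign values to fields from the left; pad missing trailing fields with "*".
    values = expression.split()
    if len(values) > len(FIELDS):
        raise ValueError(f"too many fields: {expression!r}")
    padded = values + ["*"] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, padded))


old = read_left_aligned("0 0 * * *")    # hour is "*": fires at the top of every hour
new = read_left_aligned("0 0 0 * * *")  # hour is "0": fires once a day at midnight
```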
Task - Refactor Docker access control groups: Split into two groups with distinct purposes
Result: Renamed docker-privileged to docker-sock (mounts docker.sock), created new docker-privileged (runs with --privileged flag). Updated pre_spawn_hook to check both groups and set spawner.volumes or spawner.privileged accordingly. Updated README.md, doc/docker-socket-permissions.md, .claude/CLAUDE.md with new group documentation
Task - Fix privileged container mode: DockerSpawner privileged flag not working
Result: Changed from spawner.privileged = True to spawner.extra_host_config['privileged'] = True - DockerSpawner requires extra_host_config dict for host configuration options. Bumped version to 3.4.1
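A rough sketch of how a pre-spawn hook could apply both groups, using the `extra_host_config` form described above. The `make_fake_spawner` helper is hypothetical and exists only so the snippet runs without JupyterHub; the project's actual hook may differ.

```python
from types import SimpleNamespace

DOCKER_SOCK = "/var/run/docker.sock"


def pre_spawn_hook(spawner):
    # Group names of the user whose server is about to start.
    group_names = {group.name for group in spawner.user.groups}

    if "docker-sock" in group_names:
        # Bind-mount the Docker socket read-write into the user container.
        spawner.volumes[DOCKER_SOCK] = {"bind": DOCKER_SOCK, "mode": "rw"}

    if "docker-privileged" in group_names:
        # Host-level options go through extra_host_config, not a spawner attribute.
        spawner.extra_host_config["privileged"] = True


def make_fake_spawner(*group_names):
    # Hypothetical test double standing in for a DockerSpawner instance.
    groups = [SimpleNamespace(name=name) for name in group_names]
    return SimpleNamespace(
        user=SimpleNamespace(groups=groups),
        volumes={},
        extra_host_config={},
    )
```

In a real configuration the hook would be registered with `c.Spawner.pre_spawn_hook = pre_spawn_hook`.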
Task - Exclude watchtower from self-updates: Prevent watchtower from updating itself
Result: Added com.centurylinklabs.watchtower.enable=false label to watchtower service in compose.yml - legacy namespace kept by forks for backward compatibility
Task - Traefik host-based routing template: Created deployment template for local Traefik with self-signed certificates
Result: Added extra/traefik-host-based-routing/ template for creating <name>_stellars_jupyterhub_ds deployments with local Traefik reverse proxy (ports 80/443), self-signed wildcard certificates, and HTTP->HTTPS redirect. Includes compose_override.yml with YOURDOMAIN placeholder, Makefile (start/stop/pull/logs/status), start.sh (clone/pull + start), stop.sh, generate-certs.sh (creates wildcard cert and tls.yml for given domain), .gitignore (excludes certs and cloned repo). Updated extra/README.md with template listing
Task - Fix root path base URL redirect: Fixed double-slash issue when JUPYTERHUB_BASE_URL=/
Result: Added JUPYTERHUB_BASE_URL_PREFIX normalization in config/jupyterhub_config.py. When JUPYTERHUB_BASE_URL is /, '', or None, prefix becomes empty string. Updated three URL concatenations (default_url, hub_connect_url, base_url) to use prefix instead of raw BASE_URL. Prevents the browser from interpreting //hub/home as a protocol-relative URL pointing to a host named hub
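The normalization can be sketched as below. The function name and the trailing-slash stripping for non-root values are assumptions made for illustration; only the handling of `/`, `''`, and `None` comes from the entry above.

```python
def normalize_base_url_prefix(base_url):
    # Root-path deployments get an empty prefix so concatenation never
    # produces a leading double slash.
    if base_url in (None, "", "/"):
        return ""
    # Assumption: other values keep their leading slash and lose any trailing one.
    return "/" + base_url.strip("/")


raw_root = "/" + "/hub/home"                                  # the bug: "//hub/home"
fixed_root = normalize_base_url_prefix("/") + "/hub/home"      # "/hub/home"
sub_path = normalize_base_url_prefix("/jupyterhub/") + "/hub/home"
```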
Task - Update traefik-host-based-routing template: Refactored deployment template for root path routing
Result: Removed Makefile from extra/traefik-host-based-routing/. Updated compose_override.yml with JUPYTERHUB_BASE_URL=/ and root path Traefik routing. Updated start.sh to pull images and use --no-build. Updated README to reflect simplified workflow without Makefile
Task - User rename with authorization preservation: Implemented admin API to rename users while preserving NativeAuthenticator authorization
Result: Created RenameUserHandler in custom_handlers.py with PATCH /hub/api/users/{username}/rename endpoint. Handler updates both JupyterHub User and NativeAuthenticator UserInfo tables atomically, preserving is_authorized status. Requires admin privileges and stopped server. Includes validation (name uniqueness, server state), rollback on failure, and detailed logging. Registered handler in jupyterhub_config.py. Added Admin Features section to .claude/CLAUDE.md documenting the problem (UserInfo not synced on rename), solution (atomic update of both tables), and API contract
Task - Fix NativeAuthenticator sync on rename: Replaced broken handler approach with SQLAlchemy event listener
Result: Discovered extra_handlers appends to default handlers (doesn't override), so SyncedUserAPIHandler was never called. Implemented SQLAlchemy @event.listens_for(orm.User.name, 'set') in jupyterhub_config.py to intercept ALL User.name changes at ORM level. Removed SyncedUserAPIHandler and RenameUserHandler classes from custom_handlers.py. Removed corresponding routes from extra_handlers. Updated .claude/CLAUDE.md replacing "Admin Features > Rename User" with simpler "NativeAuthenticator Sync" note. New approach catches renames from admin panel, API, and any other source automatically
Task - Version display in browser console: Added version badge to browser JavaScript console
Result: Added ARG VERSION=dev and ENV STELLARS_JUPYTERHUB_VERSION=${VERSION} to Dockerfile. Updated compose.yml build args to pass VERSION=${VERSION:-dev}. Updated scripts/build.sh and scripts/build_verbose.sh to source project.env and export VERSION. Added stellars_version template variable in jupyterhub_config.py. Added styled console.log in home.html displaying blue "Stellars JupyterHub DS" + green version badge in browser console
Task - Admin user creation with credentials modal: Implemented auto-generation of passwords when admin creates users via admin panel
Result: Added after_insert SQLAlchemy event listener in jupyterhub_config.py to auto-create NativeAuthenticator UserInfo with memorable 3-word password (e.g., storm-apple-ocean) and is_authorized=1 (auto-approved). Added after_delete listener to clean up UserInfo when user deleted. Created password cache in custom_handlers.py with 5-minute expiry for secure credential handoff. Added GetUserCredentialsHandler API at /api/users/credentials for admin retrieval. Created templates_enhanced/ directory with customized templates - enhanced admin.html with fetch interceptor to detect user creation, credentials modal with copy/download functionality, enhanced page.html with NativeAuth nav items (Change Password, Authorize Users). Fixed Dockerfile to copy from templates_enhanced/ instead of templates/. Fixed fetch interceptor to handle URL objects (React passes URL object, not string). Changed version console.log to plain text format
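A short-lived credential cache of the kind described above might look like this. The class name, the injectable clock, and the non-destructive `get` are assumptions for the sketch; the project's cache in custom_handlers.py may be structured differently.

```python
import time


class PasswordCache:
    """In-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # username -> (password, stored_at)

    def put(self, username, password):
        self._entries[username] = (password, self._clock())

    def get(self, username):
        entry = self._entries.get(username)
        if entry is None:
            return None
        password, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            # Expired: drop the plaintext password and report a miss.
            del self._entries[username]
            return None
        return password
```

Keeping the plaintext password only in memory and only for a few minutes limits exposure to the window in which the admin views the credentials modal.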
Task - Fix admin template URL handling: Fixed fetch interceptor and nav links in templates_enhanced
Result: Fixed isUserCreation check in admin.html to strip query params (?_xsrf=...) before checking if URL ends with api/users - React admin adds XSRF token as query param which broke the endpoint detection. Fixed double "hub" prefix in page.html nav links - changed {{ base_url }}hub/authorize to {{ base_url }}authorize and {{ base_url }}hub/change-password to {{ base_url }}change-password since base_url already includes /hub/
Task - Fix credentials API route conflict: Changed credentials endpoint from /api/users/credentials to /api/admin/credentials
Result: JupyterHub's built-in /api/users/* handler was catching requests before custom handler, returning "Invalid JSON keys" error. Changed route in jupyterhub_config.py, admin.html, and custom_handlers.py docstring
Task - Credentials modal UX improvements: Enhanced modal layout, scrolling, and loading feedback
Result: Moved Copy/Download buttons to top of modal body. Added scrollable container (max-height 300px) for long user lists. Removed <code> styling for plain text display. Added loading spinner modal shown between user creation and credentials display. Improved Makefile version output format to show "Current version" and "New version" lines
Task - Fix password storage and enable bake: Fixed NativeAuthenticator password storage format and Docker build configuration
Result: Changed password storage from decoded string to bytes - bcrypt.hashpw() returns bytes which NativeAuthenticator ORM expects as LargeBinary. Previous string storage caused "argument 'salt': 'str' object cannot be converted to 'PyBytes'" error on login. Enabled COMPOSE_BAKE=true in build.sh, build_verbose.sh, and .claude/commands/build.md to use Docker's bake builder (removes deprecation warning). Tagged STABLE_3.5.10
Task - Per-row copy icon in credentials modal: Added individual copy functionality for each credential row
Result: Added third column with subtle copy icon (40% opacity) for each row in credentials table. Click handler copies "Username: xxx\nPassword: yyy" to clipboard. Icon changes to checkmark briefly (1.5s) after successful copy, then returns to copy icon
Task - Comment out console logs: Removed debug console output from admin template
Result: Commented out all console.log and console.error statements in admin.html (11 total) - fetch interceptor logs, user creation detection, credentials fetch/receive, error handlers. Version display in home.html remains active as only console output
Task - Comprehensive UI styling enhancements: Extended custom.css with consistent styling across all JupyterHub pages
Result: Added notifications page styling (form container, textarea, character counter, results section with banded table, status badges), removed obtrusive hover highlight on admin user rows (replaced with subtle 0.003 alpha effect), set collapsed user card padding to 0 to override inline styles, unified button font sizes to 0.8rem across all pages (authorization buttons, token revoke, admin actions, collapse buttons, toggle details), added Add Users form panel styling (labels, textarea, submit button, dark mode support), disabled hover on expanded card tables via box-shadow override, added groups page styling (list items, card footer, badges), styled collapse/expand buttons (0.2rem 0.3rem padding), added username label spacing, dark mode Add Users button with explicit colors (#f8f9fa bg, #6c757d border)
Task - Traefik template start.sh --refresh flag: Added optional refresh flag to start.sh for pulling latest upstream
Result: Modified extra/traefik-host-based-routing/start.sh to clone repo only once on first run, skip git operations if repo exists (just uses existing), added --refresh flag to pull latest from origin/main when explicitly requested
Task - Volume renamer script: Created extra/volume-renamer with script for renaming Docker volumes between users
Result: Created rename-user-volumes.sh - renames all jupyterlab-* volumes from source pattern to target username, handles the dot-to-hex username encoding used in volume names (. becomes -2e), supports --dry-run (show mappings without changes) and --keep-orig (preserve source volumes), compact help with generic examples (oldnick -> first.last)
Task - UI refinements and icon cleanup: Admin panel styling and password toggle icon improvements
Result: Aligned admin action buttons to right (td.actions text-align), added transparent background for dark mode .server-dashboard-container, made table hover nearly invisible (--bs-table-hover-bg: rgba 0.003), replaced eye emoji with Font Awesome icons (fa-eye/fa-eye-slash) in all 4 password templates (signup, login, change-password, change-password-admin) - both button HTML and JavaScript toggle
Task - Table hover and README update: Fixed Bootstrap table-hover styling and documented mnemonic passwords feature
Result: Added --bs-table-accent-bg override for Bootstrap 5 table-hover (two-step variable process), reduced Shutdown Hub button margin from 30px to 5px, removed color override from dark mode .server-dashboard-container, updated README Features with admin user creation and mnemonic passwords (e.g., storm-apple-ocean)
Task - Authorization page improvements: Protected users with accounts from accidental discard
Result: Added JavaScript to authorization-area.html that fetches /api/users and hides Discard buttons for usernames that exist in JupyterHub (initial custom handler approach failed - NativeAuthenticator handlers take precedence over extra_handlers), removed unused AuthorizationAreaHandler from config and custom_handlers.py, made authorization table compact with CSS for .authorization-container, fixed 403 error by adding X-XSRFToken header and credentials to fetch request
Task - Fix table hover color override: Admin panel table hover color not being overridden despite CSS
Result: Discovered Bootstrap 5 table-hover targets td/th cells specifically while our CSS used universal > * selector. Fixed by changing selectors to explicitly target > td, > th and using transparent instead of CSS variables. Updated three rule sets: table-hover override (lines 1055-1061), user-row hover (lines 1086-1092), and dark mode user-row hover (lines 1364-1370)
Task - Server-side authorization discard fix: Replaced clunky JavaScript API call with server-side Jinja2 logic
Result: Created StellarsNativeAuthenticator subclass that overrides get_handlers() to inject CustomAuthorizationAreaHandler, which passes hub_usernames set to template. Updated authorization-area.html to use {% if user.username not in hub_usernames %} instead of JavaScript fetch. Removes API call, XSRF token handling, and flash of Discard buttons before hiding
Task - Custom logo support: Added JUPYTERHUB_LOGO_FILE configuration for custom branding
Result: Added logo_file config that checks for file at /srv/jupyterhub/logo.svg (or JUPYTERHUB_LOGO_FILE env var), JupyterHub serves it at {{ base_url }}logo automatically. Fixed CustomAuthorizationAreaHandler 403 by adding @needs_scope('admin:users') decorator and importing orm inside get() method. Simplified page.html logo block to always use base_url/logo
Task - Enhance certs-installer scripts: Added optional folder parameter and help option
Result: Updated install_cert.sh and install_cert.bat to accept optional directory argument (default: current directory), added -h/--help/? flags showing usage, supported file types, and examples. Both scripts now display which directory is being scanned
Task - Fix volume reset encoding: Fixed Docker volume/container name encoding for special characters
Result: Added encode_username_for_docker() function using escapism library (same as DockerSpawner) to ensure compatibility with JupyterHub's naming scheme. Updated ManageVolumesHandler (line 152), RestartServerHandler (line 227), and BroadcastNotificationHandler (line 454) to use encoded usernames. Handles special characters like . -> -2e, @ -> -40 matching JupyterHub's default encoding
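To illustrate the mapping, here is a standard-library sketch that reproduces the two examples given above. It is a stand-in written for this journal, not the escapism library and not the project's encode_username_for_docker(); the safe character set (lowercase letters and digits) and the lowercase hex output are assumptions chosen to match `. -> -2e` and `@ -> -40`.

```python
import string

# Assumption: only lowercase ASCII letters and digits pass through unchanged.
SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def encode_username_sketch(username, escape_char="-"):
    encoded = []
    for char in username:
        if char in SAFE_CHARS:
            encoded.append(char)
        else:
            # Escape every UTF-8 byte of an unsafe character as <escape_char><hex>.
            encoded.extend(f"{escape_char}{byte:02x}" for byte in char.encode("utf-8"))
    return "".join(encoded)


volume_name = f"jupyterlab-{encode_username_sketch('first.last')}_home"
```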
Task - Selective notification recipients: Enhanced notification broadcast to allow targeting specific servers
Result: Added ActiveServersHandler (GET /api/notifications/active-servers) to list active servers. Modified BroadcastNotificationHandler to accept optional recipients array - filters to selected users if provided, sends to all if omitted (backward compatible). Updated notifications.html with "Send to all active servers" checkbox, server selection list with Select All/Deselect All buttons, dynamic button text showing recipient count. Validation prevents sending with no recipients selected
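The backward-compatible recipient filter reduces to a small selection step. A minimal sketch, with a hypothetical function name and plain username lists standing in for the handler's server objects:

```python
def select_targets(active_servers, recipients=None):
    # No recipients field in the request: keep the original send-to-all behaviour.
    if recipients is None:
        return list(active_servers)
    # Otherwise send only to requested users that actually have an active server.
    wanted = set(recipients)
    return [username for username in active_servers if username in wanted]
```

Filtering against the active list means a stale or mistyped recipient is silently skipped rather than causing a failed delivery attempt.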
Task - Idle server culler: Implemented automatic shutdown of inactive servers
Result: Added jupyterhub-idle-culler package to Dockerfile. Added configuration with environment variables: IDLE_CULLER_ENABLED (default 0), IDLE_CULLER_TIMEOUT (default 86400s/24h), IDLE_CULLER_CULL_EVERY (default 600s/10min), IDLE_CULLER_MAX_AGE (default 0/unlimited). Service runs as managed JupyterHub service with role-based scopes. Disabled by default, opt-in via IDLE_CULLER_ENABLED=1. Bumped version to 3.6.0
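The environment-driven setup could be assembled along these lines. This is a sketch under assumptions: the helper name, the service and role names, and the scope list (taken from the idle culler's documented example as best recalled) are not confirmed by this journal; the defaults are the ones listed above.

```python
import sys


def build_idle_culler_service(env):
    # Opt-in: return nothing unless the feature is explicitly enabled.
    if env.get("IDLE_CULLER_ENABLED", "0") != "1":
        return None
    timeout = int(env.get("IDLE_CULLER_TIMEOUT", "86400"))
    cull_every = int(env.get("IDLE_CULLER_CULL_EVERY", "600"))
    max_age = int(env.get("IDLE_CULLER_MAX_AGE", "0"))
    return {
        "name": "jupyterhub-idle-culler",  # assumed service name
        "command": [
            sys.executable, "-m", "jupyterhub_idle_culler",
            f"--timeout={timeout}",
            f"--cull-every={cull_every}",
            f"--max-age={max_age}",
        ],
    }


# Assumed role granting the service just enough access to list and stop servers.
IDLE_CULLER_ROLE = {
    "name": "jupyterhub-idle-culler-role",
    "scopes": ["list:users", "read:users:activity", "read:servers", "delete:servers"],
    "services": ["jupyterhub-idle-culler"],
}
```

In jupyterhub_config.py the result would be appended to `c.JupyterHub.services` and the role to `c.JupyterHub.load_roles` when the service is not `None`, for example by calling `build_idle_culler_service(os.environ)`.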
Task - Standardize env vars with JUPYTERHUB_ prefix: Renamed all environment variables to use consistent JUPYTERHUB_ prefix and added admin Settings page
Result: Renamed 13 environment variables (ENABLE_GPU_SUPPORT->JUPYTERHUB_GPU_ENABLED, ENABLE_JUPYTERHUB_SSL->JUPYTERHUB_SSL_ENABLED, ENABLE_SERVICE_->JUPYTERHUB_SERVICE_, ENABLE_SIGNUP->JUPYTERHUB_SIGNUP_ENABLED, DOCKER_NOTEBOOK_IMAGE->JUPYTERHUB_NOTEBOOK_IMAGE, DOCKER_NETWORK_NAME->JUPYTERHUB_NETWORK_NAME, NVIDIA_AUTODETECT_IMAGE->JUPYTERHUB_NVIDIA_IMAGE, IDLE_CULLER_->JUPYTERHUB_IDLE_CULLER_). Created SettingsPageHandler with admin-only access at /settings showing read-only banded table of all configuration values. Updated compose.yml, jupyterhub_config.py, custom_handlers.py, README.md, CLAUDE.md. Added Settings link to admin navbar. No backward compatibility for old names. Bumped version to 3.6.1
Task - Settings dictionary YAML: Externalized settings metadata to config/settings_dictionary.yml
Result: Created settings_dictionary.yml with categories as top-level keys (JupyterHub Core, Docker Spawner, GPU, Services, Idle Culler, Branding), each containing list of settings with name, description, default, and optional empty_display. Updated SettingsPageHandler to load from YAML instead of hardcoded values. Added pyyaml to Dockerfile pip install. Dockerfile now copies settings_dictionary.yml to /srv/jupyterhub/
Task - xkcdpass password generation: Replaced custom word list with xkcdpass library for auto-generated passwords
Result: Moved settings_dictionary.yml to services/jupyterhub/conf/ for proper image baking. Replaced hardcoded word list with xkcdpass library for memorable password generation. Added configurable env vars: JUPYTERHUB_AUTOGENERATED_PASSWORD_WORDS (default 4) and JUPYTERHUB_AUTOGENERATED_PASSWORD_DELIMITER (default "-"). Added xkcdpass to Dockerfile pip install. Fixed ENABLE_SIGNUP to JUPYTERHUB_SIGNUP_ENABLED in Dockerfile defaults
Task - Cleanup templates directory structure: Removed redundant HTML files and renamed directory for clarity
Result: Removed unused *.html files from services/jupyterhub/templates/ (only certs/ subdirectory needed). Renamed templates_enhanced to html_templates_enhanced for clearer naming. Updated Dockerfile COPY paths accordingly
Task - Version footer on home page: Added version status bar to home page matching admin page style
Result: Added {% block footer %} to home.html displaying "Stellars JupyterHub DS X.Y.Z | JupyterHub X.Y.Z". Uses stellars_version.split('_')[0] to show only major.minor.patch (strips _jh-x.x suffix). Added <div class="mt-5 pt-5"></div> spacer before footer for visual separation from content. Uses server_version template variable for JupyterHub version
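The suffix stripping mentioned above, sketched as plain Python. The function name `display_version` and the sample version strings are illustrative only.

```python
def display_version(stellars_version):
    """Return only major.minor.patch, dropping any '_jh-x.x' style suffix."""
    # split('_')[0] keeps everything before the first underscore;
    # a string without an underscore is returned unchanged
    return stellars_version.split("_")[0]
```

For an illustrative input such as `"3.6.1_jh-4.x"` this yields `"3.6.1"`.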
Task - Fix server_version not populated: Investigated and fixed missing JupyterHub version in home page footer
Result: Discovered AdminHandler explicitly passes server_version to admin.html but HomeHandler does not pass it to home.html - it's handler-specific, not global. Added jupyterhub.__version__ to c.JupyterHub.template_vars in jupyterhub_config.py making server_version available globally to all templates
Task - Create stop.sh script: Added stop.sh to complement start.sh for platform shutdown
Result: Created stop.sh mirroring start.sh pattern - resolves script location via readlink/dirname, respects compose_override.yml if present, runs docker compose down --remove-orphans
Task - Enhance traefik-host-based-routing template: Major improvements to deployment template with CIFS support and certificate installers
Result: Added optional CIFS mount support via compose_cifs.yml and .env.example (ENABLE_CIFS=1), created install_cert.sh for Linux and macOS (multi-distro: Debian/Ubuntu, RHEL/CentOS, Arch, Alpine) and enhanced install_cert.bat for Windows with folder argument and help flags, fixed compose_override.yml stray quote and added JUPYTERHUB_IDLE_CULLER_ENABLED/JUPYTERHUB_SIGNUP_ENABLED defaults, enhanced generate-certs.sh with generic CN for browser compatibility and verification output, updated start.sh/stop.sh to load .env and conditionally include compose_cifs.yml, updated README with CIFS instructions and certificate installation commands, added .env to .gitignore
Task - Rename GLANCES to RESOURCES_MONITOR: Renamed environment variable for clarity
Result: Renamed JUPYTERHUB_SERVICE_GLANCES to JUPYTERHUB_SERVICE_RESOURCES_MONITOR across compose.yml, jupyterhub_config.py, settings_dictionary.yml, README.md, and CLAUDE.md
Task - Idle culler session extension feature: Implemented user session extension capability for idle culler
Result: Added JUPYTERHUB_IDLE_CULLER_MAX_EXTENSION env var (default 24 hours) to jupyterhub_config.py, created SessionInfoHandler (GET /api/users/{username}/session-info) and ExtendSessionHandler (POST /api/users/{username}/extend-session) in custom_handlers.py, added Session Status card to home.html with countdown timer, extension dropdown (1h/2h/4h/8h), and extension allowance display, extension tracking stored in spawner.orm_spawner.state (resets on server restart), timer updates locally every 60s with server refresh every 5 minutes, color-coded warnings (yellow < 1h, red < 30min), extension button disabled when limit reached, added all idle culler settings to compose.yml (ENABLED, TIMEOUT, INTERVAL, MAX_AGE, MAX_EXTENSION), added setting to settings_dictionary.yml, updated README.md Idle Server Culler section with MAX_EXTENSION documentation, passed idle_culler_enabled/timeout/max_extension to templates via template_vars
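The color-coded warning thresholds above (yellow below 1h, red below 30min) can be sketched as a small function. The real logic lives in the page's JavaScript; this Python version and the name `warning_level` are only an illustration of the rule.

```python
def warning_level(remaining_seconds):
    """Map time remaining before culling to a warning color."""
    if remaining_seconds < 30 * 60:
        return "red"      # less than 30 minutes left
    if remaining_seconds < 60 * 60:
        return "yellow"   # less than 1 hour left
    return "normal"
```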
Task - Harmonize env settings across files: Ensured all env settings are consistent in compose.yml, Dockerfile, and config
Result: Added all missing ENV defaults to Dockerfile (ADMIN, BASE_URL, SSL_ENABLED, NOTEBOOK_IMAGE, NETWORK_NAME, GPU_ENABLED, NVIDIA_IMAGE, SERVICE_MLFLOW/RESOURCES_MONITOR/TENSORBOARD, all IDLE_CULLER settings, TF_CPP_MIN_LOG_LEVEL), added AUTOGENERATED_PASSWORD_WORDS/DELIMITER to compose.yml, standardized logo setting as JUPYTERHUB_LOGO_URI with file:// prefix (supports external http/https URIs), updated NVIDIA_IMAGE to nvidia/cuda:13.0.2-base-ubuntu24.04 across all files (compose.yml, jupyterhub_config.py, Dockerfile, settings_dictionary.yml, README.md, CLAUDE.md), config now strips file:// prefix for local files while allowing external URI support in templates
Task - Improve session extension UI: Changed extension input from dropdown to numeric and added explanatory note
Result: Replaced dropdown with numeric input (min=1, max set dynamically to available hours), renamed card title to "Idle Session Timeout", added explanatory note "Your server will be stopped after a period of inactivity to free up resources", added input validation for minimum 1 hour, button text shortened to "Extend"
Task - Session extension truncation and cumulative logic: Fixed extension behavior to be cumulative and truncate excess requests
Result: Changed extension logic from resetting timer to cumulative additions (extensions ADD to remaining time), fixed datetime timezone mismatch (use offset-naive datetime.utcnow() to match JupyterHub internal format), implemented truncation when requested hours exceed available (truncates to max available instead of rejecting), added truncated flag in API response with detailed message, UI shows warning alert (yellow) when truncated with 4-second display vs 2-second for normal success, fixed extend button re-enable after successful extension, added detailed logging showing base timeout, extension calculations, and remaining time, kept extend button enabled when max reached (backend handles rejection with error message)
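A minimal sketch of the cumulative-plus-truncation rule described above, assuming extensions are tracked in whole hours. The function name `extend_session` and the response keys other than `truncated` are hypothetical.

```python
def extend_session(used_hours, requested_hours, max_extension_hours=24):
    """Grant an extension on top of hours already used (hypothetical helper).

    Requests above the remaining allowance are truncated, not rejected;
    a request with no allowance left is rejected with an error message.
    """
    if requested_hours < 1:
        raise ValueError("minimum extension is 1 hour")

    available = max(0, max_extension_hours - used_hours)
    if available == 0:
        return {"granted": 0, "total_used": used_hours, "truncated": False,
                "error": "maximum extension reached"}

    granted = min(requested_hours, available)
    return {
        "granted": granted,
        "total_used": used_hours + granted,       # extensions accumulate
        "truncated": granted < requested_hours,   # UI shows a warning if True
        "error": None,
    }
```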
Task - Fix activity tracking to use spawner not user: Changed session timer to use spawner.orm_spawner.last_activity instead of user.last_activity
Result: Discovered user.last_activity updates on any Hub page access (causing timer reset on refresh), while spawner.orm_spawner.last_activity only updates on actual JupyterLab activity - which matches what jupyterhub-idle-culler uses for culling decisions. Updated SessionInfoHandler and ExtendSessionHandler to use spawner.orm_spawner.last_activity (Server object has no last_activity attribute), added /checkpoint command for creating milestone tags before major changes
Task - Activity Monitor page implementation: Created admin-only activity monitoring page with database persistence, 3-state status indicator, proper activity scoring, and reset functionality
Result: Implemented comprehensive user activity monitoring system for JupyterHub administrators with multiple iterations to refine the design. Database Layer: ActivitySample SQLAlchemy model stored in JupyterHub's existing SQLite database with columns (id, username, timestamp, last_activity, active) and composite index on username+timestamp for efficient queries. ActivityMonitor Singleton: Central service class with methods - record_sample() stores activity snapshots marking user active if last_activity within INACTIVE_AFTER threshold, get_score() calculates weighted activity using exponential decay formula weight = exp(-lambda * age_hours) where lambda = ln(2) / half_life with score computed as ratio of weighted_active to weighted_total from measured samples only (unmeasured periods don't count against user), get_status() returns aggregate statistics, rename_user()/delete_user() handle lifecycle events, prune_old_samples() removes expired data, reset_all() clears all samples for admin reset functionality. Environment Variables: JUPYTERHUB_ACTIVITYMON_SAMPLE_INTERVAL (default 600s/10min - sampling frequency), JUPYTERHUB_ACTIVITYMON_RETENTION_DAYS (default 7 days), JUPYTERHUB_ACTIVITYMON_HALF_LIFE (default 24h - decay rate), JUPYTERHUB_ACTIVITYMON_INACTIVE_AFTER (default 60min - threshold for marking user inactive). Added to Dockerfile ENV defaults and settings_dictionary.yml for Settings page display. Backend Handlers: ActivityPageHandler renders activity.html template, ActivityDataHandler returns JSON with per-user data (username, server_active, recently_active, cpu_percent, memory_mb, memory_percent, time_remaining_seconds, activity_score, sample_count, last_activity) plus timestamp, sampling_status, and inactive_after_seconds, ActivityResetHandler (POST /api/activity/reset) clears all activity data with admin-only access. 
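The exponential-decay score described for get_score() can be sketched as follows. This is an illustration of the formula only, not the repository's code: the `(age_hours, active)` sample shape and the scaling of the ratio to 0-100 are assumptions.

```python
import math


def activity_score(samples, half_life_hours=24.0):
    """Weighted share of active samples, scaled to 0-100.

    samples: iterable of (age_hours, active) pairs, one per measured sample.
    Unmeasured periods contribute nothing, so they cannot lower the score.
    """
    decay = math.log(2) / half_life_hours   # lambda = ln(2) / half_life
    weighted_active = 0.0
    weighted_total = 0.0
    for age_hours, active in samples:
        weight = math.exp(-decay * age_hours)  # weight = exp(-lambda * age)
        weighted_total += weight
        if active:
            weighted_active += weight
    if weighted_total == 0:
        return 0.0  # no samples measured yet
    return 100.0 * weighted_active / weighted_total
```

With the default 24h half-life, a sample from a day ago counts half as much as one taken now.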
3-State Status Indicator: Green (#28a745 explicit hex - not Bootstrap text-success which appeared teal) for server running AND recently_active (activity within INACTIVE_AFTER minutes), Yellow (text-warning) for server running BUT inactive (exceeded INACTIVE_AFTER threshold), Red (text-danger) for server offline. Tooltips updated to "Online (active)", "Online (inactive)", "Offline". Activity Bar: Score 0-100 mapped to 0-5 lit segments, colors based on level (green ≥4 segments, yellow ≥2, red <2), unlit segments rendered as transparent with gray border outline. Frontend Features: JavaScript handles 30-second auto-refresh via setInterval, manual Refresh button, Reset button (red with trash icon) with confirmation dialog that calls /api/activity/reset and refreshes data, loading spinner, empty state message, XSRF token handling. Column header shows "Activity (7 days)" matching retention period. Score Calculation Fix: Original implementation checked if last_activity changed between samples - flawed because JupyterLab reports activity infrequently so most samples showed "no change" = inactive. Fixed to check if last_activity is within INACTIVE_AFTER minutes of sample time. Score calculation changed from normalizing against theoretical maximum to using ratio of weighted active samples to weighted total measured samples - ensures unmeasured periods don't penalize users. User Lifecycle Integration: SQLAlchemy event listener in jupyterhub_config.py calls rename_activity_user() on User.name changes and delete_activity_user() on user deletion. Performance Optimization: Original implementation caused "Event loop was unresponsive for at least 5.x seconds" warnings because Docker stats API calls (container.stats(stream=False)) block for ~2 seconds per container. Fixed by adding ThreadPoolExecutor (_docker_executor with 4 workers, thread_name_prefix="docker-stats") and async wrapper get_container_stats_async() using loop.run_in_executor(). 
ActivityDataHandler collects active users first, fetches all Docker stats in parallel via asyncio.gather(*stats_tasks, return_exceptions=True), reducing response time from O(n × 2s) to O(2s) regardless of user count while keeping Tornado event loop responsive. UI Cleanup: Removed status legend (redundant with tooltips), removed old background sampler thread code (_run_activity_sampler, _sample_all_users, start_activity_sampler, stop_activity_sampler functions) since sampling now happens on-demand when activity page is viewed. UI Polish: Refresh button shows spinner while loading data (disables button, replaces icon with spinner-border, re-enables on complete), timestamp display changed from absolute time "Measured: 10:30:00 AM" to relative format "Measured 5min ago" using formatTimeAgo() function (supports just now/Xmin ago/Xh ago/Xd ago)
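The executor pattern from the Performance Optimization note, as a self-contained sketch. `blocking_stats` is a stand-in for the blocking `container.stats(stream=False)` call and `collect_stats` is a hypothetical wrapper; only the executor settings and the name `get_container_stats_async` come from the entry above.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

_docker_executor = ThreadPoolExecutor(max_workers=4,
                                      thread_name_prefix="docker-stats")


def blocking_stats(name):
    # Stand-in for the blocking Docker stats call (shortened to 0.2s here)
    time.sleep(0.2)
    return {"name": name, "cpu_percent": 0.0}


async def get_container_stats_async(name):
    # Run the blocking call on a worker thread so the event loop stays free
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_docker_executor, blocking_stats, name)


async def collect_stats(names):
    stats_tasks = [get_container_stats_async(n) for n in names]
    # return_exceptions=True: one failing container does not abort the rest
    results = await asyncio.gather(*stats_tasks, return_exceptions=True)
    return {n: r for n, r in zip(names, results)
            if not isinstance(r, Exception)}
```

With 4 workers, up to four calls overlap; beyond that the calls run in batches, so wall time grows roughly with the number of batches rather than the number of containers.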
Task - Activity Monitor UI polish and sample cleanup fix: Improved activity bar design and fixed sample retention enforcement
Result: Fixed inconsistent timestamp text (initial "Last updated: --" vs loaded "Measured X ago" - removed initial text so only "Measured X ago" shows). Redesigned activity bar from separate 16x16 boxes to thin continuous bar (80px wide, 8px tall) with subtle 1px dividers between 5 segments - visually distinct from day-based layouts. Fixed critical sample cleanup bug - cleanup only ran when new sample was INSERT-ed, not when existing sample was UPDATE-d, causing old samples to persist indefinitely when refreshes kept updating the last sample. Moved cleanup to run on every record_sample() call regardless of update vs insert, ensuring samples older than RETENTION_DAYS are always pruned
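A sketch of the cleanup fix using the standard-library sqlite3 module. The table layout, the constants, and the rule for choosing UPDATE over INSERT (last sample newer than one sampling interval) are assumptions for illustration; the point is that pruning runs on both paths.

```python
import sqlite3

RETENTION_SECONDS = 7 * 24 * 3600   # assumed: 7-day retention
SAMPLE_INTERVAL = 600               # assumed: 10-minute sampling interval


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, "
                 "username TEXT, timestamp REAL)")
    return conn


def record_sample(conn, username, now):
    row = conn.execute(
        "SELECT id, timestamp FROM samples WHERE username = ? "
        "ORDER BY timestamp DESC LIMIT 1", (username,)).fetchone()
    if row and now - row[1] < SAMPLE_INTERVAL:
        # Recent sample exists: refresh it instead of adding a new row
        conn.execute("UPDATE samples SET timestamp = ? WHERE id = ?",
                     (now, row[0]))
    else:
        conn.execute("INSERT INTO samples (username, timestamp) VALUES (?, ?)",
                     (username, now))
    # Prune on every call, regardless of which branch ran above
    conn.execute("DELETE FROM samples WHERE timestamp < ?",
                 (now - RETENTION_SECONDS,))
    conn.commit()
```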
Task - Fix SQLite database locking: Resolved database lock errors preventing login
Result: ActivityMonitor was using JupyterHub's main SQLite database (/data/jupyterhub.sqlite) via separate SQLAlchemy connection, causing sqlite3.OperationalError: database is locked when both tried to write simultaneously (login + activity sampling). Fixed by moving ActivityMonitor to separate database file /data/activity_samples.sqlite, completely avoiding lock contention. Added "Last Active" column to activity table showing relative time since user's last activity (e.g., "5min", "2h 14min")
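The lock conflict and the fix can be reproduced with the standard-library sqlite3 module. This is only a demonstration with temporary files and raw connections (the journal's setup uses SQLAlchemy); `try_write` and `demo` are hypothetical names.

```python
import os
import sqlite3
import tempfile


def try_write(path):
    """Attempt one write without waiting for locks; True on success."""
    conn = sqlite3.connect(path, timeout=0)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS samples (username TEXT)")
        conn.execute("INSERT INTO samples VALUES ('alice')")
        conn.commit()
        return True
    except sqlite3.OperationalError:   # e.g. "database is locked"
        return False
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        hub_db = os.path.join(d, "jupyterhub.sqlite")
        activity_db = os.path.join(d, "activity_samples.sqlite")

        # Simulate the hub being in the middle of a write transaction
        hub = sqlite3.connect(hub_db, isolation_level=None)
        hub.execute("BEGIN IMMEDIATE")
        try:
            shared = try_write(hub_db)         # same file: blocked
            separate = try_write(activity_db)  # separate file: unaffected
        finally:
            hub.execute("ROLLBACK")
            hub.close()
        return shared, separate
```

SQLite locks at the file level, so a writer on one database file never contends with a writer on another.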
