mirror of https://github.com/stellarshenson/stellars-jupyterhub-ds.git synced 2026-03-08 06:00:29 +00:00

stellarshenson ad1d37b16f feat: add 24h minimum data requirement for activity score

- Activity tooltip shows "Not enough data (Nh of 24h collected)" until
  sufficient samples collected (144 samples at 10-min intervals)
- Progress bar still renders to show emerging trend, tooltip clarifies
  percentage not yet reliable
- Added activitymon_sample_interval to template_vars for frontend access
- Rewrote docs/activity-tracking-methodology.md as comprehensive
  implementation specification covering data collection, scoring formulas,
  UI components, API endpoints, and design rationale

2026-01-25 12:25:38 +01:00

JupyterHub Activity Monitoring Methodology

This document describes the activity monitoring system implemented in stellars-jupyterhub-ds.

Overview

The Activity Monitor provides administrators with real-time visibility into user engagement and resource consumption across all JupyterLab containers. It combines instantaneous metrics (CPU, memory, status) with historical activity scoring using exponential decay.

Access: Admin-only page at /activity

Data Collection

Activity Sampling

The system samples user activity at regular intervals, recording whether each user was active during that period.

Parameter	Environment Variable	Default	Description
Sample interval	`JUPYTERHUB_ACTIVITYMON_SAMPLE_INTERVAL`	600s (10 min)	Time between activity samples
Inactive threshold	`JUPYTERHUB_ACTIVITYMON_INACTIVE_AFTER`	60 min	Minutes since last activity to mark inactive
Retention period	`JUPYTERHUB_ACTIVITYMON_RETENTION_DAYS`	7 days	How long samples are stored

Each sample records:

Username
Timestamp
Active flag (true if last_activity within inactive threshold)

Samples are stored in a SQLite database (/data/activity_samples.db) and persist across hub restarts.
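As an illustration of the active flag described above, here is a minimal sketch. The function names `is_active` and `make_sample` are hypothetical, and treating the threshold boundary as inclusive is an assumption; the actual logic lives in custom_handlers.py and may differ.

```python
from datetime import datetime, timedelta, timezone

INACTIVE_AFTER_MINUTES = 60  # default of JUPYTERHUB_ACTIVITYMON_INACTIVE_AFTER


def is_active(last_activity, now, inactive_after_minutes=INACTIVE_AFTER_MINUTES):
    """Return True if last_activity falls within the inactive threshold."""
    if last_activity is None:  # assumption: no recorded activity means inactive
        return False
    # Assumption: the boundary itself still counts as active.
    return now - last_activity <= timedelta(minutes=inactive_after_minutes)


def make_sample(username, last_activity, now):
    """Build one sample record: username, timestamp, active flag."""
    return {
        "username": username,
        "timestamp": now.isoformat(),
        "is_active": is_active(last_activity, now),
    }


now = datetime(2026, 1, 25, 11, 0, tzinfo=timezone.utc)
sample = make_sample("konrad", now - timedelta(minutes=30), now)
```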

Resource Metrics

Real-time metrics are collected via the Docker API and the JupyterHub spawner state:

Metric	Update Interval	Source
CPU %	10s	Docker stats API
Memory MB	10s	Docker stats API
Volume sizes	1h	Docker volume inspect
Server status	10s	JupyterHub spawner state

The volume tooltip breaks the total down into per-volume sizes (home, workspace, cache).

Activity Score Calculation

Exponential Decay Formula

The activity score is a weighted ratio of active samples to all samples, where recent samples count more than older ones:

score = (Σ weighted_active) / (Σ weighted_total) × 100

weight = exp(-λ × age_hours)
λ = ln(2) / half_life_hours

With a 72-hour half-life:

Samples from 3 days ago have 50% weight
Samples from 6 days ago have 25% weight
Samples from 9 days ago would have 12.5% weight (with the default 7-day retention they are purged before reaching that age)
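The formula above can be sketched in a few lines of Python. This is an illustrative reading of the formula, not the project's code: the names `decay_weight` and `raw_score` are hypothetical, and samples are assumed to arrive as (age_hours, is_active) pairs.

```python
import math


def decay_weight(age_hours, half_life_hours=72.0):
    """weight = exp(-lambda * age_hours), with lambda = ln(2) / half_life_hours."""
    lam = math.log(2) / half_life_hours
    return math.exp(-lam * age_hours)


def raw_score(samples, half_life_hours=72.0):
    """Weighted percentage of active samples.

    samples: iterable of (age_hours, is_active) pairs (assumed input shape).
    """
    weighted_active = 0.0
    weighted_total = 0.0
    for age_hours, active in samples:
        w = decay_weight(age_hours, half_life_hours)
        weighted_total += w
        if active:
            weighted_active += w
    if weighted_total == 0:  # no samples yet
        return 0.0
    return weighted_active / weighted_total * 100
```

For example, one active sample taken just now and one inactive sample from 72 hours ago have weights 1.0 and 0.5, so the score is 1.0 / 1.5, roughly 66.7%.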

Normalization

The raw score is the decay-weighted percentage of sampled time during which the user was active. Since users don't work 24/7, it is normalized against the expected work hours per day:

normalized_score = raw_score / (target_hours / 24)

Parameter	Environment Variable	Default	Description
Target hours	`JUPYTERHUB_ACTIVITYMON_TARGET_HOURS`	8h	Expected work hours per day for 100% score
Half-life	`JUPYTERHUB_ACTIVITYMON_HALF_LIFE`	72h	Decay half-life in hours

Example: A user active 8h/day has a raw score of ~33.3% (8/24). With an 8h target: 33.3% / 0.333 = 100%.
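The normalization step can be checked with a short sketch. The function name `normalized_score` is hypothetical; the arithmetic follows the formula above.

```python
def normalized_score(raw_score, target_hours=8.0):
    """Scale a raw score so that target_hours of daily activity maps to 100%."""
    return raw_score / (target_hours / 24)


# A user active 8h/day has a raw score of 8/24, about 33.3%.
full_day = normalized_score(100 * 8 / 24)
# A user active 12h/day exceeds the 8h target, so the result goes above 100%.
overtime = normalized_score(100 * 12 / 24)
```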

Minimum Data Requirement

The activity percentage is shown in the tooltip only after 24 hours of data have been collected. Before that, the tooltip displays "Not enough data (Nh of 24h collected)" to prevent misleading scores from incomplete data.

Minimum samples required: 86400 / sample_interval (144 samples at 10-minute intervals)
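A minimal sketch of this gate follows. The function names are hypothetical, rounding up for intervals that do not divide a day evenly is an assumption, and the format of the tooltip once enough data exists is made up for illustration (only the "Not enough data" wording comes from this document).

```python
import math

SECONDS_PER_DAY = 86400


def min_samples_required(sample_interval_s=600):
    """Samples needed to cover 24h (rounding up is an assumption)."""
    return math.ceil(SECONDS_PER_DAY / sample_interval_s)


def activity_tooltip(sample_count, score_percent, sample_interval_s=600):
    """Return the tooltip text for the activity bar."""
    if sample_count < min_samples_required(sample_interval_s):
        hours = sample_count * sample_interval_s // 3600
        return f"Not enough data ({hours}h of 24h collected)"
    # Assumed display format once the 24h minimum is met.
    return f"{score_percent:.0f}%"
```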

Wall-Clock vs Working Time

The decay applies continuously to calendar time, but users only work a fraction of each day. This creates a difference between configured half-life and effective half-life.

Why It Matters

Event	Calendar Age	Weight	Impact
Day 0: Work 9am-5pm	0h	1.000	+8h active
Day 0: Sleep overnight	16h	0.857	Decay, no new work
Day 1: Work 9am-5pm	24h	0.794	+8h active
Day 1: Sleep overnight	40h	0.680	Decay, no new work
Day 2: Work 9am-5pm	48h	0.630	+8h active
Day 3: Start of day	72h	0.500	Half-life point

Each 16-hour overnight break decays existing weights by ~14% (a factor of 0.857), with no new activity to offset it.
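The weights in the table can be reproduced directly from the decay formula. The sketch below uses 0.5 ** (age / half_life), which is algebraically the same as exp(-λ × age) with λ = ln(2) / half_life; the variable names are illustrative.

```python
HALF_LIFE_HOURS = 72

# Calendar ages (in hours) taken from the table above.
ages = (0, 16, 24, 40, 48, 72)
weights = {age: round(0.5 ** (age / HALF_LIFE_HOURS), 3) for age in ages}

# Fraction of weight lost over one 16-hour overnight break.
overnight_decay = 1 - 0.5 ** (16 / HALF_LIFE_HOURS)

for age, weight in weights.items():
    print(f"{age:>3}h  {weight:.3f}")
```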

Effective Half-life Table

Effective half-life in working hours for different work patterns:

Work Pattern	Configured 24h	Configured 48h	Configured 72h
12h/day	12.0h	21.3h	26.8h
10h/day	10.0h	17.8h	22.3h
8h/day	8.0h	14.3h	18.0h
6h/day	6.0h	10.8h	13.5h
4h/day	4.0h	7.2h	9.0h
2h/day	2.0h	3.7h	4.5h

With a 72-hour configured half-life, an 8h/day worker has an effective half-life of 18 working hours (~2.25 work days). This prevents overnight breaks from aggressively penalizing scores.

User Interface

Status Indicator

Three-state indicator based on server and activity status:

Color	Status	Condition
Green	Active	Server running AND activity within inactive threshold
Yellow	Inactive	Server running BUT no recent activity
Red	Offline	Server not running
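The three-state rule in the table can be expressed as a small function. This is a sketch with a hypothetical name and return shape, not the template's actual code.

```python
def status_indicator(server_running, recently_active):
    """Map server and activity state to (color, status) per the table above."""
    if not server_running:
        return ("red", "Offline")
    if recently_active:
        return ("green", "Active")
    return ("yellow", "Inactive")
```

Note that activity is only considered when the server is running: a stopped server is always shown as Offline.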

Activity Bar

5-segment bar visualization with color coding:

Segments Lit	Color	Meaning
4-5	Green	High activity (80-100%)
2-3	Yellow	Medium activity (40-79%)
1	Red	Low activity (1-39%)
0	Empty	No activity (0%)

The bar is capped at 100% (5 segments). The tooltip shows the actual percentage, including values over 100% for users exceeding their target hours.
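One mapping from score to segments that is consistent with the table is sketched below. The rule "one segment per full 20%, with at least one segment for any non-zero score" is an assumption inferred from the ranges; the real UI logic may compute segments differently.

```python
def activity_bar(score_percent):
    """Return (segments_lit, color) for a normalized score in percent."""
    capped = max(0.0, min(float(score_percent), 100.0))  # bar is capped at 100%
    if capped == 0:
        return (0, "empty")
    # Assumption: one segment per full 20%, minimum one for non-zero scores.
    segments = max(1, int(capped // 20))
    if segments >= 4:
        color = "green"
    elif segments >= 2:
        color = "yellow"
    else:
        color = "red"
    return (segments, color)
```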

Table Columns

Column	Description	Sortable
User	Username	Yes
Status	3-state indicator	Yes
CPU	Current CPU usage %	Yes
Memory	Current memory in MB/GB	Yes
Volumes	Total volume size (tooltip: per-volume breakdown)	Yes
Time Left	Remaining session time (if idle culler enabled)	Yes
Last Active	Relative time since last activity	Yes
Activity	Normalized activity score bar	Yes

Default sort: Status descending (active users first), then username ascending.
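The default ordering can be sketched as a two-part sort key. The row shape, the `STATUS_RANK` mapping, and the relative order of inactive versus offline users are assumptions for illustration; only "active users first, then username ascending" comes from this document.

```python
# Assumed ranking: active first, then inactive, then offline.
STATUS_RANK = {"active": 2, "inactive": 1, "offline": 0}


def default_sort(rows):
    """Sort by status descending, then username ascending."""
    return sorted(rows, key=lambda r: (-STATUS_RANK[r["status"]], r["username"]))


rows = [
    {"username": "zed", "status": "active"},
    {"username": "amy", "status": "offline"},
    {"username": "bob", "status": "active"},
    {"username": "cat", "status": "inactive"},
]
ordered = [r["username"] for r in default_sort(rows)]
```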

API Endpoints

GET /api/activity

Returns a JSON object containing a "users" array with activity data for all users, plus a response timestamp:

{
  "users": [
    {
      "username": "konrad",
      "server_active": true,
      "recently_active": true,
      "cpu_percent": 12.5,
      "memory_mb": 1234,
      "memory_percent": 15.2,
      "volume_size_mb": 2048,
      "volume_breakdown": {"home": 512, "workspace": 1400, "cache": 136},
      "time_remaining_seconds": 85500,
      "activity_score": 33.2,
      "sample_count": 288,
      "last_activity": "2026-01-25T10:30:00Z"
    }
  ],
  "timestamp": "2026-01-25T11:00:00Z"
}
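A minimal sketch of consuming this response with the standard library, assuming the payload shape shown above. The payload here is trimmed to a few fields, and the second user ("alice") is made up for illustration.

```python
import json

# Trimmed example payload; "alice" is a hypothetical second user.
payload = """
{
  "users": [
    {"username": "konrad", "server_active": true, "recently_active": true,
     "activity_score": 33.2, "sample_count": 288},
    {"username": "alice", "server_active": true, "recently_active": false,
     "activity_score": 5.0, "sample_count": 40}
  ],
  "timestamp": "2026-01-25T11:00:00Z"
}
"""

data = json.loads(payload)

# Users whose server is running and who were recently active.
active_users = [
    u["username"]
    for u in data["users"]
    if u["server_active"] and u["recently_active"]
]

# Users below the 144-sample minimum at the default 10-minute interval.
insufficient_data = [u["username"] for u in data["users"] if u["sample_count"] < 144]
```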

POST /api/activity/reset

Clears all activity samples. Admin only. Returns confirmation message.

Configuration Reference

All environment variables with defaults:

# Sampling
JUPYTERHUB_ACTIVITYMON_SAMPLE_INTERVAL=600      # seconds between samples
JUPYTERHUB_ACTIVITYMON_RETENTION_DAYS=7         # days to keep samples
JUPYTERHUB_ACTIVITYMON_INACTIVE_AFTER=60        # minutes until inactive

# Scoring
JUPYTERHUB_ACTIVITYMON_HALF_LIFE=72             # decay half-life in hours
JUPYTERHUB_ACTIVITYMON_TARGET_HOURS=8           # target hours for 100%

# UI refresh
JUPYTERHUB_ACTIVITYMON_RESOURCES_UPDATE_INTERVAL=10    # seconds
JUPYTERHUB_ACTIVITYMON_VOLUMES_UPDATE_INTERVAL=3600    # seconds
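As a sketch of how these variables might be read with their defaults: the `ActivityMonConfig` class, its field names, and `load_config` are hypothetical, and the choice of int versus float per setting is an assumption. The variable names and default values are the ones listed above.

```python
import os
from dataclasses import dataclass

PREFIX = "JUPYTERHUB_ACTIVITYMON_"


@dataclass(frozen=True)
class ActivityMonConfig:
    sample_interval_s: int
    retention_days: int
    inactive_after_min: int
    half_life_hours: float
    target_hours: float
    resources_update_interval_s: int
    volumes_update_interval_s: int


def load_config(env=os.environ):
    """Read settings from a mapping of environment variables, applying defaults."""
    def get(name, default, cast):
        return cast(env.get(PREFIX + name, default))

    return ActivityMonConfig(
        sample_interval_s=get("SAMPLE_INTERVAL", 600, int),
        retention_days=get("RETENTION_DAYS", 7, int),
        inactive_after_min=get("INACTIVE_AFTER", 60, int),
        half_life_hours=get("HALF_LIFE", 72, float),
        target_hours=get("TARGET_HOURS", 8, float),
        resources_update_interval_s=get("RESOURCES_UPDATE_INTERVAL", 10, int),
        volumes_update_interval_s=get("VOLUMES_UPDATE_INTERVAL", 3600, int),
    )
```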

Implementation Files

File	Purpose
`services/jupyterhub/conf/bin/custom_handlers.py`	API handlers, sampling logic, score calculation
`services/jupyterhub/html_templates_enhanced/activity.html`	Admin UI template
`config/jupyterhub_config.py`	Handler registration, template vars
`services/jupyterhub/conf/settings_dictionary.yml`	Settings page definitions

Database Schema

SQLite database at /data/activity_samples.db:

CREATE TABLE activity_samples (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    is_active INTEGER NOT NULL,
    UNIQUE(username, timestamp)
);

CREATE INDEX idx_activity_username ON activity_samples(username);
CREATE INDEX idx_activity_timestamp ON activity_samples(timestamp);
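The schema can be exercised with Python's built-in sqlite3 module. This sketch uses an in-memory database rather than /data/activity_samples.db. The helper names, the use of INSERT OR IGNORE to absorb duplicates under the UNIQUE constraint, and the retention purge query are assumptions. Timestamps are stored as UTC ISO-8601 strings, which compare correctly as text as long as they share one format.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE activity_samples (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    is_active INTEGER NOT NULL,
    UNIQUE(username, timestamp)
);
CREATE INDEX idx_activity_username ON activity_samples(username);
CREATE INDEX idx_activity_timestamp ON activity_samples(timestamp);
"""


def record_sample(conn, username, ts, is_active):
    """Insert one sample; duplicates for the same user and timestamp are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO activity_samples (username, timestamp, is_active) "
        "VALUES (?, ?, ?)",
        (username, ts.isoformat(), int(is_active)),
    )


def purge_old(conn, now, retention_days=7):
    """Delete samples older than the retention period."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    conn.execute("DELETE FROM activity_samples WHERE timestamp < ?", (cutoff,))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

now = datetime(2026, 1, 25, 11, 0, tzinfo=timezone.utc)
record_sample(conn, "konrad", now, True)
record_sample(conn, "konrad", now, True)  # duplicate, ignored
record_sample(conn, "konrad", now - timedelta(days=8), False)  # past retention
purge_old(conn, now)
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM activity_samples").fetchone()[0]
```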

Design Rationale

Why Exponential Decay?

Memory-efficient (weights are computed from sample age at scoring time, so no weighted aggregates need to be stored)
Naturally handles irregular work schedules
Smooths noise from sporadic activity
Configurable half-life adapts to different use cases

Why 72-hour Half-life?

Provides ~2.25 work days effective half-life for 8h/day workers
Doesn't aggressively penalize weekends or breaks
Balances recency with stability
Shorter half-life (24h) would over-penalize normal day boundaries

Why Normalization?

Raw 33% score for 8h/day worker is confusing
Normalized 100% matches user expectation of "full work day"
Allows scores > 100% for overtime visibility
Configurable target hours adapts to different work patterns

Why 24h Minimum Data?

Prevents misleading scores from partial data
Single work session would show artificially high/low score
Full day of data gives representative sample
Clear feedback to users about data collection progress
