mirror of https://github.com/stellarshenson/stellars-jupyterhub-ds.git synced 2026-03-08 06:00:29 +00:00


stellarshenson b7b3f0e87c docs: add half-life simulation tables for different work patterns

Added detailed simulation results showing how calendar half-life
translates to effective working-time decay:

- 10h/day (intensive): 72h -> 28.5 work hours at 50%
- 8h/day (typical): 72h -> 22.8 work hours at 50%
- 4h/day (part-time): 72h -> 11.5 work hours at 50%

Key finding: 72h calendar half-life consistently yields ~2.9 work
days at the 50% point, regardless of daily work hours. Activity
scores correctly reflect work fraction (8h/24h = 33.3%).

2026-01-25 11:54:54 +01:00


Activity Tracking Methodology Research

Current Implementation

Our current approach uses exponential decay scoring:

Samples collected every 10 minutes (configurable)
Each sample marked active if last_activity falls within the active threshold (60 min by default), inactive otherwise
Score calculated as weighted ratio: weighted_active / weighted_total
Weight formula: weight = exp(-λ × age_hours) where λ = ln(2) / half_life
Default half-life: 72 hours / 3 days (activity from 3 days ago worth 50%)
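The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the project's code: `Sample`, `is_active` and `activity_score` are hypothetical names, and samples are assumed to carry their age in hours plus an active flag derived from last_activity.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

ACTIVE_THRESHOLD = timedelta(minutes=60)  # default active threshold
HALF_LIFE_HOURS = 72.0                    # default half-life (3 days)


@dataclass
class Sample:
    age_hours: float  # hours elapsed since the sample was taken
    active: bool      # last_activity was within the threshold at sample time


def is_active(sample_time, last_activity, threshold=ACTIVE_THRESHOLD):
    """A sample counts as active if last_activity is recent enough."""
    return last_activity is not None and sample_time - last_activity <= threshold


def activity_score(samples, half_life_hours=HALF_LIFE_HOURS):
    """weighted_active / weighted_total with weight = exp(-λ × age_hours)."""
    lam = math.log(2) / half_life_hours
    weighted_total = 0.0
    weighted_active = 0.0
    for s in samples:
        weight = math.exp(-lam * s.age_hours)
        weighted_total += weight
        if s.active:
            weighted_active += weight
    return weighted_active / weighted_total if weighted_total else 0.0


# One active sample now, one inactive sample from 72 hours ago
now = datetime(2026, 1, 25, 12, 0)
samples = [
    Sample(0.0, is_active(now, now - timedelta(minutes=30))),
    Sample(72.0, is_active(now - timedelta(hours=72), now - timedelta(hours=75))),
]
score = activity_score(samples)  # 1 / (1 + 0.5) ≈ 0.667, i.e. about 67%
```

The sample from 72 hours ago carries exactly half the weight of the current one, which is what the half-life parameter means.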

Why 72-hour Half-life?

The decay applies to wall-clock time, not working time. Users work only a fraction of each 24-hour period, creating a mismatch between calendar decay and effective working activity decay.

Simulation Results

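The following tables show how different calendar half-lives translate into effective working-time decay for several work patterns. "Work Hours at 50%" is the number of most recent working hours that together account for 50% of the weighted activity score; "Work Days at 50%" is that figure divided by the daily work hours; "Activity Score" is the resulting score for a user who keeps the same pattern every day.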

10h/day work pattern (intensive)

Calendar Half-life	Work Hours at 50%	Work Days at 50%	Activity Score
24h (1d)	10.0h	1.0 days	41.0%
48h (2d)	19.8h	2.0 days	41.5%
72h (3d)	28.5h	2.9 days	41.6%
96h (4d)	35.3h	3.5 days	41.6%
168h (7d)	47.7h	4.8 days	41.7%

8h/day work pattern (typical)

Calendar Half-life	Work Hours at 50%	Work Days at 50%	Activity Score
24h (1d)	8.0h	1.0 days	32.7%
48h (2d)	16.0h	2.0 days	33.2%
72h (3d)	22.8h	2.9 days	33.3%
96h (4d)	28.3h	3.5 days	33.3%
168h (7d)	38.2h	4.8 days	33.3%

4h/day work pattern (part-time)

Calendar Half-life	Work Hours at 50%	Work Days at 50%	Activity Score
24h (1d)	4.0h	1.0 days	16.3%
48h (2d)	8.0h	2.0 days	16.6%
72h (3d)	11.5h	2.9 days	16.6%
96h (4d)	14.2h	3.5 days	16.6%
168h (7d)	19.2h	4.8 days	16.7%

Summary: 72-hour Half-life Across Work Patterns

Work Pattern	Work Hours at 50%	Effective Work Days	Activity Score
10h/day	28.5h	2.9 days	41.6%
8h/day	22.8h	2.9 days	33.3%
4h/day	11.5h	2.9 days	16.6%
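Tables like these can be reproduced by simulating a steady work pattern under the scoring rule and walking back from the newest active sample until half of the total active weight is covered. This is a sketch under assumptions the document does not state: a 14-day retention window, a work block starting at 09:00, evaluation at midnight, and this reading of "Work Hours at 50%". `simulate` is a hypothetical helper. With these choices the 72h rows come out close to the tables, though exact figures shift with the assumed retention window and evaluation time.

```python
import math

def simulate(work_hours_per_day, half_life_hours, retention_days=14,
             interval_min=10, work_start_hour=9):
    """Score a steady daily work block, evaluated at midnight.

    Returns (activity_score, work_hours_at_50pct). retention_days,
    work_start_hour and the midnight evaluation time are assumptions.
    """
    lam = math.log(2) / half_life_hours
    n_samples = retention_days * 24 * 60 // interval_min
    active_weights = []  # weights of active samples, newest first
    total_weight = 0.0
    for i in range(n_samples):
        age_h = i * interval_min / 60.0
        clock_h = (24.0 - age_h) % 24.0  # hour of day when this sample was taken
        weight = math.exp(-lam * age_h)
        total_weight += weight
        if work_start_hour <= clock_h < work_start_hour + work_hours_per_day:
            active_weights.append(weight)
    if not active_weights:
        return 0.0, 0.0
    active_total = sum(active_weights)
    # Walk back from the newest active sample until half the active weight is covered
    running = 0.0
    for count, weight in enumerate(active_weights, start=1):
        running += weight
        if running >= active_total / 2:
            break
    return active_total / total_weight, count * interval_min / 60.0


for hours_per_day in (10, 8, 4):
    score, work_hours = simulate(hours_per_day, half_life_hours=72)
    print(f"{hours_per_day}h/day: score {score:.1%}, {work_hours:.1f} work hours "
          f"({work_hours / hours_per_day:.1f} work days) at 50%")
```

Because every pattern repeats on the same 24-hour cycle, the decay per calendar day is identical for all three, which is why the work-days figure stays near 2.9 regardless of daily hours.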

Key Insights

With a 72-hour calendar half-life:

Consistent ~3 work days at the 50% point regardless of daily work hours
Activity score reflects actual work fraction (8h/24h ≈ 33%, 4h/24h ≈ 17%)
Overnight breaks don't aggressively penalize scores
A 24-hour half-life would be too aggressive: yesterday's work is already down to about 50% weight before today starts
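A quick worked check of the last point, assuming yesterday's session ran 09:00-17:00 and today's starts at 09:00, so yesterday's work is 16 to 24 hours old when today begins. With λ = ln(2) / half_life, the weight formula reduces to a power of two:

```latex
\begin{aligned}
w(a) &= e^{-\lambda a} = 2^{-a/H}, \qquad \lambda = \frac{\ln 2}{H} \\
H = 24\,\text{h}: \quad w(16\,\text{h}) &= 2^{-2/3} \approx 0.63, \qquad w(24\,\text{h}) = 2^{-1} = 0.50 \\
H = 72\,\text{h}: \quad w(16\,\text{h}) &= 2^{-2/9} \approx 0.86, \qquad w(24\,\text{h}) = 2^{-1/3} \approx 0.79
\end{aligned}
```

Under a 24-hour half-life, yesterday's whole session carries at most 63% weight at the start of today; under 72 hours it keeps roughly 79-86%.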

Industry Approaches

1. Exponential Moving Average (EMA) / Time-Decay Systems

How it works:

Recent events weighted more heavily than older ones
Decay factor (α) determines how quickly old data loses relevance
Example: α=0.5 per day means yesterday's activity worth 50%, two days ago worth 25%

Half-life parameterization:

More intuitive than raw decay factor
"Activity has a 24-hour half-life" is clearer than "α=0.5"
Our implementation already uses this approach
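The half-life/α relationship, and the way a time-decayed EMA absorbs irregular sampling, can be sketched as follows. `alpha_for_half_life` and `DecayedActivity` are hypothetical names, and this streaming EMA is a different formulation from the stored-samples weighted ratio used by the current implementation; it is shown only to illustrate the pros listed below.

```python
import math

def alpha_for_half_life(dt_hours, half_life_hours):
    """Weight given to the newest observation after dt_hours have elapsed."""
    return 1.0 - 0.5 ** (dt_hours / half_life_hours)


class DecayedActivity:
    """Streaming time-decayed average of a 0/1 activity signal.

    Only the running value and the last timestamp are stored, and the
    decay for each update is derived from the actual gap, so irregular
    sampling intervals need no special handling.
    """

    def __init__(self, half_life_hours):
        self.half_life_hours = half_life_hours
        self.value = None   # current smoothed activity, 0.0-1.0
        self.last_t = None  # timestamp (hours) of the previous update

    def update(self, t_hours, active):
        x = 1.0 if active else 0.0
        if self.value is None:
            self.value = x
        else:
            a = alpha_for_half_life(t_hours - self.last_t, self.half_life_hours)
            self.value = a * x + (1.0 - a) * self.value
        self.last_t = t_hours
        return self.value


# A 24-hour half-life is the same as α = 0.5 per day
daily_alpha = alpha_for_half_life(24, 24)
```

With this parameterization, one 48-hour gap produces the same value as two consecutive 24-hour steps, so missed samples do not distort the decay.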

Pros:

Memory-efficient (no need to store all historical data)
Naturally handles irregular sampling intervals
Smooths out noise/outliers

Cons:

Older activity never fully disappears (asymptotic to zero)
May not match user intuition of "weekly activity"

Reference: Exponential Moving Averages at Scale

2. Time-Window Activity Percentage (Hubstaff approach)

How it works:

Fixed time window (e.g., 10 minutes)
Count active seconds / total seconds = activity %
Aggregate over day/week as average of windows

Hubstaff's formula:

Activity rate % = active seconds / 600 × 100 (per 10-minute segment)
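A minimal sketch of the segment calculation and its aggregation, assuming aggregation is a plain average of per-segment rates (equivalent to total active seconds over total tracked seconds when every segment is a full 10 minutes). The function names are hypothetical; this is not Hubstaff's code.

```python
SEGMENT_SECONDS = 600  # one 10-minute segment


def segment_activity(active_seconds):
    """Activity rate (%) for a single segment."""
    return active_seconds / SEGMENT_SECONDS * 100


def period_activity(active_seconds_per_segment):
    """Average of the per-segment rates over a day or week."""
    rates = [segment_activity(s) for s in active_seconds_per_segment]
    return sum(rates) / len(rates) if rates else 0.0


# Three segments: fully active, half active, reading/thinking with no input
rate = period_activity([600, 300, 0])  # 50.0
```

The third segment shows the weakness noted under Cons: time spent reading or thinking registers as 0% activity.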

Key insight from Hubstaff:

"Depending on someone's job and daily tasks, activity rates will vary widely. People with 75% scores and those with 25% scores can often times both be working productively."

Typical benchmarks:

Data entry/development: 60-80% keyboard/mouse activity
Research/meetings: 30-50% activity
100% is unrealistic for any role

Pros:

Simple to understand
Direct mapping to "how active was I today"

Cons:

Doesn't capture quality of work
Penalizes reading, thinking, meetings

Reference: Hubstaff Activity Calculation

3. Productivity Categorization (RescueTime approach)

How it works:

Applications/websites pre-categorized by productivity score (-2 to +2)
Time spent in each category multiplied by the category's score and summed
Daily productivity score = weighted sum / total time (between -2 and +2)

Categories:

Very Productive (+2): IDE, documentation
Productive (+1): Email, spreadsheets
Neutral (0): Uncategorized
Distracting (-1): News sites
Very Distracting (-2): Social media, games
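The weighted-sum formula above can be sketched as follows. The category keys and `productivity_score` are hypothetical, and this implements the formula as written here, not RescueTime's own scoring code.

```python
# Category scores as listed above (-2 to +2)
CATEGORY_SCORES = {
    "very_productive": 2,
    "productive": 1,
    "neutral": 0,
    "distracting": -1,
    "very_distracting": -2,
}


def productivity_score(minutes_by_category):
    """Time-weighted average category score, between -2 and +2."""
    total = sum(minutes_by_category.values())
    if total == 0:
        return 0.0
    weighted = sum(CATEGORY_SCORES[cat] * minutes
                   for cat, minutes in minutes_by_category.items())
    return weighted / total


day = {"very_productive": 240, "productive": 60, "distracting": 60}
score = productivity_score(day)  # (480 + 60 - 60) / 360 ≈ 1.33
```

In a JupyterLab-only setting every minute would fall into one category, so the score would be constant, which is why this approach is marked as not applicable.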

Pros:

Captures quality of activity, not just presence
Customizable per user/role

Cons:

Requires app categorization (complex to implement)
Subjective classification
Not applicable to JupyterLab (all activity is "productive")

Reference: RescueTime Methodology

4. GitHub Contribution Graph (Threshold-based intensity)

How it works:

Count contributions per day (commits, PRs, issues)
Map counts to 4-5 intensity levels
Levels based on percentiles of user's own activity

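Typical thresholds (some implementations use fixed cut-offs like these instead of per-user percentiles):

// Example configuration from existing contribution-heatmap implementations
const heatmapConfig = {
  thresholds: [0, 10, 20, 30],  // contributions per day
  colors: ['#ebedf0', '#9be9a8', '#40c463', '#30a14e', '#216e39'],  // one color per level: none + 4 intensities
};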

Key insight:

Relative to the user's own history (not absolute)
Someone whose busiest day has 5 commits sees a different scale than someone with 50
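The percentile idea can be sketched with quartiles of the user's non-zero days. Splitting into quartiles and the `intensity_levels` name are assumptions for illustration; the exact rules GitHub uses are not stated here. Requires Python 3.8+ for `statistics.quantiles`.

```python
import bisect
import statistics


def intensity_levels(daily_counts):
    """Map each day's contribution count to level 0-4 relative to this user's history."""
    nonzero = [c for c in daily_counts if c > 0]
    if len(nonzero) < 2:
        # Too little history for quartiles: any activity gets the top level
        return [4 if c > 0 else 0 for c in daily_counts]
    cuts = statistics.quantiles(nonzero, n=4)  # three quartile cut points
    # Level 0 = no contributions, levels 1-4 = which quartile the count falls in
    return [0 if c == 0 else 1 + bisect.bisect_left(cuts, c) for c in daily_counts]


# A light and a heavy contributor end up with the same pattern of levels
light = intensity_levels([0, 1, 2, 3, 5])
heavy = intensity_levels([0, 10, 20, 30, 50])
```

Because quartiles scale with the data, multiplying every count by ten leaves the levels unchanged, which is the "relative, not absolute" property described above.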

Pros:

Visual, intuitive
Adapts to user's activity patterns

Cons:

Daily granularity only (no intra-day detail)
Doesn't show decay/trend

5. Daily Target Approach (8h = 100%)

How it works:

Define expected activity hours per day (e.g., 8h)
Actual active hours / expected hours = daily score
Cap at 100% or allow overtime bonus

Formula:

Daily score = min(1.0, active_hours / 8.0) × 100
Weekly score = avg(daily_scores)
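A minimal sketch of the formula, with hypothetical function names and the capped (no overtime bonus) variant:

```python
TARGET_HOURS = 8.0  # expected active hours per day


def daily_score(active_hours, target_hours=TARGET_HOURS):
    """Percent of the daily target reached, capped at 100 (no overtime bonus)."""
    return min(1.0, active_hours / target_hours) * 100


def weekly_score(active_hours_per_day, target_hours=TARGET_HOURS):
    """Plain average of the daily scores."""
    scores = [daily_score(h, target_hours) for h in active_hours_per_day]
    return sum(scores) / len(scores) if scores else 0.0


# A full Mon-Fri week: the two weekend days pull the average down
week = [8, 8, 8, 8, 8, 0, 0]
score = weekly_score(week)  # ≈ 71.4
```

The example shows the weekend problem listed under Cons: a user who meets the target every working day still scores about 71%.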

Pros:

Maps directly to work expectations
Easy to explain to users

Cons:

Assumes consistent work schedule
Doesn't account for part-time, weekends
JupyterHub users may have variable schedules

Recommendations for JupyterHub Activity Monitor

Option A: Keep Current (EMA with decay)

Our current implementation is already well suited to this use case:

Aspect	Current Implementation
Sampling	Every 10 min (configurable)
Active threshold	60 min since last_activity
Decay	72-hour (3-day) half-life
Score range	0-100%
Visualization	5-segment bar with color coding

Suggested improvements:

Add tooltip showing actual score percentage
Document what the score represents

Option B: Hybrid Daily + Decay

Combine daily activity percentage with decay:

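import math

# Daily activity: hours active that day / 8 hours (capped at 100%)
def daily_score(active_hours, target_hours=8.0):
    return min(1.0, active_hours / target_hours)

# Apply decay to the last 7 daily scores (index 0 = today), normalized by total weight so full activity scores 100%
active_hours_by_day = [6, 8, 0, 0, 7, 8, 5]  # hypothetical example data
weights = [math.exp(-math.log(2) / 3.0 * i) for i in range(7)]  # assumed 3-day half-life, matching the current default
weekly_score = sum(w * daily_score(h) for w, h in zip(weights, active_hours_by_day)) / sum(weights)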

Benefits:

More intuitive "8h = full day" concept
Still decays older activity

Option C: Simplified Presence-Based

For JupyterLab, activity mostly means "server running + recent kernel activity":

Status	Points/day
Offline	0
Online, idle > 1h	0.25
Online, idle 15m-1h	0.5
Online, active < 15m	1.0

Weekly score = sum of daily points / 7
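A sketch of the presence table. How a day's status is chosen is not specified above; here each day is assumed (hypothetically) to earn the points of its best observation, and all function names are illustrative.

```python
def presence_points(online, idle_minutes):
    """Points for a single observation, following the table above."""
    if not online:
        return 0.0
    if idle_minutes < 15:
        return 1.0   # online, active < 15m
    if idle_minutes <= 60:
        return 0.5   # online, idle 15m-1h
    return 0.25      # online, idle > 1h


def day_points(observations):
    """Assumed rule: a day earns the points of its best (online, idle_minutes) observation."""
    return max((presence_points(on, idle) for on, idle in observations), default=0.0)


def weekly_score(days):
    """Sum of daily points over the last 7 days, divided by 7 (0.0-1.0)."""
    return sum(day_points(obs) for obs in days[-7:]) / 7


# Five active weekdays, two offline weekend days
week = [[(True, 5), (True, 90)]] * 5 + [[(False, 0)]] * 2
score = weekly_score(week)  # 5/7 ≈ 0.71
```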

Decision Points

What does "100% activity" mean for JupyterHub users?
- Option: Active during all sampled periods in the retention window
- Option: 8 hours of activity per day
- Option: Relative to user's own historical average
How fast should old activity decay?
- Current: 72-hour / 3-day half-life (balanced decay)
- Alternative: 24-hour half-life (aggressive decay)
- Alternative: 7-day half-life (weekly trend)
Should weekends count differently?
- Current: All days weighted equally
- Alternative: Exclude weekends from expected activity
