stellars-jupyterhub-ds/docs/activity-tracking-methodology.md
stellarshenson b7b3f0e87c docs: add half-life simulation tables for different work patterns
Added detailed simulation results showing how calendar half-life
translates to effective working-time decay:

- 10h/day (intensive): 72h -> 28.5 work hours at 50%
- 8h/day (typical): 72h -> 22.8 work hours at 50%
- 4h/day (part-time): 72h -> 11.5 work hours at 50%

Key finding: 72h calendar half-life consistently yields ~2.9 work
days at the 50% point, regardless of daily work hours. Activity
scores correctly reflect work fraction (8h/24h = 33.3%).
2026-01-25 11:54:54 +01:00

# Activity Tracking Methodology Research
## Current Implementation
Our current approach uses **exponential decay scoring**:
- Samples collected every 10 minutes (configurable)
- Each sample marked active/inactive based on `last_activity` within threshold
- Score calculated as weighted ratio: `weighted_active / weighted_total`
- Weight formula: `weight = exp(-λ × age_hours)` where `λ = ln(2) / half_life`
- Default half-life: 72 hours / 3 days (activity from 3 days ago worth 50%)
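
The scoring steps above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names (`is_active`, `decay_weight`, `activity_score`) and the `(timestamp, is_active)` sample layout are assumptions made for the example.

```python
import math
from datetime import datetime, timedelta, timezone

HALF_LIFE_HOURS = 72.0  # default half-life stated above


def is_active(sample_time, last_activity, threshold_minutes=60.0):
    """A sample counts as active if last_activity falls within the threshold."""
    return (sample_time - last_activity) <= timedelta(minutes=threshold_minutes)


def decay_weight(age_hours, half_life_hours=HALF_LIFE_HOURS):
    """weight = exp(-lambda * age_hours), with lambda = ln(2) / half_life."""
    lam = math.log(2) / half_life_hours
    return math.exp(-lam * age_hours)


def activity_score(samples, now, half_life_hours=HALF_LIFE_HOURS):
    """Return weighted_active / weighted_total for (timestamp, is_active) pairs."""
    weighted_active = 0.0
    weighted_total = 0.0
    for timestamp, active in samples:
        age_hours = (now - timestamp).total_seconds() / 3600.0
        weight = decay_weight(age_hours, half_life_hours)
        weighted_total += weight
        if active:
            weighted_active += weight
    # No samples yet: report zero rather than dividing by zero
    return weighted_active / weighted_total if weighted_total else 0.0


# One active sample 72 h old (weight 0.5) and one inactive sample now (weight 1.0)
now = datetime(2026, 1, 25, 12, 0, tzinfo=timezone.utc)
samples = [(now - timedelta(hours=72), True), (now, False)]
score = activity_score(samples, now)  # 0.5 / 1.5, roughly 0.333
```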
### Why 72-hour Half-life?
The decay applies to **wall-clock time**, not working time. Users work only a fraction of each 24-hour period, creating a mismatch between calendar decay and effective working activity decay.
#### Simulation Results
The following tables show how different calendar half-lives translate to effective working-time decay for various work patterns. "Work Hours at 50%" indicates how many of the most recent working hours account for 50% of the weighted activity score.
**10h/day work pattern** (intensive)
| Calendar Half-life | Work Hours at 50% | Work Days at 50% | Activity Score |
|:------------------:|:-----------------:|:----------------:|:--------------:|
| 24h (1d) | 10.0h | 1.0 days | 41.0% |
| 48h (2d) | 19.8h | 2.0 days | 41.5% |
| 72h (3d) | 28.5h | 2.9 days | 41.6% |
| 96h (4d) | 35.3h | 3.5 days | 41.6% |
| 168h (7d) | 47.7h | 4.8 days | 41.7% |
**8h/day work pattern** (typical)
| Calendar Half-life | Work Hours at 50% | Work Days at 50% | Activity Score |
|:------------------:|:-----------------:|:----------------:|:--------------:|
| 24h (1d) | 8.0h | 1.0 days | 32.7% |
| 48h (2d) | 16.0h | 2.0 days | 33.2% |
| 72h (3d) | 22.8h | 2.9 days | 33.3% |
| 96h (4d) | 28.3h | 3.5 days | 33.3% |
| 168h (7d) | 38.2h | 4.8 days | 33.3% |
**4h/day work pattern** (part-time)
| Calendar Half-life | Work Hours at 50% | Work Days at 50% | Activity Score |
|:------------------:|:-----------------:|:----------------:|:--------------:|
| 24h (1d) | 4.0h | 1.0 days | 16.3% |
| 48h (2d) | 8.0h | 2.0 days | 16.6% |
| 72h (3d) | 11.5h | 2.9 days | 16.6% |
| 96h (4d) | 14.2h | 3.5 days | 16.6% |
| 168h (7d) | 19.2h | 4.8 days | 16.7% |
#### Summary: 72-hour Half-life Across Work Patterns
| Work Pattern | Work Hours at 50% | Effective Work Days | Activity Score |
|:------------:|:-----------------:|:-------------------:|:--------------:|
| 10h/day | 28.5h | 2.9 days | 41.6% |
| 8h/day | 22.8h | 2.9 days | 33.3% |
| 4h/day | 11.5h | 2.9 days | 16.6% |
#### Key Insights
With a 72-hour calendar half-life:
- **Consistent ~3 work days** at the 50% point regardless of daily work hours
- Activity score reflects actual work fraction (8h/24h ≈ 33%, 4h/24h ≈ 17%)
- Overnight breaks don't aggressively penalize scores
- A 24-hour half-life would be too aggressive - yesterday's work would already be down to about 50% weight by the time today's work starts
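
A simulation of this kind can be sketched as below. The setup is an assumption, not the script that produced the tables: a 14-day retention window, a single work block starting at 09:00 each day, "now" placed at midnight, and the hypothetical function name `simulate`. Results depend on those choices.

```python
import math


def simulate(work_hours_per_day, half_life_hours=72.0, retention_days=14,
             sample_minutes=10, work_start_hour=9):
    """Return (activity_score, work_hours_at_50_percent) for a fixed daily schedule."""
    lam = math.log(2) / half_life_hours
    start_min = work_start_hour * 60
    end_min = start_min + int(work_hours_per_day * 60)

    active_weights = []  # weights of active samples, newest first
    weighted_total = 0.0
    for age_min in range(0, retention_days * 24 * 60, sample_minutes):
        weight = math.exp(-lam * age_min / 60.0)
        weighted_total += weight
        # "now" is midnight, so a sample's clock time is counted back from 24:00
        clock_min = 1440 - (age_min % 1440)
        if start_min <= clock_min < end_min:
            active_weights.append(weight)

    weighted_active = sum(active_weights)
    score = weighted_active / weighted_total

    # Walk back from the newest work sample until half the active weight is covered
    cumulative = 0.0
    samples_needed = 0
    for weight in active_weights:
        cumulative += weight
        samples_needed += 1
        if cumulative >= 0.5 * weighted_active:
            break
    return score, samples_needed * sample_minutes / 60.0


score, hours_at_half = simulate(work_hours_per_day=8)
```

Because the weights decay over wall-clock time while only the work block contributes active samples, the score stays close to the work fraction (8h/24h) whatever half-life is chosen.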
## Industry Approaches
### 1. Exponential Moving Average (EMA) / Time-Decay Systems
**How it works:**
- Recent events weighted more heavily than older ones
- Decay factor (α) determines how quickly old data loses relevance
- Example: α=0.5 per day means yesterday's activity worth 50%, two days ago worth 25%
**Half-life parameterization:**
- More intuitive than raw decay factor
- "Activity has a 24-hour half-life" is clearer than "α=0.5"
- Our implementation already uses this approach
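
The two parameterizations are interchangeable. A small sketch of the conversion, using hypothetical helper names and treating α as the fraction of weight retained per period:

```python
import math


def decay_factor_from_half_life(half_life_hours, period_hours=24.0):
    """Fraction of weight retained after one period, given a half-life."""
    return 0.5 ** (period_hours / half_life_hours)


def half_life_from_decay_factor(alpha, period_hours=24.0):
    """Half-life implied by a per-period retention factor alpha (0 < alpha < 1)."""
    return period_hours * math.log(0.5) / math.log(alpha)


alpha_24h = decay_factor_from_half_life(24.0)  # 0.5 per day
alpha_72h = decay_factor_from_half_life(72.0)  # about 0.794 per day
```

With the 72-hour default, yesterday's activity keeps roughly 79% of its weight rather than 50%.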
**Pros:**
- Memory-efficient (no need to store all historical data)
- Naturally handles irregular sampling intervals
- Smooths out noise/outliers
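
The memory and irregular-interval points can be illustrated with a streaming accumulator that keeps only two running sums. This is a sketch under stated assumptions: the class name `DecayedScore` is hypothetical, timestamps are plain hours, and updates are expected to arrive in time order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecayedScore:
    """Running weighted_active / weighted_total without storing past samples."""
    half_life_hours: float = 72.0
    weighted_active: float = 0.0
    weighted_total: float = 0.0
    last_update_hours: Optional[float] = None

    def update(self, t_hours: float, is_active: bool) -> float:
        if self.last_update_hours is not None:
            # Decay both sums by the elapsed time, whatever the gap was
            elapsed = t_hours - self.last_update_hours
            factor = 0.5 ** (elapsed / self.half_life_hours)
            self.weighted_active *= factor
            self.weighted_total *= factor
        self.last_update_hours = t_hours
        # The new sample enters with full weight
        self.weighted_total += 1.0
        if is_active:
            self.weighted_active += 1.0
        return self.weighted_active / self.weighted_total


tracker = DecayedScore()
first = tracker.update(0.0, True)     # 1.0: the only sample is active
second = tracker.update(72.0, False)  # 0.5 / 1.5, roughly 0.333
```

The result matches a batch computation over stored samples, at the cost of not being able to recompute the score with a different half-life later.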
**Cons:**
- Older activity never fully disappears (asymptotic to zero)
- May not match user intuition of "weekly activity"
**Reference:** [Exponential Moving Averages at Scale](https://odsc.com/blog/exponential-moving-averages-at-scale-building-smart-time-decay-systems/)
---
### 2. Time-Window Activity Percentage (Hubstaff approach)
**How it works:**
- Fixed time window (e.g., 10 minutes)
- Count active seconds / total seconds = activity %
- Aggregate over day/week as average of windows
**Hubstaff's formula:**
```
Active seconds / 600 × 100 = activity rate % (per 10-min segment)
```
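
The per-segment rate and its aggregation can be sketched as below. The function names are illustrative and this is not Hubstaff's code; the 600-second window comes from the formula above.

```python
def activity_rate(active_seconds, window_seconds=600):
    """Activity percentage for one fixed window (10 minutes by default)."""
    if not 0 <= active_seconds <= window_seconds:
        raise ValueError("active_seconds must lie within the window")
    return 100.0 * active_seconds / window_seconds


def average_rate(active_seconds_per_window, window_seconds=600):
    """Aggregate a day or week as the plain average of its window rates."""
    rates = [activity_rate(s, window_seconds) for s in active_seconds_per_window]
    return sum(rates) / len(rates) if rates else 0.0


single = activity_rate(450)            # 75.0
daily = average_rate([600, 300, 0])    # 50.0
```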
**Key insight from Hubstaff:**
> "Depending on someone's job and daily tasks, activity rates will vary widely. People with 75% scores and those with 25% scores can often times both be working productively."
**Typical benchmarks:**
- Data entry/development: 60-80% keyboard/mouse activity
- Research/meetings: 30-50% activity
- 100% is unrealistic for any role
**Pros:**
- Simple to understand
- Direct mapping to "how active was I today"
**Cons:**
- Doesn't capture quality of work
- Penalizes reading, thinking, meetings
**Reference:** [Hubstaff Activity Calculation](https://support.hubstaff.com/how-are-activity-levels-calculated/)
---
### 3. Productivity Categorization (RescueTime approach)
**How it works:**
- Applications/websites pre-categorized by productivity score (-2 to +2)
- Time spent in each category weighted and summed
- Daily productivity score = weighted sum / total time
**Categories:**
- Very Productive (+2): IDE, documentation
- Productive (+1): Email, spreadsheets
- Neutral (0): Uncategorized
- Distracting (-1): News sites
- Very Distracting (-2): Social media, games
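
A minimal sketch of the weighted-sum idea follows. It only mirrors the description above and is not RescueTime's actual algorithm; the category keys and the function name are made up for the example.

```python
# Weights taken from the category list above
CATEGORY_WEIGHTS = {
    "very_productive": 2,
    "productive": 1,
    "neutral": 0,
    "distracting": -1,
    "very_distracting": -2,
}


def productivity_score(minutes_by_category):
    """Weighted sum of time per category divided by total time (range -2 to +2)."""
    total = sum(minutes_by_category.values())
    if total == 0:
        return 0.0
    weighted = sum(CATEGORY_WEIGHTS[category] * minutes
                   for category, minutes in minutes_by_category.items())
    return weighted / total


day = {"very_productive": 120, "neutral": 60, "very_distracting": 60}
score = productivity_score(day)  # (240 + 0 - 120) / 240 = 0.5
```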
**Pros:**
- Captures quality of activity, not just presence
- Customizable per user/role
**Cons:**
- Requires app categorization (complex to implement)
- Subjective classification
- Not applicable to JupyterLab (all activity is "productive")
**Reference:** [RescueTime Methodology](https://www.rescuetime.com/)
---
### 4. GitHub Contribution Graph (Threshold-based intensity)
**How it works:**
- Count contributions per day (commits, PRs, issues)
- Map counts to 4-5 intensity levels
- Levels based on percentiles of user's own activity
**Typical thresholds:**
```javascript
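// Example from implementations (fixed thresholds; values are illustrative)
const levels = {
  thresholds: [0, 10, 20, 30], // contributions per day, lower bound of levels 1-4
  colors: ['#ebedf0', '#9be9a8', '#40c463', '#30a14e', '#216e39'], // levels 0-4
};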
```
**Key insight:**
- Relative to user's own history (not absolute)
- Someone with 5 commits/day max sees different scale than 50 commits/day
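
A percentile-style mapping relative to the user's own history could look like the sketch below. It is an illustration of the idea only, not GitHub's algorithm; the function name and the choice to rank against non-zero days are assumptions.

```python
from bisect import bisect_left


def intensity_level(count, history):
    """Map a daily count to a level 0-4 relative to the user's own non-zero days."""
    if count <= 0:
        return 0
    nonzero = sorted(c for c in history if c > 0)
    if not nonzero:
        return 1  # arbitrary choice when there is no history to compare against
    # Fraction of past active days that were strictly quieter than this one
    rank = bisect_left(nonzero, count)
    fraction = rank / len(nonzero)
    return 1 + min(3, int(fraction * 4))


light_user = [0, 1, 2, 3, 4, 5, 6, 7, 8]
heavy_user = [10, 20, 30, 40, 50, 60, 70, 80]
level_light = intensity_level(8, light_user)  # 4: a record day for this user
level_heavy = intensity_level(8, heavy_user)  # 1: a quiet day for this user
```

The same count of 8 lands at opposite ends of the scale for the two users, which is the relative behaviour described above.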
**Pros:**
- Visual, intuitive
- Adapts to user's activity patterns
**Cons:**
- Daily granularity only (no intra-day detail)
- Doesn't show decay/trend
---
### 5. Daily Target Approach (8h = 100%)
**How it works:**
- Define expected activity hours per day (e.g., 8h)
- Actual active hours / expected hours = daily score
- Cap at 100% or allow overtime bonus
**Formula:**
```
Daily score = min(1.0, active_hours / 8.0) × 100
Weekly score = avg(daily_scores)
```
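
The formula translates directly into code. The sketch below uses hypothetical function names and exposes the cap as a flag to cover the "overtime bonus" variant.

```python
def daily_score(active_hours, target_hours=8.0, cap=True):
    """Percentage of the daily target reached, optionally capped at 100%."""
    ratio = active_hours / target_hours
    if cap:
        ratio = min(1.0, ratio)
    return ratio * 100.0


def weekly_score(active_hours_by_day, target_hours=8.0, cap=True):
    """Plain average of the daily scores."""
    scores = [daily_score(h, target_hours, cap) for h in active_hours_by_day]
    return sum(scores) / len(scores) if scores else 0.0


half_day = daily_score(4.0)                       # 50.0
long_day = daily_score(10.0)                      # 100.0 (capped)
overtime = daily_score(10.0, cap=False)           # 125.0
full_week = weekly_score([8, 8, 8, 8, 8, 0, 0])   # 500 / 7, about 71.4
```

The last line shows the weekend issue: five full working days still average to about 71% when Saturday and Sunday are included.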
**Pros:**
- Maps directly to work expectations
- Easy to explain to users
**Cons:**
- Assumes consistent work schedule
- Doesn't account for part-time, weekends
- JupyterHub users may have variable schedules
---
## Recommendations for JupyterHub Activity Monitor
### Option A: Keep Current (EMA with decay)
Our current implementation is actually well-designed for the use case:
| Aspect | Current Implementation |
|--------|------------------------|
| Sampling | Every 10 min (configurable) |
| Active threshold | 60 min since last_activity |
| Decay | 72-hour (3-day) half-life |
| Score range | 0-100% |
| Visualization | 5-segment bar with color coding |
**Suggested improvements:**
1. Add tooltip showing actual score percentage
2. Document what the score represents
### Option B: Hybrid Daily + Decay
Combine daily activity percentage with decay:
```python
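from math import exp, log
active_hours_by_day = [8, 6, 0, 0, 7, 8, 9]  # example input; index 0 = today
# Daily activity: hours active per day / 8 hours (capped at 100%)
daily_scores = [min(1.0, hours / 8.0) for hours in active_hours_by_day]
# Apply decay to historical daily scores; normalize by total weight so a full week scores 1.0
decay = log(2) / 3  # per-day rate for a 3-day (72-hour) half-life
weights = [exp(-decay * i) for i in range(7)]
weekly_score = sum(s * w for s, w in zip(daily_scores, weights)) / sum(weights)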
```
**Benefits:**
- More intuitive "8h = full day" concept
- Still decays older activity
### Option C: Simplified Presence-Based
For JupyterLab, activity mostly means "server running + recent kernel activity":
| Status | Points/day |
|--------|------------|
| Offline | 0 |
| Online, idle > 1h | 0.25 |
| Online, idle 15m-1h | 0.5 |
| Online, active < 15m | 1.0 |
Weekly score = sum of daily points / 7
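
A sketch of the point table is shown below. The table does not say how samples within a day combine into daily points, so averaging the day's samples is an assumption here, as are the function names and the handling of the exact 15-minute and 1-hour boundaries.

```python
def status_points(online, idle_minutes):
    """Points for a single observation, following the table above."""
    if not online:
        return 0.0
    if idle_minutes < 15:
        return 1.0
    if idle_minutes <= 60:
        return 0.5
    return 0.25


def daily_points(observations):
    """Assumed aggregation: mean of the day's (online, idle_minutes) observations."""
    if not observations:
        return 0.0
    return sum(status_points(online, idle) for online, idle in observations) / len(observations)


def weekly_score(points_by_day):
    """Weekly score = sum of daily points / 7."""
    return sum(points_by_day) / 7.0


day = daily_points([(True, 5), (True, 30), (False, 0), (True, 120)])  # 1.75 / 4
week = weekly_score([1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0])              # 5 / 7
```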
---
## Decision Points
1. **What does "100% activity" mean for JupyterHub users?**
- Option: Active during all sampled periods in retention window
- Option: 8 hours of activity per day
- Option: Relative to user's own historical average
2. **How fast should old activity decay?**
- Current: 72-hour / 3-day half-life (balanced decay)
- Alternative: 24-hour half-life (aggressive decay)
- Alternative: 7-day half-life (weekly trend)
3. **Should weekends count differently?**
- Current: All days weighted equally
- Alternative: Exclude weekends from expected activity
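
One way to prototype the weekend alternative is to drop weekend samples before scoring, so those days neither raise nor lower the result. This is only a sketch: the `(timestamp, is_active)` sample layout and the function name are assumptions, and a Saturday/Sunday weekend is taken for granted.

```python
from datetime import datetime


def exclude_weekends(samples):
    """Keep only (timestamp, is_active) samples taken Monday to Friday."""
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6
    return [(ts, active) for ts, active in samples if ts.weekday() < 5]


samples = [
    (datetime(2026, 1, 23, 10, 0), True),   # Friday
    (datetime(2026, 1, 24, 10, 0), False),  # Saturday
    (datetime(2026, 1, 25, 10, 0), False),  # Sunday
    (datetime(2026, 1, 26, 10, 0), True),   # Monday
]
weekday_only = exclude_weekends(samples)
```

Note that the decay still runs on calendar time, so Friday's activity has lost weight by Monday even though the weekend samples themselves are ignored.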
---
## Sources
- [Exponential Moving Averages at Scale (ODSC)](https://odsc.com/blog/exponential-moving-averages-at-scale-building-smart-time-decay-systems/)
- [Exponential Smoothing (Wikipedia)](https://en.wikipedia.org/wiki/Exponential_smoothing)
- [Hubstaff Activity Calculation](https://support.hubstaff.com/how-are-activity-levels-calculated/)
- [How Time is Calculated in Hubstaff](https://support.hubstaff.com/how-is-time-tracked-and-calculated-in-hubstaff/)
- [RescueTime](https://www.rescuetime.com/)
- [EWMA Formula (Corporate Finance Institute)](https://corporatefinanceinstitute.com/resources/career-map/sell-side/capital-markets/exponentially-weighted-moving-average-ewma/)
- [Developer Productivity Metrics (Axify)](https://axify.io/blog/developer-productivity-metrics)