Peak Context Window %
What It Measures
The highest proportion of the model's context window consumed by any single assistant message in a session. It captures how close the agent came to its context limit during a coding task.
Why It Matters
Context window exhaustion is one of the most common failure modes in agentic coding. When the agent approaches its context limit, it may lose access to earlier conversation history, produce lower-quality responses, or trigger expensive context compaction. Tracking how close sessions get to the limit helps teams identify tasks that are pushing the boundaries of the model's capacity.
High peak context usage can indicate sessions that are too long, tasks that require too much code context, or prompting patterns that accumulate unnecessary context. Teams can use this metric to decide when to break tasks into smaller pieces, when to start fresh sessions, or when to upgrade to models with larger context windows.
How It's Calculated
The CLI computes this per-session during parsing:
For each assistant message in the session:
    msg_context = input_tokens + cache_creation_input_tokens + cache_read_input_tokens
peak_context_tokens = MAX(msg_context) across all messages
peak_context_pct = peak_context_tokens / model_max_context_tokens
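The steps above can be sketched in Python as follows. This is a minimal illustration, not the CLI's actual code: it assumes each assistant message's usage is available as a dict keyed by the three token fields, treats a missing field as 0, and returns 0 for a session with no assistant messages. The function names are hypothetical.

```python
from typing import Iterable, Mapping


def peak_context_tokens(messages: Iterable[Mapping[str, int]]) -> int:
    """Return the largest per-message context size, in tokens.

    Each item is assumed to be the usage dict of one assistant message.
    """
    return max(
        (
            msg.get("input_tokens", 0)
            + msg.get("cache_creation_input_tokens", 0)
            + msg.get("cache_read_input_tokens", 0)
            for msg in messages
        ),
        default=0,  # assumption: an empty session has a peak of 0
    )


def peak_context_pct(
    messages: Iterable[Mapping[str, int]],
    model_max_context_tokens: int,
) -> float:
    """Peak context usage as a fraction of the model's context limit."""
    return peak_context_tokens(messages) / model_max_context_tokens


# Illustrative (made-up) usage values for a three-message session.
session = [
    {"input_tokens": 1_200, "cache_creation_input_tokens": 8_000, "cache_read_input_tokens": 0},
    {"input_tokens": 400, "cache_creation_input_tokens": 3_600, "cache_read_input_tokens": 140_000},
    {"input_tokens": 900, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 95_000},
]
pct = peak_context_pct(session, model_max_context_tokens=200_000)
```

Here the second message has the largest context (144,000 tokens), so against a 200,000-token limit the session's peak is 0.72.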
The denominator uses model-specific context limits:
- Most models (Opus 4.6, Sonnet 4.5/4.6, Haiku 4.5): 200,000 tokens
- Extended context variants (detected by a [1m] suffix in the model ID): 1,000,000 tokens
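The limit lookup described above could look roughly like this. It is a sketch under stated assumptions: the function and constant names are hypothetical, the suffix check is assumed to be a plain string match on the end of the model ID, and the model IDs in the usage lines are placeholders rather than real identifiers.

```python
DEFAULT_CONTEXT_TOKENS = 200_000      # most models
EXTENDED_CONTEXT_TOKENS = 1_000_000   # extended context variants


def model_max_context_tokens(model_id: str) -> int:
    """Return the context limit to use as the denominator for a model ID."""
    # Assumption: extended context variants are marked by a trailing "[1m]".
    if model_id.endswith("[1m]"):
        return EXTENDED_CONTEXT_TOKENS
    return DEFAULT_CONTEXT_TOKENS


# Placeholder IDs for illustration only.
standard_limit = model_max_context_tokens("example-model")
extended_limit = model_max_context_tokens("example-model[1m]")
```

The same peak of 144,000 tokens would therefore report as 0.72 on a standard model but 0.144 on an extended context variant, which is why the denominator has to come from the model ID.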
The CLI computes peak_context_pct as a float between 0.0 and 1.0 and sends it to the server. The server stores and aggregates it directly — no server-side model lookup is needed.
The value is displayed as a percentage (e.g., 0.72 → 72%).
Data Sources Required
- Claude Code session data: per-message token usage breakdowns (input_tokens, cache_creation_input_tokens, cache_read_input_tokens) and the model identifier for context limit lookup.