Cache Hit Rate

Prompt Efficiency

13 PRs with data in past 30d (29 total)

Count     13
Average   65%   ↑ 2% vs prior 30d
P10       36%   ↓ 8% vs prior 30d
P50       71%   ↑ 10% vs prior 30d
P90       84%   ↓ 5% vs prior 30d

Trend

Distribution

30–40%   2
40–50%   0
50–60%   2
60–70%   2
70–80%   4
80–90%   3

Notable PRs

Highest
#196   Add custom domain support           86%
#208   Improve search result ranking       84%
#120   Refactor notification preferences   82%
Lowest
#240   Add feature flag management UI      30%
#244   Refactor API client error handling  32%
#176   Refactor error boundary hierarchy   52%

About This Metric

Cache Hit Rate

What It Measures

The proportion of input tokens that were served from the prompt cache rather than processed fresh. This is computed as the ratio of cache-read tokens to total input tokens (standard input + cache creation + cache read) across all sessions correlated to a PR.

Why It Matters

Anthropic's prompt caching reduces the cost of cached input tokens by 90%. A high cache hit rate means the model is reusing context efficiently across turns, directly reducing token costs. Teams with consistently low cache hit rates may be structuring sessions in ways that defeat caching — for example, frequently switching context or starting new sessions for related work.

Cache hit rate turns an opaque billing line item into an actionable optimization signal.

How It's Calculated

cache_hit_rate = cache_read_input_tokens / (input_tokens + cache_creation_input_tokens + cache_read_input_tokens)

Token counts are summed across all sessions correlated to the PR before dividing. The result is a value between 0.0 and 1.0, or null if the PR has no input tokens at all.
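The calculation above can be sketched as follows. The field names match the token fields listed under Data Sources Required, but the session dicts and the helper function itself are a hypothetical illustration, not the dashboard's actual implementation.

```python
from typing import Optional


def cache_hit_rate(sessions: list[dict]) -> Optional[float]:
    """Aggregate token counts across all sessions correlated to a PR,
    then return the share of input tokens served from the prompt cache."""
    input_tokens = sum(s.get("input_tokens", 0) for s in sessions)
    cache_creation = sum(s.get("cache_creation_input_tokens", 0) for s in sessions)
    cache_read = sum(s.get("cache_read_input_tokens", 0) for s in sessions)

    total = input_tokens + cache_creation + cache_read
    if total == 0:
        return None  # no input tokens at all -> rate is undefined (null)
    return cache_read / total


# Hypothetical example: two sessions correlated to one PR.
sessions = [
    {"input_tokens": 1_000, "cache_creation_input_tokens": 4_000, "cache_read_input_tokens": 0},
    {"input_tokens": 500, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 9_500},
]
print(round(cache_hit_rate(sessions), 2))  # 9500 / 15000 -> 0.63
```

Note that cache-creation tokens count toward the denominator but not the numerator: the first session here writes a large prefix to the cache, and only the second session's reads improve the rate.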

Data Sources Required

  • Claude Code session data — Token usage breakdowns per assistant message, including input_tokens, cache_creation_input_tokens, and cache_read_input_tokens.