Cache Hit Rate

Prompt Efficiency

54 PRs with data in past 30d (116 total)

Count: 54
Average: 61% (↑ 6% vs prior 30d)
P10: 34% (↑ 3% vs prior 30d)
P50: 67% (↑ 11% vs prior 30d)
P90: 84% (↑ 1% vs prior 30d)

Trend

Distribution

20–30%: 4
30–40%: 5
40–50%: 7
50–60%: 7
60–70%: 11
70–80%: 12
80–90%: 7
90–100%: 1

Notable PRs

Highest
  • #231 Add changelog generation script (91%)
  • #145 Fix webhook signature verification (89%)
  • #172 Add bulk action toolbar to list view (86%)
Lowest
  • #156 Fix timezone handling in date picker (25%)
  • #107 Add retry logic for webhook delivery (26%)
  • #161 Fix memory leak in WebSocket handler (28%)

About This Metric

Cache Hit Rate

What It Measures

The proportion of input tokens that were served from the prompt cache rather than processed fresh. This is computed as the ratio of cache-read tokens to total input tokens (standard input + cache creation + cache read) across all sessions correlated to a PR.

Why It Matters

Anthropic's prompt caching reduces the cost of cached input tokens by 90%. A high cache hit rate means the model is reusing context efficiently across turns, directly reducing token costs. Teams with consistently low cache hit rates may be structuring sessions in ways that defeat caching — for example, frequently switching context or starting new sessions for related work.

Cache hit rate turns an opaque billing line item into an actionable optimization signal.
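To put the savings in dollar terms, here is a minimal cost sketch. The 10% cache-read and 1.25x cache-write multipliers follow Anthropic's published pricing structure for prompt caching; the base rate is a placeholder, not current pricing.

```python
# Illustrative cost model for a PR's input tokens. BASE_RATE is a
# hypothetical price, not a quote of current Anthropic pricing.
BASE_RATE = 3.00               # $ per million input tokens (placeholder)
CACHE_READ_MULTIPLIER = 0.10   # cache reads billed at ~10% of base
CACHE_WRITE_MULTIPLIER = 1.25  # cache creation carries a surcharge

def input_cost(input_tokens: int, cache_creation: int, cache_read: int) -> float:
    """Dollar cost of input tokens under the rate model above."""
    per_token = BASE_RATE / 1_000_000
    return (
        input_tokens * per_token
        + cache_creation * per_token * CACHE_WRITE_MULTIPLIER
        + cache_read * per_token * CACHE_READ_MULTIPLIER
    )
```

Under this model, shifting a million tokens from fresh input to cache reads cuts their cost from $3.00 to $0.30, which is why the average-vs-P10 spread above translates into real budget differences.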

How It's Calculated

cache_hit_rate = cache_read_input_tokens / (input_tokens + cache_creation_input_tokens + cache_read_input_tokens)

Token counts are summed across all sessions correlated to the PR before the ratio is taken (a ratio of sums, not an average of per-session ratios). The result is a value between 0.0 and 1.0, or null if there are no input tokens.
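The formula translates directly to code. This sketch assumes each session is a dict carrying the three token fields named above:

```python
from typing import Optional

def cache_hit_rate(sessions: list[dict]) -> Optional[float]:
    """Ratio of cache-read tokens to total input tokens, summed
    across all sessions correlated to a PR. Returns None (rendered
    as null) when there are no input tokens at all."""
    cache_read = sum(s["cache_read_input_tokens"] for s in sessions)
    total = sum(
        s["input_tokens"]
        + s["cache_creation_input_tokens"]
        + s["cache_read_input_tokens"]
        for s in sessions
    )
    if total == 0:
        return None
    return cache_read / total
```

Summing before dividing matters: a tiny session with a 100% hit rate should not outweigh a large session with a 30% hit rate.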

Data Sources Required

  • Claude Code session data — Token usage breakdowns per assistant message, including input_tokens, cache_creation_input_tokens, and cache_read_input_tokens.
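For illustration, a sketch of how those per-message usage fields might be summed from a session transcript. The JSONL event shape assumed here (one event per line, with a `message.usage` object on assistant events) is an assumption about the export format, not a documented schema:

```python
import json

def session_totals(jsonl_path: str) -> dict:
    """Sum token-usage fields over the assistant messages in one
    session transcript. Events without a usage object (user turns,
    tool results) are skipped."""
    totals = {
        "input_tokens": 0,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
    }
    with open(jsonl_path) as f:
        for line in f:
            event = json.loads(line)
            usage = event.get("message", {}).get("usage")
            if usage:
                for key in totals:
                    totals[key] += usage.get(key, 0)
    return totals
```

Per-session totals like these are what the PR-level formula above aggregates over.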