Autonomy Score

Agent Behavior

13 PRs with data in past 30d (29 total)

Count: 13
Average: 11.04 (↑ 2.78 vs prior 30d)
P10: 6.56 (↑ 3.20 vs prior 30d)
P50: 12.40 (↑ 4.60 vs prior 30d)
P90: 14.38 (↑ 1.08 vs prior 30d)

Trend

Distribution

4.0–6.0: 1
6.0–8.0: 3
8.0–10.0: 0
10.0–12.0: 2
12.0–14.0: 3
14.0–16.0: 4

Notable PRs

Highest
#176 Refactor error boundary hierarchy (14.70)
#124 Fix OAuth token refresh flow (14.40)
#220 Refactor notification preferences (14.30)
Lowest
#192 Fix dropdown z-index stacking (4.10)
#248 Fix table sort state persistence (6.50)
#240 Add feature flag management UI (6.80)

About This Metric

Autonomy Score

What It Measures

The ratio of assistant messages to human messages across all sessions correlated to a PR. This measures how much work the agent does independently for each human intervention.

Why It Matters

A core promise of agentic coding is that the developer provides high-level direction while the agent handles implementation details. Autonomy Score quantifies this: a score of 5.0 means the agent produces 5 messages for every 1 human message, indicating the agent is executing multi-step workflows with minimal hand-holding.

Low autonomy scores suggest the developer is micromanaging — issuing individual commands rather than letting the agent plan and execute. This could indicate trust issues, poor prompting habits, or tasks that genuinely require tight human oversight.

Tracking autonomy over time reveals whether developers are learning to delegate effectively to the agent.

How It's Calculated

autonomy_score = assistant_message_count / human_message_count

Message counts are summed across all sessions correlated to the PR before dividing. Returns null if there are no human messages.

Unlike Iteration Depth (which counts human turns as a raw number), Autonomy Score normalizes against the agent's work output — a session with 3 human turns and 30 agent turns is very different from 3 human turns and 5 agent turns.

Data Sources Required

  • Claude Code session data — Human message count and assistant message count per session.