Autonomy Score

Agent Behavior · 54 PRs with data in past 30d (116 total)

Count

Average: 8.46 (↑ 0.55 vs prior 30d)
P10: 3.23 (↓ 0.07 vs prior 30d)
P50: 9.25 (↑ 0.80 vs prior 30d)
P90: 13.40 (↑ 1.00 vs prior 30d)

[Chart: Trend]

[Chart: Distribution, score buckets 0.0–3.0, 3.0–6.0, 6.0–9.0, 9.0–12.0, 12.0–15.0, 15.0–18.0]

Notable PRs

Highest

#241 Optimize bundle size with tree shaking (15.00)

#246 Add custom domain support (14.70)

#162 Optimize database query for dashboard (14.70)

Lowest

#231 Add changelog generation script (2.10)

#129 Optimize GraphQL resolver N+1 (2.20)

#186 Fix scroll restoration on back nav (2.30)

About This Metric

Autonomy Score

What It Measures

The ratio of assistant messages to human messages across sessions correlated to a PR. This measures how much work the agent does independently for each human intervention.

Why It Matters

A core promise of agentic coding is that the developer provides high-level direction while the agent handles implementation details. Autonomy Score quantifies this: a score of 5.0 means the agent produces 5 messages for every 1 human message, indicating the agent is executing multi-step workflows with minimal hand-holding.

Low autonomy scores suggest the developer is micromanaging — issuing individual commands rather than letting the agent plan and execute. This could indicate trust issues, poor prompting habits, or tasks that genuinely require tight human oversight.

Tracking autonomy over time reveals whether developers are learning to delegate effectively to the agent.

How It's Calculated

autonomy_score = assistant_message_count / human_message_count

Message counts are summed across all sessions correlated to the PR before dividing. Returns null if there are no human messages.

Unlike Iteration Depth (which counts human turns as a raw number), Autonomy Score relates human turns to the agent's work output: a session with 3 human turns and 30 agent turns (score 10.0) is very different from one with 3 human turns and 5 agent turns (score about 1.67).
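
As a rough illustration, the calculation above might be sketched in Python as follows. This is not the product's actual implementation. The `SessionCounts` type and the `autonomy_score` function are hypothetical names introduced here, and the sketch assumes that human and assistant message counts are each totalled across the PR's sessions before the division, with `None` standing in for null.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SessionCounts:
    """Per-session message counts (hypothetical shape, not a real schema)."""
    human_messages: int
    assistant_messages: int


def autonomy_score(sessions: Iterable[SessionCounts]) -> Optional[float]:
    """Ratio of assistant to human messages over all sessions for one PR.

    Assumption: counts are summed across sessions before dividing.
    Returns None (null) when there are no human messages.
    """
    human_total = 0
    assistant_total = 0
    for session in sessions:
        human_total += session.human_messages
        assistant_total += session.assistant_messages
    if human_total == 0:
        return None
    return assistant_total / human_total


# Same number of human turns, very different agent output.
high = autonomy_score([SessionCounts(3, 30)])  # 30 / 3
low = autonomy_score([SessionCounts(3, 5)])    # 5 / 3
print(high, low)
```

Totalling the counts first means a long session weighs more than a short one. Averaging per-session ratios instead would be a different design and would give different numbers.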

Data Sources Required

Claude Code session data — Human message count and assistant message count per session.