The Largest Study of How Humans Actually Trust AI
We talk endlessly about what AI can do. Anthropic just published the first major empirical study of what humans actually let it do.
Using their privacy-preserving analysis tool Clio, Anthropic studied millions of real human-agent interactions across Claude Code (their coding agent) and their public API. Not a lab experiment with 30 undergrads — real users, real work, real stakes.
The central finding is deceptively simple: AI agents are already capable of more than humans let them do. The gap between what AI can handle and what humans let it handle, which Anthropic calls the "deployment overhang," is massive. And it's closing fast.
The Deployment Overhang: A 7x Trust Gap
METR, an independent AI evaluation organization, estimates that Claude Opus 4.5 can complete, at a 50% success rate, tasks that would take a human nearly 5 hours.
In practice? The longest Claude Code sessions (99.9th percentile) run about 42 minutes.
That's a 7x gap between capability and deployment. Like buying a car that goes 200 mph and driving it at 30.
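As a quick sanity check, here is the arithmetic behind that 7x figure, using only the numbers quoted above:

```python
# Back-of-the-envelope check of the capability-vs-deployment gap,
# using the figures quoted above (no new data).
capability_minutes = 5 * 60   # METR: tasks of roughly 5 human-hours at 50% success
deployment_minutes = 42       # longest Claude Code sessions (99.9th percentile)

gap = capability_minutes / deployment_minutes
print(f"Deployment overhang: ~{gap:.1f}x")   # ~7.1x
```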
The autonomous session duration has been growing — nearly doubling from under 25 minutes to over 45 minutes between October 2025 and January 2026. But here's the critical insight: the growth is smooth across model releases. If autonomy were driven by capability, you'd see sharp jumps when a new model drops. The smooth curve means something else is happening: humans are gradually relaxing their supervision. This is a finding about trust, not technology.
Anthropic's internal data tells the same story from the other side. Between August and December, Claude Code's success rate on their team's hardest tasks doubled, while human interventions per session dropped from 5.4 to 3.3. Better results, less babysitting.
The Trust Paradox: More Freedom AND More Interruptions
How do people adapt to working with AI agents over time? The data reveals a counterintuitive pattern:
| Metric | New Users (<50 sessions) | Experienced Users (750+ sessions) |
|---|---|---|
| Auto-approve rate | ~20% | 40%+ |
| Interrupt rate | 5% of turns | 9% of turns |
Experienced users grant twice as much autonomy but also interrupt almost twice as often. That sounds contradictory. It's not.
It reflects a fundamental shift in supervisory strategy:
Novice pattern: Micromanage everything. Approve each step manually. Rarely need to interrupt because you're already controlling every move. This is pre-trust gatekeeping.
Expert pattern: Let it run freely, monitor outcomes, intervene surgically. High autonomy, high interruption. This is post-trust oversight — the same way a senior engineer manages a junior: "Go do it, I'll check in and course-correct."
The experts aren't checking less because they're lazy. They've developed mental models of failure modes — they know when Claude is likely to go wrong and interrupt preemptively. The top reason for human interruption? "Providing missing technical context" (32%). Experts recognize context gaps that novices can't even identify.
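To make those two metrics concrete, here is a minimal sketch of how they could be computed from per-turn session logs. The `Turn` schema and the synthetic data are hypothetical, chosen to illustrate the definitions rather than to mirror Anthropic's actual telemetry.

```python
# Minimal sketch: auto-approve and interrupt rates over per-turn session logs.
# The schema and data are illustrative, not Anthropic's actual telemetry.
from dataclasses import dataclass

@dataclass
class Turn:
    auto_approved: bool   # the turn's tool calls ran without manual approval
    interrupted: bool     # the user cut the agent off during this turn

def auto_approve_rate(turns: list[Turn]) -> float:
    """Fraction of turns whose tool calls ran without manual approval."""
    return sum(t.auto_approved for t in turns) / len(turns)

def interrupt_rate(turns: list[Turn]) -> float:
    """Fraction of turns the user interrupted."""
    return sum(t.interrupted for t in turns) / len(turns)

# A synthetic "experienced user": ~40% of turns auto-approved, ~9% interrupted.
turns = [Turn(auto_approved=i % 10 < 4, interrupted=i % 100 < 9) for i in range(1_000)]
print(auto_approve_rate(turns), interrupt_rate(turns))   # 0.4 0.09
```

Read this way, the expert pattern is simply both rates rising together: more turns run without approval, and a larger share of them still get cut off when something looks wrong.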
The AI Is More Cautious Than the Humans
Here's the finding that should reshape how we think about AI safety:
On the most complex tasks, Claude Code stops to ask for clarification more than twice as often as humans interrupt it.
The AI is currently the more cautious partner in the relationship.
| Why Claude stops itself | % |
|---|---|
| Present choices between approaches | 35% |
| Gather diagnostic info or test results | 21% |
| Clarify vague or incomplete requests | 13% |
| Request missing credentials/access | 12% |
| Get approval before risky action | 11% |

| Why humans interrupt Claude | % |
|---|---|
| Provide missing technical context | 32% |
| Claude was slow, hanging, or excessive | 17% |
| Got enough help to proceed alone | 7% |
| Want to take next step manually | 7% |
| Change requirements mid-task | 5% |
Claude's top reason for stopping is to present the user with a choice: not confusion, not failure, but a genuine decision point where human judgment matters. This is consent-seeking behavior, likely shaped by Anthropic's training. It's a feature, not a bug.
But there's a tension building: as users grow more trusting, they may start finding Claude's caution annoying, creating pressure to train future models to ask fewer questions. The very safety behavior that makes agents trustworthy could be eroded by the trust it creates.
What Agents Are Actually Doing
Software engineering dominates: Nearly 50% of all agentic API activity is code-related. But the diversification signal is real:
- Business intelligence and data analysis
- Customer service automation
- Sales workflows
- Finance and e-commerce
- Emerging: Healthcare, cybersecurity
The safety profile is reassuring for now: 80% of tool calls have some safeguard, 73% have a human in the loop, and only 0.8% of actions are irreversible (like sending an email or executing a trade).
The highest-risk clusters include API key exfiltration (likely red-teaming/security research), medical record retrieval, cryptocurrency trading, and lab chemical management. Most appear to be evaluations rather than production deployments — but the tooling exists, and production use is a matter of when, not if.
27% of AI Work Didn't Exist Before
Buried in the research is a finding that undermines the entire "AI replaces jobs" narrative:
27% of AI-assisted work consists of tasks that wouldn't have been done otherwise.
These aren't tasks that AI took from humans. They're tasks that became feasible because AI made them cheap enough to justify:
- Technical debt that was always deprioritized — now systematically eliminated
- Exploratory projects nobody had time for — now executed
- Quality-of-life improvements that were "nice to have" — now actually built
- Security audits that were too expensive — now routine
This reframes the economic impact. Standard productivity measurements capture "time saved on existing tasks." They miss the entire category of newly feasible work. AI isn't just slicing the existing pie more efficiently; it's making a bigger pie.
Uncommon Insights
The following insights are informed, educated guesses drawn from multiple AI analyses — not established facts. They represent the kind of non-obvious thinking that experienced observers would apply to this story.
- The overhang will close through competitive pressure, not capability gains. The first company that lets Claude run unsupervised for 5 hours on engineering tasks gets a massive productivity edge. Others follow or fall behind. The deployment overhang closes not through better AI but through trust normalization driven by market competition. This is the most important near-term dynamic in enterprise AI.
- The trust calibration skill becomes as important as Googling was in 2005. Experienced users aren't better at telling AI what to do — they're better at knowing when to check on it. This is a management skill, and it's about to become universal. Companies that train employees on AI supervision will outperform those that just give them access.
- Anthropic is building the audit infrastructure that wins enterprise. This research isn't academic — it's the measurement framework enterprises need before deploying agents in regulated industries. When a healthcare CISO asks "prove your AI is safe," Anthropic can point to empirical data: 0.8% irreversible actions, 80% safeguard coverage, AI self-interrupts more than humans. OpenAI can't match this yet.
- The safety data comes almost entirely from coding. The 0.8% irreversibility rate is reassuring, but software is inherently reversible (git revert). Healthcare, finance, and legal work aren't. As agents enter these domains, the risk profile changes fundamentally. Current safety stats may not transfer.
- Claude's consent-seeking behavior creates a paradox. The model's trained caution builds trust, but trust erodes the demand for caution. As users reach 40%+ auto-approve, they're signaling "stop asking me." Future training will face pressure to make models less hesitant — exactly the opposite of what the safety findings recommend.
- 0.8% irreversible sounds small until you do the math at scale. If Anthropic processes a billion tool calls, that's 8 million irreversible actions (see the sketch after this list). The percentage is low; the absolute number is not. As agent adoption grows 10-100x, the tail risk matters more than the percentage.
- The smooth autonomy growth curve is the strongest evidence against AI doomer timelines. If humans were being overwhelmed by AI capability, you'd see chaotic, discontinuous adoption patterns. Instead, you see a steady, trust-based ramp that suggests humans are calibrating appropriately — at least for now.
- Anthropic studying their own product has an inherent conflict. They benefit from the story being "agents are safe, give them more autonomy." The findings are plausible and well-presented, but independent replication by a neutral party would strengthen the conclusions considerably.
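To spell out the scale point from the irreversibility bullet above: the call volumes below are hypothetical round numbers, not figures Anthropic has published.

```python
# Scaling the 0.8% irreversible-action rate to hypothetical call volumes.
# The volumes are illustrative round numbers, not published figures.
irreversible_rate = 0.008

for total_tool_calls in (1e9, 1e10, 1e11):   # 1B, 10B, 100B tool calls
    irreversible = irreversible_rate * total_tool_calls
    print(f"{total_tool_calls:.0e} calls -> {irreversible:,.0f} irreversible actions")
```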
What Happens Next
Near-term (0-6 months):
- Autonomous session durations push toward 2-4 hours as power users close the overhang
- "Agent reliability scores" become an enterprise procurement standard
- Anthropic leverages this research for regulated-industry sales (healthcare, finance)
Medium-term (6-18 months):
- Pricing shifts from token-based to task-based or time-based as sessions get longer
- Vertical-specific agents emerge: scoped for security audits, compliance, code migration, clinical data
- The first high-profile autonomous agent failure triggers regulatory debate — Anthropic's measurement framework becomes their insurance policy
The big question: What happens when the cautious partner in the human-AI relationship stops being cautious? The entire trust architecture described here depends on Claude asking for permission. If competitive pressure or user demand trains that behavior out, the human supervisory patterns documented here (40%+ auto-approve among experienced users, a 5% interrupt rate among novices) become the only safety layer. And we've just been shown exactly how thin that layer is.
Further Reading
- Measuring AI Agent Autonomy in Practice — Anthropic — The full research post with methodology, all figures, and recommendations for developers and policymakers
- Clio: Privacy-Preserving Insights into Real-World AI Use — Anthropic — The privacy-preserving analysis tool used to study millions of interactions without accessing individual conversations
- METR: Measuring AI Ability to Complete Long Tasks — METR — The independent evaluation showing Claude Opus 4.5 can handle 5-hour tasks — the capability side of the deployment overhang
- Anthropic's Framework for Safe and Trustworthy Agents — Anthropic — The broader agent safety framework that this empirical research supports
- Claude Code Documentation — Anthropic — Documentation for the coding agent studied in this research
- Disrupting AI-Enabled Espionage — Anthropic — Context on the high-risk end of the spectrum — agents used for cyber espionage that Anthropic detected and disrupted