Real-World AI Task Horizons
Real-world AI task horizons measure how AI success rates decline as user-chosen tasks require more human time, capturing effective capability in deployed usage rather than in controlled benchmarks alone.
Key points
- Anthropic relates its task-success primitive to task-horizon work such as METR’s measurements of the length of tasks an AI can reliably complete [src-069, src-070].
- In first-party API data, success rates fall from around 60% for tasks estimated at under one human hour to roughly 45% for tasks estimated at 5+ human hours [src-069].
- The fitted curve for API traffic crosses 50% success at about 3.5 human hours, while the Claude.ai curve extrapolates to about 19 hours because multi-turn conversations let users decompose tasks and correct intermediate work [src-069, src-070] (see the fitting sketch after this list).
- Real-world task horizons mix model capability with users’ task selection, setup costs, and judgments about what is worth bringing to Claude [src-069, src-070].
- Controlled benchmarks measure autonomous frontier capability; real-world usage measures effective task horizon across broader, user-selected work [src-069, src-070].
- Anthropic’s scientific-computing case is a concrete long-horizon example: Claude Code worked over several days on a specialized numerical solver, using persistent memory, test oracles, Git coordination, and occasional steering [src-072] (see the oracle-loop sketch after this list).
- The case distinguishes between long-horizon work that can be pursued autonomously because progress is measurable and open-ended scientific discovery, where human judgment remains central [src-072].
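For intuition, the sketch below fits a logistic curve of success rate against log task length and solves for the 50% crossing, in the spirit of METR-style horizon fits. The data points and the `logistic` parameterization are illustrative assumptions consistent with the figures quoted above, not Anthropic’s actual dataset or fitting code.

```python
# Hypothetical horizon fit: success rate as a logistic function of log(human hours).
import numpy as np
from scipy.optimize import curve_fit

def logistic(log_hours, horizon_log, slope):
    # Crosses 0.5 exactly where log_hours == horizon_log, i.e. at the 50% horizon.
    return 1.0 / (1.0 + np.exp(slope * (log_hours - horizon_log)))

# (task length in human hours, observed success rate) -- illustrative stand-ins.
hours = np.array([0.25, 0.5, 1.0, 2.0, 5.0, 10.0])
success = np.array([0.62, 0.60, 0.57, 0.53, 0.45, 0.40])

params, _ = curve_fit(logistic, np.log(hours), success, p0=[np.log(3.5), 1.0])
horizon_hours = np.exp(params[0])  # task length at which fitted success hits 50%
print(f"Estimated 50% horizon: {horizon_hours:.1f} human hours")
```

On the same fitted form, the gap between the API and Claude.ai horizons would show up as different `horizon_log` values when the two traffic sources are fit separately.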
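And a minimal sketch of the test-oracle pattern the case study describes: progress on a long-running solver is checkpointed only when a candidate matches a trusted reference within tolerance. All names here (`candidate_solve`, `reference_solve`, the tolerance) are hypothetical illustrations, not the actual Claude Code setup.

```python
# Hypothetical test-oracle loop: commit progress only when the fast solver
# under development agrees with a trusted (but slow) reference implementation.
import subprocess
import numpy as np

def reference_solve(x: np.ndarray) -> np.ndarray:
    # Stand-in oracle: a trusted closed form (here sin, for illustration).
    return np.sin(x)

def candidate_solve(x: np.ndarray) -> np.ndarray:
    # Stand-in for the agent's solver under development: a truncated series.
    return x - x**3 / 6 + x**5 / 120 - x**7 / 5040

def oracle_accepts(candidate: np.ndarray, reference: np.ndarray,
                   rtol: float = 1e-4) -> bool:
    # The oracle check: agreement within a relative tolerance.
    return bool(np.allclose(candidate, reference, rtol=rtol))

def checkpoint(message: str) -> None:
    # Persist accepted progress with Git so a days-long session can resume.
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

# One iteration of the loop: generate, verify against the oracle, commit on pass.
test_input = np.linspace(0.0, 1.0, 100)
if oracle_accepts(candidate_solve(test_input), reference_solve(test_input)):
    checkpoint("solver passes oracle check on [0, 1] grid")
```

The key property is that the oracle makes progress measurable without human review of each step, which is what separates this kind of autonomy from open-ended discovery.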
Related concepts
- Economic Primitives
- Effective AI Job Coverage
- Human-Agent Collaboration
- Practitioner Model Benchmarking Methodology
- Statistical Model Evaluations
- Long-Running Scientific Agents
- Test Oracle Driven Agents