Long-Running Scientific Agents
Long-running scientific agents are AI coding or research agents that work for hours or days on well-scoped scientific computing tasks with occasional human oversight, persistent memory, and explicit success criteria.
Key points
- Anthropic frames this as a shift from scientists micromanaging each conversational turn to setting a high-level objective and letting Claude Code work autonomously for long stretches [src-072].
- Good target tasks include reimplementing numerical solvers, converting legacy scientific software, and debugging large scientific codebases against a reference implementation [src-072].
- These tasks work because the scope is constrained, the success criteria are clear, and progress can be measured without constant human judgment [src-072].
- Deeply coupled scientific pipelines may need one sequential agent that reasons through causal chains and spawns subagents selectively, rather than a large swarm of independent parallel agents [src-072].
- The Boltzmann-solver example shows a non-domain expert using Claude to make sustained progress on specialized scientific infrastructure by relying on a plan, reference implementation, progress file, and test oracle [src-072].
- The practical implication is an idle-time reversal: if compute and clear tasks are available, leaving agents idle overnight becomes an opportunity cost [src-072].
- Sio describes Codex slash-goal as a similar long-horizon mode: the user gives the agent a hard objective and lets it work for hours, days, or weeks until the agent judges the goal satisfied [src-081].
- Reported use cases include performance improvement, rewriting entire programs from one language to another, and math, physics, or science problems [src-081].
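
The test-oracle and progress-file pattern from the key points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the workflow described in [src-072]: the function names (`reference_solver`, `candidate_solver`, `run_oracle`, `append_progress`), the JSON-lines progress format, and the tolerance are all hypothetical, and the "solver" is a toy stand-in for a real numerical code.

```python
import json
import math
from pathlib import Path
from typing import Callable, Sequence


def reference_solver(x: float) -> float:
    """Hypothetical stand-in for a trusted reference implementation."""
    return math.exp(-x)


def candidate_solver(x: float) -> float:
    """Hypothetical stand-in for the agent's reimplementation.

    Uses a 20-term Taylor series for exp(-x) so that it is a genuinely
    different code path from the reference.
    """
    return sum((-x) ** k / math.factorial(k) for k in range(20))


def run_oracle(
    candidate: Callable[[float], float],
    reference: Callable[[float], float],
    inputs: Sequence[float],
    rel_tol: float = 1e-8,  # assumed tolerance; a real task would choose its own
) -> dict:
    """Compare candidate against reference on every input and summarize."""
    failures = [
        x for x in inputs
        if not math.isclose(candidate(x), reference(x), rel_tol=rel_tol)
    ]
    return {"checked": len(inputs), "failed": len(failures), "passed": not failures}


def append_progress(path: Path, note: str, result: dict) -> None:
    """Append one JSON line so a later session can read what was already tried."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"note": note, **result}) + "\n")


if __name__ == "__main__":
    inputs = [0.0, 0.5, 1.0, 2.0]
    result = run_oracle(candidate_solver, reference_solver, inputs)
    append_progress(Path("progress.jsonl"), "Taylor-series candidate", result)
    print(result)
```

The point of the design is that the pass/fail signal comes from the comparison against the reference rather than from a human reviewing each step, and the appended log gives a restarted agent a record of earlier attempts.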
Related entities
Related concepts
- AI For Science
- Real-World AI Task Horizons
- Test Oracle Driven Agents
- Agent Progress File Memory
- Ralph Loop Orchestration
- Agentic Workflows
- Human-Agent Collaboration
- Codex (OpenAI)
- Everyday Agentic Work