Long-Running Scientific Agents
Long-running scientific agents are AI coding or research agents that work for hours or days on well-scoped scientific computing tasks with occasional human oversight, persistent memory, and explicit success criteria.
Key points
- Anthropic frames this as a shift from scientists micromanaging each conversational turn to setting a high-level objective and letting Claude Code work autonomously for long stretches [src-072].
- Good target tasks include reimplementing numerical solvers, converting legacy scientific software, and debugging large scientific codebases against a reference implementation [src-072].
- These tasks work because the scope is constrained, the success criteria are clear, and progress can be measured without constant human judgment [src-072].
- Deeply coupled scientific pipelines may need one sequential agent that reasons through causal chains and spawns subagents selectively, rather than a large swarm of independent parallel agents [src-072].
- The Boltzmann-solver example shows a non-domain expert using Claude to make sustained progress on specialized scientific infrastructure by relying on a plan, a reference implementation, a progress file, and a test oracle [src-072].
- The practical implication is an idle-time reversal: if compute and clearly specified tasks are available, leaving agents idle overnight becomes an opportunity cost [src-072].
- Sio describes Codex slash-goal as a similar long-horizon mode: give the agent a hard objective and let it work for hours, days, or weeks until it judges the goal satisfied [src-081].
- Reported use cases include performance improvement, rewriting entire programs from one language to another, and math, physics, or science problems [src-081].
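
The test-oracle and progress-file pattern mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the setup used in either source: the function names (`oracle_check`, `append_progress`, `read_progress`, `taylor_exp`), the JSON-lines progress format, and the tolerance are all hypothetical, and a truncated Taylor series stands in for the solver under development, with `math.exp` playing the reference implementation.

```python
import json
import math
import tempfile
from pathlib import Path


def taylor_exp(x, terms):
    """Stand-in for the code under development: truncated Taylor series of exp."""
    return sum(x**k / math.factorial(k) for k in range(terms))


def max_relative_error(candidate, reference, floor=1e-12):
    """Largest element-wise relative deviation of candidate from reference."""
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must have the same length")
    return max(
        abs(c - r) / max(abs(r), floor)
        for c, r in zip(candidate, reference)
    )


def oracle_check(candidate_fn, reference_fn, inputs, tolerance=1e-6):
    """Run both implementations on fixed inputs and report a quantified verdict."""
    candidate = [candidate_fn(x) for x in inputs]
    reference = [reference_fn(x) for x in inputs]
    error = max_relative_error(candidate, reference)
    return {"max_rel_error": error, "passed": error <= tolerance}


def append_progress(path, entry):
    """Append one JSON line so a later session can see what was already tried."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


def read_progress(path):
    """Load every recorded entry, oldest first."""
    with Path(path).open("r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    inputs = [0.0, 0.5, 1.0]
    with tempfile.TemporaryDirectory() as workdir:
        progress = Path(workdir) / "progress.jsonl"  # hypothetical file name
        for terms in (5, 15):
            result = oracle_check(lambda x: taylor_exp(x, terms), math.exp, inputs)
            append_progress(progress, {"terms": terms, **result})
        for entry in read_progress(progress):
            print(entry)
```

The point of the sketch is that the pass/fail signal comes from a numeric comparison against the reference rather than from human judgment, and that each attempt leaves a record an agent can reread after its context is reset.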
Related concepts
- AI For Science
- Real-World AI Task Horizons
- Test Oracle Driven Agents
- Agent Progress File Memory
- Ralph Loop Orchestration
- Agentic Workflows
- Human-Agent Collaboration
- Codex (OpenAI)
- Everyday Agentic Work
Source references
- [src-072] Siddharth Mishra-Sharma – "Long-running Claude for scientific computing" (2026-03-23)
- [src-081] OpenAI – "Codex for Everyday Work: AI Agents Beyond Coding" (2026-05-14)