Long-Running Scientific Agents

Long-running scientific agents are AI coding or research agents that work for hours or days on well-scoped scientific computing tasks with occasional human oversight, persistent memory, and explicit success criteria.

Key points

  • Anthropic frames this as a shift from scientists micromanaging each conversational turn to setting a high-level objective and letting Claude Code work autonomously for long stretches [src-072].
  • Good target tasks include reimplementing numerical solvers, converting legacy scientific software, and debugging large scientific codebases against a reference implementation [src-072].
  • These tasks work because the scope is constrained, the success criteria are clear, and progress can be measured without constant human judgment [src-072].
  • Deeply coupled scientific pipelines may need one sequential agent that reasons through causal chains and spawns subagents selectively, rather than a large swarm of independent parallel agents [src-072].
  • The Boltzmann-solver example shows someone who is not a domain expert using Claude to make sustained progress on specialized scientific infrastructure by relying on a plan, a reference implementation, a progress file, and a test oracle [src-072].
  • The practical implication is an idle-time reversal: if compute and clear tasks are available, not running agents overnight becomes an opportunity cost [src-072].
  • Sio describes Codex slash-goal as a similar long-horizon mode: give the agent a hard objective and let it work for hours, days, or weeks until it judges the goal satisfied [src-081].
  • Reported use cases include performance improvement, rewriting entire programs from one language to another, and math, physics, or science problems [src-081].
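
The reference implementation, progress file, and test oracle mentioned above can be pictured with a small sketch. This is not the setup from [src-072]; it is a minimal illustration under stated assumptions. The function names (`reference_solver`, `candidate_solver`, `oracle`, `record_progress`, `read_progress`), the JSON-lines log format, and the tolerance are all hypothetical, and a toy exponential-decay function stands in for a real solver.

```python
import json
import math
import tempfile
from pathlib import Path


def reference_solver(x: float) -> float:
    """Hypothetical stand-in for a trusted reference implementation."""
    return math.exp(-x)


def candidate_solver(x: float, terms: int = 30) -> float:
    """Hypothetical stand-in for the agent's reimplementation:
    a truncated Taylor series of exp(-x)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x / (n + 1)
    return total


def oracle(candidate, reference, points, rtol=1e-8):
    """Compare candidate against reference and report the worst relative error.

    The pass/fail result needs no human judgment, which is what lets an
    agent measure its own progress.
    """
    worst = max(
        abs(candidate(x) - reference(x)) / abs(reference(x)) for x in points
    )
    return {"max_rel_error": worst, "passed": worst <= rtol}


def record_progress(path: Path, step: int, result: dict) -> None:
    """Append one JSON line per step so a later session can resume from the log."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"step": step, **result}) + "\n")


def read_progress(path: Path) -> list:
    """Load every recorded step from the progress file."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0]
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "progress.jsonl"
        # Step 1: a crude candidate (3 series terms) that the oracle rejects.
        crude = oracle(lambda x: candidate_solver(x, terms=3), reference_solver, points)
        record_progress(log, 1, crude)
        # Step 2: a refined candidate (30 series terms) that the oracle accepts.
        refined = oracle(candidate_solver, reference_solver, points)
        record_progress(log, 2, refined)
        for entry in read_progress(log):
            print(entry["step"], entry["passed"])
```

The design point is that the oracle, not a human, decides whether a step counts as progress, and the append-only log gives the agent persistent memory across sessions. A real task would replace the toy functions with calls into the actual reference code and the code under development.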

Source references

  • [src-072] Siddharth Mishra-Sharma – "Long-running Claude for scientific computing" (2026-03-23)
  • [src-081] OpenAI – "Codex for Everyday Work: AI Agents Beyond Coding" (2026-05-14)