Long-Running Scientific Agents

Long-running scientific agents are AI coding or research agents that work for hours or days on well-scoped scientific computing tasks with occasional human oversight, persistent memory, and explicit success criteria.
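The "persistent memory" in this definition can be as plain as a file on disk. The Boltzmann-solver example in [src-072] relied on a progress file alongside a plan.

The sketch below is an assumption about what such a file might hold, not the format used in that example. It stores a JSON record of completed plan milestones and free-form notes. The helpers `load_progress`, `record_progress`, and `next_milestone` are hypothetical names. They show how a fresh session could pick up where the previous one stopped.

```python
import json
from pathlib import Path
from typing import List, Optional


def load_progress(path: Path) -> dict:
    """Read notes left by earlier sessions, or start an empty record."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": [], "notes": []}


def record_progress(path: Path, milestone: str, note: str) -> dict:
    """Mark a plan milestone as done, append a note, and persist to disk."""
    state = load_progress(path)
    if milestone not in state["completed"]:
        state["completed"].append(milestone)
    state["notes"].append(f"{milestone}: {note}")
    path.write_text(json.dumps(state, indent=2))
    return state


def next_milestone(plan: List[str], state: dict) -> Optional[str]:
    """Return the first milestone in the (hypothetical) plan not yet completed."""
    for milestone in plan:
        if milestone not in state["completed"]:
            return milestone
    return None
```

The intended cycle works like this:

- A new session calls `load_progress` first.
- It then works on `next_milestone(plan, state)`.
- It calls `record_progress` before stopping.

This way the file, rather than the model's context window, carries state from one session to the next.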

Key points

  • Anthropic frames this as a shift from scientists micromanaging each conversational turn to setting a high-level objective and letting Claude Code work autonomously for long stretches [src-072].
  • Good target tasks include reimplementing numerical solvers, converting legacy scientific software, and debugging large scientific codebases against a reference implementation [src-072].
  • These tasks work because the scope is constrained, the success criteria are clear, and progress can be measured without constant human judgment [src-072].
  • Deeply coupled scientific pipelines may need one sequential agent that reasons through causal chains and spawns subagents selectively, rather than a large swarm of independent parallel agents [src-072].
  • The Boltzmann-solver example shows a non-domain expert using Claude to make sustained progress on specialized scientific infrastructure by relying on a plan, reference implementation, progress file, and test oracle [src-072].
  • The practical implication is an idle-time reversal: when compute and clearly scoped tasks are available, leaving agents idle overnight becomes an opportunity cost [src-072].
  • Sio describes Codex slash-goal as a similar long-horizon mode: give the agent a hard objective and let it work for hours, days, or weeks until it judges the goal satisfied [src-081].
  • Reported use cases include performance improvement, rewriting entire programs from one language to another, and math, physics, or science problems [src-081].
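The test-oracle idea in the key points pairs a reference implementation with a quantified success criterion. It can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the setup from [src-072]:

- The "reference implementation" is the exact solution of the toy ODE dy/dt = -k·y.
- The "reimplemented solver" is a hypothetical fixed-step integrator.
- `refine_until_satisfied` is a hypothetical stand-in for an agent that keeps working until a measurable goal is met, loosely in the spirit of the long-horizon mode described in [src-081].

All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Deriv = Callable[[float], float]
Stepper = Callable[[Deriv, float, float], float]


def euler_step(f: Deriv, y: float, h: float) -> float:
    # Forward Euler: first-order accurate, cheap, easy to get "almost right".
    return y + h * f(y)


def rk4_step(f: Deriv, y: float, h: float) -> float:
    # Classic fourth-order Runge-Kutta step for an autonomous ODE dy/dt = f(y).
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(step: Stepper, f: Deriv, y0: float,
              times: Sequence[float], steps_per_unit: int) -> List[float]:
    """Candidate solver: y at each requested time (times non-decreasing, >= 0)."""
    out, y, t_prev = [], y0, 0.0
    for t in times:
        n = max(1, round((t - t_prev) * steps_per_unit))
        h = (t - t_prev) / n
        for _ in range(n):
            y = step(f, y, h)
        out.append(y)
        t_prev = t
    return out


def oracle(candidate: Sequence[float], reference: Sequence[float],
           rtol: float) -> Tuple[bool, float]:
    """Measurable success criterion: worst relative error vs. the reference.

    Assumes every reference value is nonzero.
    """
    worst = max(abs(c - r) / abs(r) for c, r in zip(candidate, reference))
    return worst <= rtol, worst


def refine_until_satisfied(step: Stepper, rtol: float,
                           max_rounds: int = 12) -> Tuple[bool, int, float]:
    """Hypothetical goal loop: double the resolution until the oracle passes
    or the budget (max_rounds >= 1) runs out."""
    k, y0 = 1.0, 1.0
    times = [0.5 * i for i in range(1, 11)]             # t = 0.5, 1.0, ..., 5.0
    reference = [y0 * math.exp(-k * t) for t in times]  # exact y(t) = y0*e^(-k t)
    for round_ in range(max_rounds):
        steps_per_unit = 10 * 2 ** round_
        candidate = integrate(step, lambda y: -k * y, y0, times, steps_per_unit)
        ok, worst = oracle(candidate, reference, rtol)
        if ok:
            return True, steps_per_unit, worst
    return False, steps_per_unit, worst
```

Two calls show the contrast:

- `refine_until_satisfied(rk4_step, rtol=1e-8)` reaches the tolerance within a few doublings.
- `refine_until_satisfied(euler_step, rtol=1e-3, max_rounds=3)` exhausts its budget.

The design point is that the stopping decision comes from a numeric check, not from a human reviewing each step. A real scientific task would compare against the legacy code's outputs instead of an analytic solution, and would need tolerances suited to the problem.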

Source references

  • [src-072] Siddharth Mishra-Sharma, "Long-running Claude for scientific computing" (2026-03-23)
  • [src-081] OpenAI, "Codex for Everyday Work: AI Agents Beyond Coding" (2026-05-14)

Robin Cartier perspective

This page is part of Robin Cartier's working AI knowledge graph: a practical research layer for production AI, recommendation systems, experimentation, GEO, and agentic web readiness.

The useful next step is to connect this concept back to applied product leadership and operating models.
