Self-Checking Todo Loops

A Claude Code execution pattern in which the agent maintains an explicit todo list, runs a verification step after each meaningful change, reads the result, patches any failures, and repeats until the checks confirm that the task is complete.
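
The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `TodoItem`, `run_loop`, the `patch` callback, and the toy `state` dictionary are hypothetical names invented here, not part of Claude Code or of any cited source. In a real session the check would be a test run, a screenshot comparison, or a log read, and the patch would be an edit made by the agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A check returns (passed, evidence); evidence is what the agent reads back.
Check = Callable[[], Tuple[bool, str]]


@dataclass
class TodoItem:
    description: str
    check: Check                      # observable check attached to the item
    done: bool = False
    attempts: int = 0
    evidence: List[str] = field(default_factory=list)


def run_loop(todos: List[TodoItem],
             patch: Callable[[TodoItem, str], None],
             max_attempts: int = 3) -> bool:
    """Run each item's check; on failure call patch() and re-check.

    Returns True only if every item eventually passed its own check.
    """
    for item in todos:
        while not item.done and item.attempts < max_attempts:
            item.attempts += 1
            passed, evidence = item.check()
            item.evidence.append(evidence)   # keep the result for later review
            if passed:
                item.done = True
            else:
                patch(item, evidence)        # feed the failure back into the work
    return all(item.done for item in todos)


# Toy stand-in for an application under change (hypothetical).
state = {"button_label": "Sumbit"}


def label_check() -> Tuple[bool, str]:
    ok = state["button_label"] == "Submit"
    return ok, f"button_label={state['button_label']!r}"


def fix_label(item: TodoItem, evidence: str) -> None:
    state["button_label"] = "Submit"


todos = [TodoItem("Fix the submit button label", label_check)]
finished = run_loop(todos, fix_label)
```

The design choice worth noting is that an item is marked done only by its check, never by the code that attempted the fix, and that `max_attempts` bounds the retries so a check that can never pass does not loop forever.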

Key points

  • The loop is not just “make a todo list”; the important part is attaching observable checks to the list items [src-011]
  • Browser automation makes the loop stronger because Playwright can verify real UI behaviour rather than relying on code inspection alone [src-011]
  • The agent should treat failed tests, screenshots, logs, or browser errors as feedback to update the implementation and the todo list [src-011]
  • This pattern reduces premature completion claims, especially in frontend and browser-automation tasks [src-011]
  • Anthropic’s scientific-computing workflow generalizes the same idea to research code: the agent needs a test oracle such as a reference implementation, quantifiable objective, or unit test suite to know whether it is improving [src-072]
  • For long-running scientific tasks, the agent should expand tests while working so it does not overfit to a narrow parameter point and miss regressions elsewhere [src-072]
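
The last two points, a test oracle plus a test set that grows while the work proceeds, can be illustrated with a small sketch. Everything here is an assumption made for illustration: `candidate_exp` stands in for research code under development, `math.exp` plays the role of the trusted reference implementation, and the parameter points and tolerance are arbitrary.

```python
import math
from typing import Callable, List


def reference_exp(x: float) -> float:
    """Trusted oracle: the standard-library implementation."""
    return math.exp(x)


def candidate_exp(x: float, terms: int = 10) -> float:
    """Hypothetical code under development: a truncated Taylor series."""
    return sum(x**k / math.factorial(k) for k in range(terms))


def failures(candidate: Callable[[float], float],
             reference: Callable[[float], float],
             points: List[float],
             rel_tol: float = 1e-6) -> List[float]:
    """Return the parameter points where candidate and reference disagree."""
    return [x for x in points
            if not math.isclose(candidate(x), reference(x), rel_tol=rel_tol)]


# A narrow test set near the point the candidate was tuned on.
narrow = [0.0, 0.5, 1.0]
# The expanded set adds points further out in parameter space.
wide = narrow + [5.0, 10.0]

narrow_failures = failures(candidate_exp, reference_exp, narrow)
wide_failures = failures(candidate_exp, reference_exp, wide)
```

On the narrow set the candidate agrees with the oracle, so a loop that only checked those points would report success. The expanded set exposes the larger inputs where the truncated series breaks down, which is the kind of regression the wider tests are meant to catch.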

Related entities

Related concepts

Source references

  • [src-011] Nate Herk – Claude Code power features cluster (2026-04-20 to 2026-04-27)
  • [src-072] Siddharth Mishra-Sharma – “Long-running Claude for scientific computing” (2026-03-23)