Claude Code Token Economics
The token-accounting model of Claude Code sessions. Every turn re-reads the full conversation, CLAUDE.md, MCP tool definitions, system prompts, skill frontmatter, and memory files, so per-turn input cost grows linearly with message count and cumulative session cost grows quadratically. One tracked 100-message session spent 98.5% of its tokens rereading prior history. Peak-hour windows (8am-2pm ET weekdays) drain the 5-hour session budget faster than off-peak, and the prompt cache expires after 5 minutes of idle, so even short breaks cause post-break cost spikes. Multi-agent workflows use 7-10x more tokens than flat single-agent sessions because each sub-agent reloads the full context.
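A minimal sketch of that accounting, assuming a fixed hypothetical message size and full rereading of history on every turn. It reproduces the shape of the claims below: turn 30 costs 30x turn 1, cumulative cost grows with the square of the turn count, and by turn 100 about 98% of input tokens are rereads, in line with the tracked session.

```python
# Toy model: every turn re-reads the whole conversation, so turn n's input
# includes all n messages. The message size is an illustrative assumption.

TOKENS_PER_MESSAGE = 500   # hypothetical average message size

def turn_input_tokens(n: int) -> int:
    """Input tokens for turn n: the new message plus all n-1 prior ones."""
    return n * TOKENS_PER_MESSAGE

def session_totals(turns: int) -> tuple[int, float]:
    """Cumulative input tokens and the share spent rereading old content."""
    total = sum(turn_input_tokens(n) for n in range(1, turns + 1))  # ~n^2/2
    new = turns * TOKENS_PER_MESSAGE       # tokens that are genuinely new
    return total, 1 - new / total

print(turn_input_tokens(30) // turn_input_tokens(1))  # 30: linear per turn
total, reread = session_totals(100)
print(f"{total:,} tokens, {reread:.1%} rereads")      # 2,525,000 tokens, 98.0%
```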
Key points
- Every turn re-reads the entire conversation + CLAUDE.md + MCP tool definitions + system prompts + skill frontmatter
- Message 30 can cost 30x message 1 due to cumulative rereading
- Prompt cache expires at 5 minutes idle — short breaks can cause cost spikes
- Peak hours (8am-2pm ET weekdays) drain sessions faster than off-peak
- Sub-agent workflows use 7-10x more tokens than a flat session because each sub-agent reloads the full context
- A single MCP server can add ~18K tokens of tool definitions to every message's context
- Lost-in-the-middle effect: models attend poorly to content buried mid-context, so bloat degrades quality AND costs more
- The 10+ hour course teaches tokens and context before advanced builds, because Claude Code users need to understand that every tool, MCP server, project instruction, file, and conversation turn competes for the same context window [src-016]
- Datadog finds that 69 percent of input tokens in customer LLM traces are system-prompt tokens, showing that heavily scaffolded agents often spend most of their context on repeated instructions, policies, and tool guidance [src-037].
- Even where models support prompt caching, only 28 percent of LLM call spans showed cached-read input tokens, suggesting many teams are paying to reprocess reusable scaffolding [src-037].
- Datadog also finds request token counts rising sharply year over year as teams add conversation history, retrieved documents, tool outputs, and guardrails to agent prompts [src-037].
- Mornati's MCP-vs-CLI experiment isolates the Tool Schema Tax: Native GitHub MCP can be cheap per call but expensive across a session, because its schemas are re-read on prompts that never touch GitHub [src-041].
- In the article's 20-prompt / 2-GitHub-operation model, Native GitHub MCP costs 61,654 tokens versus 448 for raw CLI, 968 for on-demand CLI plus skill, and 892 for a gateway MCP (the mechanism is modeled in the schema-tax sketch after this list) [src-041].
- Reiner Pope's serving lecture grounds token pricing in hardware: output tokens are expensive because decode is memory-bandwidth bound, while prefill/input tokens amortize the same weight reads across many prompt tokens processed in parallel (see the decode-throughput sketch after this list) [src-042].
- Cache-hit discounts can be read as the difference between retrieving stored KV-cache state and rematerializing it with another forward pass [src-042].
- Long-context surcharges reveal the point where KV-cache memory bandwidth starts dominating the serving cost curve [src-042].
- Google Cloud adds a governance angle: fiscal responsibility means agents should choose among token, API, and MCP paths deliberately and avoid spending limited tokens on low-priority work [src-043].
- OpenAI's prompt-caching Build Hour turns this into an API-design checklist: keep static prompt prefixes stable, put dynamic context late, use allowed-tools gating instead of changing tool schemas, and choose prompt-cache keys that match the reuse scope [src-084].
- The same session notes that a prompt just below the 1,024-token cache threshold can be more expensive than a slightly longer prompt that qualifies for caching, because the savings from repeated cached reads outweigh the extra prompt length (worked through in the caching sketch after this list) [src-084].
- Jack Roberts's Agentic OS dashboard adds an operator-facing layer: track subscriptions, token/API-equivalent spend, model choice, time saved, and ROI so model selection becomes a management decision rather than a hidden habit [src-086].
- Nate's deployment comparison adds a billing boundary: Claude Code loops/routines may use subscription entitlements, while SDK-based Modal or trigger.dev agents usually move usage into API billing and explicit session/state architecture [src-086].
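A sketch of the schema-tax arithmetic behind the Mornati comparison [src-041]. The token counts below are illustrative assumptions chosen to show the mechanism, not the article's measured figures: an always-loaded MCP server pays its schema cost on every prompt, while an on-demand CLI or gateway MCP pays only on the prompts that actually use the tool.

```python
# Tool Schema Tax model: session cost of a tool integration.
# All token counts are hypothetical; only the mechanism is from [src-041].

def always_loaded_cost(schema_tokens: int, per_call_tokens: int,
                       prompts: int, calls: int) -> int:
    """Native MCP: the schema rides along on every prompt, used or not."""
    return schema_tokens * prompts + per_call_tokens * calls

def on_demand_cost(invoke_tokens: int, calls: int) -> int:
    """Raw CLI / gateway MCP: tokens appear only on prompts using the tool."""
    return invoke_tokens * calls

prompts, github_ops = 20, 2   # the article's 20-prompt, 2-operation session
print(always_loaded_cost(schema_tokens=3_000, per_call_tokens=400,
                         prompts=prompts, calls=github_ops))   # 60800
print(on_demand_cost(invoke_tokens=250, calls=github_ops))     # 500
```

The G/N framing falls out directly: with G tool-using prompts out of N total, the always-loaded tax grows with N while delivered value grows only with G.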
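A back-of-envelope version of the serving argument in [src-042], with hypothetical hardware and model numbers (roughly H100-class bandwidth and a 70B-parameter model): every decoded token must stream the weights plus the accumulated KV cache through memory, so single-stream decode throughput is bandwidth-bound and degrades as context grows.

```python
# Decode is memory-bandwidth bound: each output token re-reads the weights
# plus the KV cache. All figures are illustrative assumptions.

BANDWIDTH_BYTES_S  = 3.35e12   # ~H100-class HBM bandwidth
WEIGHT_BYTES       = 140e9     # ~70B params at 2 bytes each (bf16)
KV_BYTES_PER_TOKEN = 3.2e5     # assumed per-token KV-cache footprint

def decode_tokens_per_sec(context_len: int) -> float:
    """Bandwidth-limited upper bound on single-stream decode speed."""
    bytes_per_token = WEIGHT_BYTES + KV_BYTES_PER_TOKEN * context_len
    return BANDWIDTH_BYTES_S / bytes_per_token

for ctx in (1_000, 100_000, 1_000_000):
    print(f"{ctx:>9,} tokens of context: {decode_tokens_per_sec(ctx):5.1f} tok/s")
```

At short contexts the weight reads dominate and per-token cost is flat; once KV-cache traffic rivals weight traffic, cost climbs with context length, which is where long-context surcharges appear on the price sheet. Prefill avoids the bound because one pass over the weights serves every prompt token at once.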
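The threshold claim from the Build Hour [src-084] reduces to arithmetic. The cached-read discount below is an assumed rate (providers differ), but the crossover exists whenever cached reads are billed at a fraction of fresh input tokens.

```python
# Why a prompt just under the cache threshold can cost more than a longer
# cacheable one. The 50% cached-read discount is an assumption.

CACHE_THRESHOLD = 1_024   # minimum cacheable prefix length
CACHED_DISCOUNT = 0.5     # assumed cached-read price vs fresh input

def session_prefix_cost(prefix_tokens: int, calls: int) -> float:
    """Billed token-units for `calls` requests sharing one static prefix."""
    if prefix_tokens < CACHE_THRESHOLD:        # never cached: full price always
        return float(prefix_tokens * calls)
    # First call writes the cache; later calls read the prefix at a discount.
    return prefix_tokens + (calls - 1) * prefix_tokens * CACHED_DISCOUNT

print(session_prefix_cost(1_000, calls=20))   # 20000.0 (just under threshold)
print(session_prefix_cost(1_100, calls=20))   # 11550.0 (longer, but cacheable)
```

This is also why the checklist insists on stable prefixes: editing the static prefix invalidates the cache and reverts every subsequent call to the uncached rate.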
Related entities
Related concepts
- Claude Code Context Management Discipline
- Claude Code Plan Mode and Ultra Plan
- Progressive Context Loading (Skills)
- Prompt Caching for Agents
- Context Quality Engineering
- LLM Capacity Engineering
- Tool Schema Tax
- G/N Ratio Tool Selection
- Gateway MCP Pattern
- LLM Inference Economics
- Prefill vs Decode
- KV Cache Tiering
- Agent Budget Controls
- Agent Governance Framework
- Harness Engineering
- Agentic OS Dashboard
- Agent Deployment Modes
Source references
- [src-004] Nate Herk — Claude Code cluster (21 videos); videos referenced: 49V-5Ock8LU, tXtCK66fPj8, T4fXb3sbJIo, zKBPwDpBfhs
- [src-016] Nate Herk — "Build & Sell with Claude Code (10+ Hour Course)" (2026-03-12)
- [src-037] Datadog — "State of AI Engineering" (2026-04-21)
- [src-041] Marco Mornati — "The Future of Agentic Tooling: MCP Servers vs. CLI A Data-Driven Comparison" (2026-04-27)
- [src-042] Dwarkesh Patel — "How GPT, Claude, and Gemini are actually trained and served – Reiner Pope" (2026-04-29)
- [src-043] Google Cloud Events — "Operationalize AI: A blueprint for managing enterprise agents at scale" (2026-04-24)
- [src-084] OpenAI Codex, Workspace Agents, Prompt Caching, and Superintelligence Policy cluster (2026-02-09 to 2026-05-08)
- [src-086] Agent deployment, OpenClaw trading, n8n Desk, and Agentic OS cluster (2026-04-09 to 2026-05-15)