Datadog
Datadog is an observability platform represented in this wiki by its 2026 “State of AI Engineering” report, which analyzes customer LLM and agent telemetry to describe production AI engineering patterns.
Key facts
- Type: Observability and monitoring company
- Source role: Publisher of “State of AI Engineering” [src-037]
- Report date: 2026-04-21 [src-037]
- Dataset framing: Datadog analyzes customer LLM agent telemetry, LLM call spans, model usage, and error patterns from organizations running AI systems in production [src-037]
- Main thesis: Production AI systems are becoming multi-model, scaffolded, context-rich, capacity-constrained, and increasingly distributed, so engineering advantage depends on evaluation, observability, governance, resilience, and cost awareness [src-037]
- Concepts introduced here: Model Fleet Governance, LLM Observability, Prompt Caching for Agents, Context Quality Engineering, LLM Capacity Engineering [src-037]
What it adds
The report extends the wiki’s AI-agent coverage from build patterns into production operations. It quantifies multi-model adoption, model churn, agent-framework growth, prompt-token composition, prompt-caching underuse, context-window expansion, rate-limit failures, and early monolithic agent architectures [src-037].
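The prompt-caching underuse the report quantifies comes down to prompt structure: provider-side prompt caches generally match on an exact token prefix, so agents that interleave volatile content with stable instructions defeat the cache. A minimal sketch of the cache-friendly ordering, with a hypothetical `build_prompt` helper (the function and segment names are illustrative, not from the report):

```python
# Hypothetical sketch: order prompt segments so the stable prefix is
# byte-identical across agent turns. Prefix-matching caches can then
# reuse the system prompt and tool definitions; only the volatile
# suffix (history, user input) changes per turn.

def build_prompt(system: str, tools: str, history: list[str], user: str) -> str:
    """Assemble a prompt with stable segments first, volatile segments last."""
    stable_prefix = "\n\n".join([system, tools])      # identical every turn
    volatile_suffix = "\n\n".join(history + [user])   # changes every turn
    return stable_prefix + "\n\n" + volatile_suffix

turn1 = build_prompt("You are a helpful agent.", "TOOLS: search, calc",
                     [], "What is 2+2?")
turn2 = build_prompt("You are a helpful agent.", "TOOLS: search, calc",
                     ["What is 2+2?", "4"], "And 3+3?")

# The shared prefix is what a provider-side cache could reuse between turns.
shared = "You are a helpful agent.\n\nTOOLS: search, calc"
assert turn1.startswith(shared) and turn2.startswith(shared)
```

Putting per-turn content (timestamps, tool results, user messages) before the system prompt would break the shared prefix and forfeit the cache hit on every call.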
Datadog’s framing is operational: agents are not only application logic but distributed systems with cost, latency, capacity, traceability, and governance problems. This connects agent design to classic observability concerns such as traces, service maps, backpressure, fallbacks, budgets, and error diagnosis [src-037].
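The rate-limit failures and fallback patterns above can be sketched as a small fleet-routing loop. This is a hedged illustration, not a pattern from the report: the model names, the `RateLimited` exception, and `call_model` are hypothetical stand-ins for a real provider client.

```python
import time

class RateLimited(Exception):
    """Raised when a model endpoint reports it is over capacity."""

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real LLM call; pretend the primary is rate-limited.
    if model == "primary-model":
        raise RateLimited(model)
    return f"{model}: response to {prompt!r}"

def call_with_fallback(fleet: list[str], prompt: str, retries: int = 1) -> str:
    """Walk the fleet in priority order, retrying briefly before falling back."""
    last_error: Exception | None = None
    for model in fleet:
        for attempt in range(retries + 1):
            try:
                return call_model(model, prompt)
            except RateLimited as err:
                last_error = err
                time.sleep(0.01 * (2 ** attempt))  # tiny exponential backoff
    raise RuntimeError("all models in the fleet are exhausted") from last_error

result = call_with_fallback(["primary-model", "fallback-model"], "hello")
```

Each attempt, retry, and fallback in a loop like this is exactly the kind of event that LLM observability tooling records as spans, which is what makes capacity problems diagnosable after the fact.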
Related concepts
- Model Fleet Governance
- LLM Observability
- Prompt Caching for Agents
- Context Quality Engineering
- LLM Capacity Engineering
- Agentic AI
- Agent Experimentation
- Context Engineering
Source references
- [src-037] Datadog — “State of AI Engineering” (2026-04-21)