LLM Observability

LLM observability is the production telemetry layer for AI applications and agents, covering traces, costs, latency, model behavior, tool calls, retries, errors, and cross-service execution paths.
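
To make the signals in that definition concrete, here is a minimal sketch of a trace made of nested spans. It assumes an in-memory recorder; the `Span` and `Tracer` names, the field names, and the cost figure are all hypothetical and do not follow any vendor's schema.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work in an agent run (hypothetical schema, not a vendor format)."""
    trace_id: str
    name: str
    kind: str                      # "agent", "llm", or "tool"
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    duration_ms: float = 0.0
    attempts: int = 1              # 1 means the call succeeded without retries
    cost_usd: float = 0.0
    error: Optional[str] = None


class Tracer:
    """Collects spans in memory; a real system would export them to a backend."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []            # finished spans, in completion order
        self._stack = []           # currently open spans, innermost last

    @contextmanager
    def span(self, name, kind):
        parent = self._stack[-1].span_id if self._stack else None
        current = Span(self.trace_id, name, kind, parent_id=parent)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        except Exception as exc:
            # Record the failure on the span, then let it propagate.
            current.error = f"{type(exc).__name__}: {exc}"
            raise
        finally:
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(current)

    def summary(self):
        return {
            "spans": len(self.spans),
            "tool_calls": sum(s.kind == "tool" for s in self.spans),
            "retries": sum(s.attempts - 1 for s in self.spans),
            "errors": sum(s.error is not None for s in self.spans),
            "cost_usd": round(sum(s.cost_usd for s in self.spans), 6),
        }


tracer = Tracer()
with tracer.span("answer_question", "agent"):
    with tracer.span("plan", "llm") as llm:
        llm.cost_usd = 0.0021      # made-up figure for illustration
    with tracer.span("search_docs", "tool") as tool:
        tool.attempts = 3          # pretend the tool succeeded on the third try

print(tracer.summary())
# {'spans': 3, 'tool_calls': 1, 'retries': 2, 'errors': 0, 'cost_usd': 0.0021}
```

Because each span carries a `parent_id`, the same records can be rebuilt into the execution path of a run, which is where hidden tool fan-out and retries show up.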

Key points

Datadog reports that agent framework adoption nearly doubled year over year, from more than 9 percent of organizations in early 2025 to almost 18 percent by early 2026 ^[src-037].
Frameworks such as LangChain, Pydantic AI, LangGraph, and Vercel AI SDK accelerate development but can hide tool fan-out, retries, branching, and inefficient imported logic ^[src-037].
Datadog argues that agent failures increasingly come from what teams cannot observe: agents need production feedback loops because LLM-driven control flow is harder to inspect than traditional software ^[src-037].
Comprehensive agent telemetry helps teams diagnose unexpected behavior, reproduce failures, understand actual execution paths, and decide when to replace framework boilerplate with bespoke workflows ^[src-037].
As agents move from monoliths toward dedicated services or multi-agent architectures, teams need distributed traces, context propagation, and service maps that include tools ^[src-037].
LLM observability connects quality, safety, performance, cost, and reliability into one operational picture rather than treating model output as a black box ^[src-037].
Google Cloud adds a governance layer: traces, logs, topology maps, Model Armor spans, and security dashboards should prove policy adherence and support agent forensics ^[src-043].
Agent observability must cover attempted violations as well as completed ones, because repeated attempts can reveal emerging bad behavior before it causes damage ^[src-043].
Prompt/response logs may need stricter access control than traces because they can contain sensitive user or business data ^[src-043].
The AI Engineer corpus adds an agent-specific observability arc: talks cover agent traces, eval-linked telemetry, MCP observability, production feedback loops, rogue-agent detection, support-agent reliability, and debugging multi-step execution rather than only logging prompts and responses ^[src-077].
Observability and evals increasingly merge: traces explain why an eval failed, while eval outcomes tell operators which traces and tool paths deserve investigation ^[src-077].
Fmind's MLOps course grounds observability in established ML operations practice: logging, monitoring, alerting, lineage, explainability, infrastructure visibility, costs, and KPIs are all needed to understand what a model system did and whether it is still acceptable ^[src-078].
This widens LLM observability back to the whole delivery chain: code version, data version, model registry entry, configuration, runtime environment, cost, latency, and user-visible behavior all matter ^[src-078].
Sierra's comments on production voice agents add a voice-specific observability surface: full-call traces, sensitive-information redaction, PCI-safe payment flow tracking, turn-taking evidence, and simulations that test whether the agent completed the customer task safely ^[src-083].
For voice agents, observability must include audio interaction quality as well as model/tool behavior, because latency, interruptions, spelling corrections, backchannels, and wrong actions can all break task completion ^[src-083].
The EU AI Act makes observability part of compliance for high-risk systems: such systems must support automatic event logging, deployers must retain those logs when the logs are under their control, providers must run post-market monitoring, and serious incidents can trigger reporting obligations ^[src-085].
The Act's deployer-facing transparency and human-oversight requirements also imply observability that humans can use, not only telemetry that engineers can store ^[src-085].
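
Two of the points above, recording attempted violations and restricting what reaches prompt/response logs, can be sketched together. This is an illustration under stated assumptions, not any vendor's policy engine: the allow-list, the alert threshold, the `PolicyAudit` class, and the card-number pattern are all hypothetical, and the regex is far too rough for real PCI work.

```python
import re
from collections import Counter

# Hypothetical allow-list; real policies would come from a governance system.
ALLOWED_TOOLS = {"search_docs", "create_ticket"}
ATTEMPT_ALERT_THRESHOLD = 3   # assumed value, tune per deployment

# Rough pattern for 13 to 16 digit card-like numbers with optional spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact(text):
    """Mask card-like numbers before the text is written to any log."""
    return CARD_PATTERN.sub("[REDACTED]", text)


class PolicyAudit:
    """Logs every tool-call decision, including calls that were blocked."""

    def __init__(self):
        self.events = []
        self.blocked = Counter()   # blocked attempts per agent

    def check_tool_call(self, agent_id, tool, argument):
        allowed = tool in ALLOWED_TOOLS
        self.events.append({
            "agent_id": agent_id,
            "tool": tool,
            "outcome": "allowed" if allowed else "blocked",
            "argument": redact(argument),   # never store the raw argument
        })
        if not allowed:
            self.blocked[agent_id] += 1
        return allowed

    def agents_to_review(self):
        """Agents whose blocked attempts reached the alert threshold."""
        return [a for a, n in self.blocked.items() if n >= ATTEMPT_ALERT_THRESHOLD]


audit = PolicyAudit()
audit.check_tool_call("agent-2", "create_ticket", "card 4111 1111 1111 1111 declined")
for _ in range(3):
    audit.check_tool_call("agent-7", "delete_records", "table=customers")

print(audit.events[0]["argument"])   # card [REDACTED] declined
print(audit.agents_to_review())      # ['agent-7']
```

None of the blocked calls in this run did any damage, yet the repeated attempts by `agent-7` are still visible and countable, which is the point of logging attempts rather than only completed violations.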

Source references

^[src-037] Datadog, "State of AI Engineering" (2026-04-21)
^[src-043] Google Cloud Events, "Operationalize AI: A blueprint for managing enterprise agents at scale" (2026-04-24)
^[src-077] AI Engineer channel transcript cluster (678 saved transcripts, 2023-10-20 to 2026-05-15)
^[src-078] Mederic Hurier (Fmind) channel transcript cluster (62 saved transcripts, 2024-11-26 to 2026-05-14)
^[src-083] OpenAI, "Build Hour: GPT-Realtime-2" (2026-05-13)
^[src-085] European Parliament and Council of the European Union, "Regulation (EU) 2024/1689 … (Artificial Intelligence Act)" (2024-07-12)

Robin Cartier
