Agentic AI

A category of artificial intelligence in which a system autonomously breaks down a complex goal, plans a sequence of actions, uses external tools, and executes tasks end-to-end — with little to no human intervention at each step. In short: if traditional AI answers questions, agentic AI gets things done [src-003].

Key points

  • The distinguishing feature is autonomous multi-step action [src-003]. A standard LLM responds to one prompt and stops. An agentic system reasons, decides, acts, observes, and iterates in a continuous loop until a goal is achieved or a stop condition fires.
  • Four load-bearing components make up any agentic system [src-003]:
    1. A large language model as the reasoning engine
    2. A set of tools the model can invoke (web search, code execution, database queries, API calls)
    3. A memory system (short-term via the context window, long-term via vector databases or files)
    4. An orchestration layer that runs the agent loop and manages tool calls
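The four components can be pictured as a minimal loop. This is an illustrative sketch only: `call_llm` and `web_search` are hypothetical stand-ins for a real model API and real tools, not any specific framework's interface.

```python
def call_llm(messages):
    # Stand-in for the reasoning engine (1). A real implementation would
    # call a model API; here we terminate immediately for demonstration.
    return {"action": "finish", "answer": "done"}

def web_search(query):
    return f"results for {query!r}"  # stand-in tool

TOOLS = {"web_search": web_search}  # (2) tools the model can invoke

def run_agent(goal, max_steps=10):
    memory = [{"role": "user", "content": goal}]    # (3) short-term memory
    for _ in range(max_steps):                      # (4) orchestration + stop condition
        decision = call_llm(memory)                 # reason & decide
        if decision["action"] == "finish":
            return decision["answer"]               # goal achieved
        result = TOOLS[decision["action"]](decision.get("input", ""))
        memory.append({"role": "tool", "content": result})  # observe, iterate
    return "stopped: step budget exhausted"
```

The `max_steps` cap is the simplest form of the stop condition mentioned above; real orchestration layers add retries, tracing, and richer termination checks.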

  • Not all generative AI is agentic, but all agentic AI uses generative AI [src-003]. Generative AI produces content from a prompt. Agentic AI uses a generative model as its reasoning engine, then adds tool use, multi-step execution, and autonomy on top.
  • Four impact domains in 2026 [src-003]: data & analytics (autonomous SQL + reporting), digital marketing (research, drafting, scheduling, budget reallocation), software development (Claude Code, Devin, Copilot Workspace), and business automation (n8n, Zapier AI).
  • Framework landscape includes Anthropic Claude, Claude Code, LangGraph, AutoGen (Microsoft), OpenAI Agents SDK, n8n, Zapier AI, and the Model Context Protocol (MCP) as the emerging open standard for tool connectivity [src-003].
  • Statsig adds an experimentation view: agents are multi-step systems where changing one component in a single node can affect downstream performance, cost, latency, and product outcomes, so they are natural candidates for Agent Experimentation [src-032].
  • Datadog adds a production operations view: agent framework adoption has nearly doubled, but frameworks increase the need for LLM Observability because tool fan-out, retries, branching, and hidden control flow can make failures harder to reproduce [src-037].
  • Datadog also finds that many production agents remain monolithic, while the move toward dedicated agent services and multi-agent architectures will require distributed traces, context propagation, and tool-aware service maps [src-037].
  • Capacity is a first-class agent risk: ReAct-style loops, retries, and collaborative agents can hit provider rate limits, so production systems need LLM Capacity Engineering and Agent Budget Controls [src-037].
  • Google adds a user-interface dimension: agents can now generate UI intent through A2UI, letting client apps render adaptive interfaces from their own component catalogs [src-038].
  • In this pattern, an agent's output is not only text or tool calls; it can be a validated, streamable interface composed of user-facing widgets [src-038].
  • Google DeepMind adds an embodied-robotics dimension: Gemini Robotics-ER 1.6 acts as a high-level reasoning model for physical agents, calling tools, vision-language-action models, or user functions while reasoning about spatial scenes and task completion [src-039].
  • This expands agentic AI from software and UI workflows into robots that must perceive, plan, act, detect success, and obey physical safety constraints [src-039].
  • Google Cloud's enterprise-agent session maps agent adoption into four stages: productivity tools, delegated workflows, autonomous agents with identity and authority, and dynamic swarms/teams with ephemeral workers [src-043].
  • The same source argues that autonomous agents should be treated as having identity and authority, which makes Enterprise Agent Governance a prerequisite for production deployment [src-043].
  • [src-061] adds a model-training view: reinforcement learning with verifiable rewards and Inference-Time Scaling made tool use, CLI work, API exploration, and software-engineering agent behavior feel qualitatively different.
  • The same source points toward Agentic Context Management, where future agents learn when to compact or index history as part of acting over long tasks instead of only relying on larger context windows [src-061].
  • [src-062] adds a device/OS view: Pichai expects mobile and XR systems to become more agentic, understanding user goals and repeated behavior at the operating-system level rather than only inside separate apps.
  • [src-064] adds a system-level personal-agent view: OpenClaw lives on the user's computer, talks through messaging clients, uses local tools and APIs, and turns broad computer access into both the source of usefulness and the source of security risk.
  • [src-065] adds the infrastructure view: Jensen Huang says NVIDIA Vera Rubin was designed for agents because agents call tools and create different storage, CPU, networking, and rack-level demands than LLM-only inference.
  • [src-081] adds an everyday-work view: Codex (OpenAI) can use code, file access, plugins, computer use, and scheduled tasks to create documents, spreadsheets, slide decks, websites, data analyses, and personal workflow automations for non-coders.
  • The same source emphasizes that long-horizon agents need clear success criteria: the user should describe what done looks like so the agent can evaluate whether it has satisfied the goal [src-081].
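The capacity point above (rate limits, retries, runaway ReAct loops) implies explicit per-run budgets. A hedged sketch of such a control, with illustrative limits and names not drawn from any named framework:

```python
import time

class BudgetExceeded(Exception):
    """Raised when an agent run exceeds its step, token, or time budget."""

class AgentBudget:
    """Illustrative per-run budget: caps steps, tokens, and wall-clock time."""
    def __init__(self, max_steps=20, max_tokens=50_000, max_seconds=120):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.steps = 0
        self.tokens = 0
        self.start = time.monotonic()

    def charge(self, tokens_used):
        """Call once per loop iteration; raises BudgetExceeded when a cap is hit."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token limit")
        if time.monotonic() - self.start > self.max_seconds:
            raise BudgetExceeded("time limit")
```

An orchestrator would call `budget.charge(...)` each iteration and treat `BudgetExceeded` as a stop-or-escalate signal rather than a crash.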

Risks and mitigations [src-003]

  • Compounded hallucinations: an error in step 2 propagates through every subsequent step; one hallucination can mean a completely wrong final output or real-world damage. Mitigation: early stopping, human checkpoints on high-stakes decisions.
  • Security and prompt injection: broad tool access creates a new attack surface; instructions embedded in a web page or document can hijack the agent. Mitigation: input validation, least-privilege tool access, sandboxing.
  • Cost and latency: multi-step runs consume significant tokens and can take minutes. Mitigation: efficient loops, caching, early stopping.
  • Human oversight: when to pause for confirmation versus act autonomously is an unsolved problem. Mitigation: human-in-the-loop checkpoints for high-stakes steps, full autonomy for low-risk sub-tasks.
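The human-oversight mitigation can be sketched as a gate in front of tool execution. The risk tiers and function names here are illustrative assumptions, not a real API:

```python
# Hypothetical set of actions deemed high-stakes; a real system would
# derive this from policy, not a hard-coded list.
HIGH_STAKES = {"send_email", "transfer_funds", "delete_records"}

def execute_action(action, args, confirm=input):
    """Run low-risk actions autonomously; require human sign-off otherwise."""
    if action in HIGH_STAKES:
        answer = confirm(f"Agent wants to run {action}({args}). Proceed? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "skipped", "reason": "human declined"}
    return {"status": "executed", "action": action}
```

Passing `confirm` as a parameter keeps the checkpoint testable and lets the same loop run fully autonomously for low-risk sub-tasks.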

The professional skill shift

"The most important professional skill in this era is no longer just knowing how to prompt an AI — it is knowing how to design, orchestrate, and supervise AI agents." [src-003]

Related entities

  • Claude Code — Anthropic's agentic CLI, used as the runtime for many multi-agent tools
  • Paperclip — orchestration layer that turns Claude Code into an agentic company

Related concepts

Updates from bulk ingest

From src-005 (cluster 2)

  • Agentic AI market valued at $5B in 2024, projected to hit ~$200B by 2034
  • 96% of enterprises plan to expand agentic AI usage in the coming year; by 2028, a third of enterprise software expected to have agentic AI built in
  • Deloitte: 25% of enterprises using generative AI will deploy agentic pilots this year, jumping to 50% by 2027
  • AI agent market projected to grow from ~$8B in 2025 to $48-52B by 2030 (43% CAGR)
  • Google launched A2A (Agent-to-Agent) protocol in April 2025 with support from Salesforce, SAP, ServiceNow, Workday, and 50+ enterprise partners
  • A2A defines agent cards (capability descriptions), shared task lifecycles, and secure context-sharing between agents from different vendors
  • Benchmarks like Vending Bench show that even strong reasoning models degrade over long runs – forgetting orders, mistracking inventory, falling into loops
  • Anthropic developing 'agent harnesses' with shift-based work, where agents hand off structured artifacts (notes, to-dos, diffs) to the next shift instead of keeping everything in one context window
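The shift-based harness idea above amounts to passing a structured artifact between runs instead of a full context window. A minimal sketch; the field names are illustrative, not Anthropic's actual format:

```python
import dataclasses
import json

@dataclasses.dataclass
class ShiftHandoff:
    """Illustrative artifact one agent 'shift' leaves for the next."""
    notes: list   # what this shift learned
    todos: list   # remaining work items
    diffs: list   # structured changes produced (e.g. file patches)

    def to_json(self):
        return json.dumps(dataclasses.asdict(self))

    @classmethod
    def from_json(cls, raw):
        return cls(**json.loads(raw))
```

The next shift rehydrates state with `ShiftHandoff.from_json(...)`, sidestepping the long-run degradation (forgotten orders, mistracked inventory) that benchmarks like Vending Bench expose.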

Source references

  • [src-001] Nate Herk — "Claude Code + Paperclip Just Destroyed OpenClaw" (2026-03-28)
  • [src-003] Robin Cartier — "What is Agentic AI? A Complete Guide" (2026-03-10)
  • [src-005] Nate Herk cluster (see summaries/src-005-*.md)
  • [src-032] Skye Scofield and Sid Kumar — "Experimentation and AI: 4 trends we’re seeing" (2025-06-13)
  • [src-037] Datadog — "State of AI Engineering" (2026-04-21)
  • [src-038] Google A2UI Team — "A2UI v0.9: The New Standard for Portable, Framework-Agnostic Generative UI" (2026-04-17)
  • [src-039] Laura Graesser and Peng Xu — "Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning" (2026-04-14)
  • [src-043] Google Cloud Events — "Operationalize AI: A blueprint for managing enterprise agents at scale" (2026-04-24)
  • [src-061] Lex Fridman — "State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490" (2026-01-31)
  • [src-062] Lex Fridman — "Sundar Pichai: CEO of Google and Alphabet | Lex Fridman Podcast #471" (2025-06-05)
  • [src-064] Lex Fridman — "OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger | Lex Fridman Podcast #491" (2026-02-12)
  • [src-065] Lex Fridman — "Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution | Lex Fridman Podcast #494" (2026-03-23)
  • [src-081] OpenAI — "Codex for Everyday Work: AI Agents Beyond Coding" (2026-05-14)