Voice Agents

Voice Agents

AI agents whose primary interface is spoken conversation over a phone call, app, or in-browser audio stream. Earlier examples combine a transcriber, LLM brain, voice synthesiser, and tool layer; newer realtime audio models such as GPT Realtime 2 collapse more of that loop into a native audio model that can reason, preserve context, and call tools.

Key points

  • Two top-level shapes: inbound (receptionist answering calls) and outbound (agent placing calls)
  • Front-end platform (Vapi) handles the voice stack; back-end workflow engine (n8n) handles deterministic business logic
  • System prompt is the single biggest lever on quality — expect dozens of iterations in production
  • Voice agents should self-identify as AI on the opening turn as a baseline ethical practice
  • Wireframing the conversational flow before building is strongly recommended — conditional logic branches quickly
  • OpenAI's GPT Realtime 2 demo shows a voice agent checking a calendar, staying silently in context until reactivated, and updating a CRM through tool calls [src-051].
  • For action-taking voice agents, Voice Agent Preambles help avoid dead air by acknowledging the task and updating the user while reasoning or tool calls run [src-051].
  • OpenAI's Build Hour reframes mature voice agents as Voice-to-Action Interfaces: the agent can operate an e-commerce UI or analytics dashboard through tools, not merely answer questions aloud [src-083].
  • Sierra's production notes add the enterprise reality: real voice agents need workflows, allowed tools, grounding, brand language, guardrails, VAD behavior, traces, redaction, payment-safe flows, and full-call simulations around the model [src-083].

Related entities

Related concepts

Source references

  • [src-007] Nate Herk cluster — Nate Herk — Voice AI agents cluster (4 videos)

– Videos referenced: zWLZ3bVVwD8, BO-jFbN4p8Y, y-cq_Qo4zVo, Qt3zMBH-FNg

  • [src-051] OpenAI – "We’re introducing three audio models in the API" (2026-05-07)
  • [src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)

Robin Cartier perspective

This page is part of Robin Cartier's working AI knowledge graph: a practical research layer for production AI, recommendation systems, experimentation, GEO, and agentic web readiness.

The useful next step is to connect this concept back to applied product leadership and operating models.

Recommended next

Keep reading from this thread

From 494 indexed pages and articles.

  1. Wiki concept GPT Realtime 2 An OpenAI realtime audio model for voice agents that can follow instructions, reason, preserve conversational context, call tools, operate Related by voice
  2. Wiki concept Voice Agent Preambles Short spoken status updates that let a realtime agent acknowledge a user request, explain what it is doing, and Related by voice
  3. Insight AI Beyond POCs How enterprise AI moves beyond proofs of concept through ownership, governance, measurement, adoption, and production operating models Readers have engaged with this next