Voice Agents
AI agents whose primary interface is spoken conversation over a phone call, app, or in-browser audio stream. Earlier examples combine a transcriber, LLM brain, voice synthesiser, and tool layer; newer realtime audio models such as GPT Realtime 2 collapse more of that loop into a native audio model that can reason, preserve context, and call tools.
Key points
- Two top-level shapes: inbound (receptionist answering calls) and outbound (agent placing calls)
- Front-end platform (Vapi) handles the voice stack; back-end workflow engine (n8n) handles deterministic business logic
- System prompt is the single biggest lever on quality — expect dozens of iterations in production
- Voice agents should self-identify as AI on the opening turn as a baseline ethical practice
- Wireframing the conversational flow before building is strongly recommended — conditional logic branches quickly
- OpenAI's GPT Realtime 2 demo shows a voice agent checking a calendar, staying silently in context until reactivated, and updating a CRM through tool calls [src-051].
- For action-taking voice agents, Voice Agent Preambles help avoid dead air by acknowledging the task and updating the user while reasoning or tool calls run [src-051].
- OpenAI's Build Hour reframes mature voice agents as Voice-to-Action Interfaces: the agent can operate an e-commerce UI or analytics dashboard through tools, not merely answer questions aloud [src-083].
- Sierra's production notes add the enterprise reality: real voice agents need workflows, allowed tools, grounding, brand language, guardrails, VAD behavior, traces, redaction, payment-safe flows, and full-call simulations around the model [src-083].
Related entities
Related concepts
- Inbound Voice Receptionists
- Outbound Voice Agents
- Live Voice Models
- Voice Agent Handoff
- Voice Agent Preambles
- Voice-to-Action Interfaces
- Production Voice Agent Harness
Source references
- [src-007] Nate Herk cluster — Nate Herk — Voice AI agents cluster (4 videos)
– Videos referenced: zWLZ3bVVwD8, BO-jFbN4p8Y, y-cq_Qo4zVo, Qt3zMBH-FNg
- [src-051] OpenAI – "We’re introducing three audio models in the API" (2026-05-07)
- [src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)
Recommended next
Keep reading from this thread
From 494 indexed pages and articles.
- Wiki concept GPT Realtime 2 An OpenAI realtime audio model for voice agents that can follow instructions, reason, preserve conversational context, call tools, operate Related by voice
- Wiki concept Voice Agent Preambles Short spoken status updates that let a realtime agent acknowledge a user request, explain what it is doing, and Related by voice
- Insight AI Beyond POCs How enterprise AI moves beyond proofs of concept through ownership, governance, measurement, adoption, and production operating models Readers have engaged with this next