Voice Agents

AI agents whose primary interface is spoken conversation over a phone call, an app, or an in-browser audio stream. Earlier builds chain a speech-to-text transcriber, an LLM "brain", a voice synthesiser (text-to-speech), and a tool layer; newer realtime audio models such as GPT Realtime 2 collapse more of that loop into a single native audio model that can reason, preserve context, and call tools.
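The cascaded design can be sketched as one turn handler. This is a minimal illustration under stated assumptions: `transcribe`, `llm_brain`, `synthesise`, and the `check_calendar` tool are hypothetical stubs, not a Vapi, n8n, or OpenAI API. A realtime audio model would replace the transcribe, brain, and synthesise steps with one native audio call, while the tool layer stays.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for the four components. A real stack would call
# a speech-to-text service, an LLM, a TTS engine, and business APIs here.


def transcribe(audio: bytes) -> str:
    # Stub transcriber: pretend the "audio" bytes are already UTF-8 text.
    return audio.decode("utf-8")


def synthesise(text: str) -> bytes:
    # Stub voice synthesiser: return the text as bytes instead of audio.
    return text.encode("utf-8")


@dataclass
class Decision:
    reply: Optional[str] = None   # text to speak directly, or...
    tool: Optional[str] = None    # ...the name of a tool to call first
    args: dict = field(default_factory=dict)


def llm_brain(transcript: str) -> Decision:
    # Stub "brain": route booking requests to a tool, answer anything else.
    if "book" in transcript.lower():
        return Decision(tool="check_calendar", args={"day": "tomorrow"})
    return Decision(reply="I'm an AI assistant. How can I help?")


# Hypothetical tool layer: tool name -> callable returning a text result.
TOOLS: Dict[str, Callable[..., str]] = {
    "check_calendar": lambda day: f"10:00 is free {day}",
}


def handle_turn(audio_in: bytes) -> bytes:
    """One cascaded turn: audio -> text -> decision -> (tool) -> audio."""
    text = transcribe(audio_in)
    decision = llm_brain(text)
    if decision.tool is not None:
        result = TOOLS[decision.tool](**decision.args)
        reply = f"I checked the calendar: {result}."
    else:
        reply = decision.reply or ""
    return synthesise(reply)


print(handle_turn(b"Can I book a slot?"))
```

Each stage adds latency in the cascaded version, which is part of why native audio models that fold the stages together are attractive for live calls.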

Key points

Two top-level shapes: inbound (receptionist answering calls) and outbound (agent placing calls)
A common split: a front-end platform (Vapi) handles the voice stack, and a back-end workflow engine (n8n) handles deterministic business logic
The system prompt is the single biggest lever on quality; expect dozens of iterations in production
Voice agents should self-identify as AI on the opening turn as a baseline ethical practice
Wireframing the conversational flow before building is strongly recommended, because conditional logic multiplies branches quickly
OpenAI's GPT Realtime 2 demo shows a voice agent checking a calendar, staying silent but in context until it is reactivated, and updating a CRM through tool calls ^[src-051].
For action-taking voice agents, Voice Agent Preambles help avoid dead air by acknowledging the task and updating the user while reasoning or tool calls run ^[src-051].
OpenAI's Build Hour reframes mature voice agents as Voice-to-Action Interfaces: the agent can operate an e-commerce UI or analytics dashboard through tools, not merely answer questions aloud ^[src-083].
Sierra's production notes add the enterprise reality: real voice agents need workflows, allowed tools, grounding, brand language, guardrails, VAD behavior, traces, redaction, payment-safe flows, and full-call simulations around the model ^[src-083].
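The preamble idea can be illustrated on the orchestration side: acknowledge the request immediately, then fill any long tool wait with short progress updates. This is a hypothetical sketch, not how GPT Realtime 2 implements preambles (there the model speaks them itself). `say` stands in for whatever sends text to speech output, and `slow_crm_update` is an assumed tool that takes about 250 ms.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_with_preamble(
    say: Callable[[str], None],
    tool_call: Awaitable[str],
    preamble: str,
    update: str,
    update_every: float = 2.0,
) -> str:
    """Speak a preamble, await a slow tool, and fill long waits with updates."""
    say(preamble)  # acknowledge the task at once, so the caller hears no dead air
    task = asyncio.ensure_future(tool_call)
    while True:
        # Wait up to update_every seconds for the tool to finish.
        done, _pending = await asyncio.wait({task}, timeout=update_every)
        if done:
            break
        say(update)  # tool still running: tell the caller we are on it
    result = task.result()
    say(result)
    return result


async def slow_crm_update() -> str:
    # Hypothetical CRM write that takes about 250 ms.
    await asyncio.sleep(0.25)
    return "Done, your CRM record is updated."


spoken: List[str] = []
asyncio.run(
    run_with_preamble(
        spoken.append,
        slow_crm_update(),
        preamble="Sure, let me update that for you.",
        update="Still working on it.",
        update_every=0.1,
    )
)
print(spoken)
```

The exact number of "Still working on it." lines depends on timing; the design choice is that the caller always hears something within `update_every` seconds, however slow the tool is.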
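Wireframing the flow can start as a small state graph before any voice platform is involved. In this hypothetical receptionist sketch (node names, prompts, and routing rules are all illustrative assumptions), each node pairs what the agent says with a router from the caller's transcribed reply to the next node. The opening node self-identifies as AI, and anything unrecognised routes to a human handoff.

```python
from typing import Callable, Dict, List, Optional, Tuple

Router = Callable[[str], Optional[str]]  # caller reply -> next node (None = hang up)


def route_intent(reply: str) -> Optional[str]:
    reply = reply.lower()
    if "book" in reply:
        return "collect_time"
    if "question" in reply:
        return "answer_faq"
    return "handoff"  # unrecognised intent: transfer to a human


# Hypothetical wireframe: node -> (what the agent says, how to route the reply).
FLOW: Dict[str, Tuple[str, Router]] = {
    "greet": ("Hi, I'm an AI assistant. Are you calling to book or with a question?", route_intent),
    "collect_time": ("What day and time suit you?", lambda reply: "confirm"),
    "answer_faq": ("Go ahead, what's your question?", lambda reply: "end"),
    "confirm": ("Great, you're booked. Anything else?", lambda reply: "end"),
    "handoff": ("Let me transfer you to a colleague.", lambda reply: None),
    "end": ("Thanks for calling, goodbye.", lambda reply: None),
}


def walk(replies: List[str], start: str = "greet") -> List[str]:
    """Replay a scripted call through the wireframe and return the node path."""
    path = [start]
    node = start
    for reply in replies:
        _prompt, router = FLOW[node]
        next_node = router(reply)
        if next_node is None:
            break
        node = next_node
        path.append(node)
    return path


print(walk(["I'd like to book", "Tuesday at 3pm", "no thanks"]))
print(walk(["um, billing"]))
```

Replaying scripted calls through `walk` is a cheap way to spot dead ends or missing branches before the same logic is rebuilt in a voice platform or workflow engine.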

Source references

^[src-007] Nate Herk – Voice AI agents cluster (4 videos: zWLZ3bVVwD8, BO-jFbN4p8Y, y-cq_Qo4zVo, Qt3zMBH-FNg)
^[src-051] OpenAI – "We’re introducing three audio models in the API" (2026-05-07)
^[src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)

Robin Cartier perspective

This page is part of Robin Cartier's working AI knowledge graph: a practical research layer for production AI, recommendation systems, experimentation, GEO, and agentic web readiness.

The useful next step is to connect this concept back to applied product leadership and operating models.
