GPT Realtime 2
GPT Realtime 2 is an OpenAI realtime audio model for voice agents. It can follow instructions, reason, preserve conversational context, call tools, operate application state, and keep users updated while work runs in the background.
Key facts
- Type: Realtime audio / voice-agent model
- Maker: OpenAI
- First seen in wiki: OpenAI's May 7, 2026 API audio-model demo [src-051]
- Core capability: OpenAI describes GPT Realtime 2 as bringing intelligent reasoning to voice agents that can follow instructions and take actions [src-051].
- Tool behavior: The demo highlights reasoning, parallel tool calling, CRM updating, calendar lookup, and connected-system actions [src-051].
- Conversation behavior: The model can stay in the conversation, listen without interrupting, preserve context, and resume only when prompted [src-051].
- Context window: OpenAI describes GPT Realtime 2 as having a 128k context window, roughly enough for an hour of audio context [src-083].
- Reasoning behavior: The Build Hour highlights preambles, prompt adherence, parallel tool calls, domain vocabulary understanding, and voice expressiveness as practical improvements over earlier realtime voice models [src-083].
- Product pattern: OpenAI demonstrates GPT Realtime 2 driving an e-commerce UI and an analytics dashboard through tools, turning voice into action rather than only conversation [src-083].
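The parallel tool calling mentioned above can be illustrated from the application side. This is a minimal sketch, not OpenAI's API: it assumes the application receives a batch of tool-call requests shaped as name and arguments, and every tool name, argument, and return value here (`lookup_calendar`, `fetch_crm_record`, the "Acme" customer) is hypothetical.

```python
import asyncio

# Hypothetical tools: names, arguments, and return shapes are illustrative only.
async def lookup_calendar(day: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a network call
    return {"tool": "lookup_calendar", "day": day, "meetings": ["customer sync"]}

async def fetch_crm_record(customer: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a network call
    return {"tool": "fetch_crm_record", "customer": customer, "stage": "renewal"}

TOOLS = {"lookup_calendar": lookup_calendar, "fetch_crm_record": fetch_crm_record}

async def run_tool_calls(calls: list) -> list:
    """Run every requested tool call concurrently; results keep request order."""
    pending = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    return await asyncio.gather(*pending)

# An assumed batch of tool calls issued in a single model turn.
calls = [
    {"name": "lookup_calendar", "arguments": {"day": "today"}},
    {"name": "fetch_crm_record", "arguments": {"customer": "Acme"}},
]
results = asyncio.run(run_tool_calls(calls))
```

Because `asyncio.gather` awaits both coroutines at once, the total wait is roughly that of the slowest tool rather than the sum of both.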
What it does
In the demo, GPT Realtime 2 powers a personal voice assistant that checks a calendar, identifies an upcoming customer meeting, waits silently while the presenters discuss the model, and then resumes when addressed [src-051].
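The listen-then-resume behavior can be pictured as a small state holder that records everything it hears but only replies when addressed. This is a toy sketch under a strong simplifying assumption: a fixed wake phrase decides when the agent is being addressed, whereas the source does not say how the model makes that decision. The class name and wake phrase are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ListeningAgent:
    wake_phrase: str = "assistant"  # hypothetical cue, not from the source
    context: list = field(default_factory=list)

    def hear(self, utterance: str) -> Optional[str]:
        """Store every utterance; reply only when the wake phrase appears."""
        self.context.append(utterance)
        if self.wake_phrase in utterance.lower():
            return f"Resuming with {len(self.context)} utterances of context."
        return None  # stay silent but keep listening
```

Calling `hear` with side conversation returns `None` while the context keeps growing, so the eventual reply can draw on everything said in the meantime.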
OpenAI also demonstrates task execution: the model acknowledges that it will pull context and update the CRM, then returns a brief with customer context and next-step blockers. This positions realtime audio agents as interfaces for dashboards, SaaS tools, connected devices, and other systems that require action, not only conversation [src-051].
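The acknowledge-then-deliver pattern described above can be sketched as a spoken preamble issued while the slower work runs in the background. The function names, the `speak` callback, and the "Acme" customer are hypothetical stand-ins; the sketch only illustrates the ordering, not how the model produces its preambles.

```python
import asyncio

async def build_brief(customer: str) -> str:
    await asyncio.sleep(0.01)  # stands in for CRM and context lookups
    return f"Brief for {customer}: context and next-step blockers."

async def handle_request(customer: str, speak) -> None:
    # Start the slow work first so it runs while the preamble is spoken.
    task = asyncio.create_task(build_brief(customer))
    speak(f"I'll pull context on {customer} and update the CRM.")  # preamble
    speak(await task)  # final answer once the background work finishes

spoken = []
asyncio.run(handle_request("Acme", spoken.append))
```

Here `spoken` ends up with two entries, the acknowledgement followed by the brief, which mirrors the order the user hears in the demo.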
In the later Build Hour, GPT Realtime 2 is shown as a [[voice-to-action-interfaces|voice-to-action]] model. It can search products, inspect reviews, check weather, add items to a cart, filter analytics dashboards, kick off investigations, and decide when to speak versus when to keep working silently [src-083].
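One way to picture voice-to-action is a tool registry that maps model-requested tool names onto functions that read or change application state. The sketch below assumes tool calls arrive as a name plus arguments; the `Store` class, the catalog contents, and the tool names are all hypothetical and are not taken from the Build Hour demo.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    # Hypothetical catalog and cart standing in for an e-commerce UI's state.
    catalog: dict = field(default_factory=lambda: {"rain jacket": 89.0, "umbrella": 19.0})
    cart: list = field(default_factory=list)

def search_products(store: Store, query: str) -> list:
    """Return catalog item names containing the query string."""
    return [name for name in store.catalog if query.lower() in name]

def add_to_cart(store: Store, item: str) -> dict:
    """Add a known item to the cart and report the new cart size."""
    if item not in store.catalog:
        raise KeyError(f"unknown item: {item}")
    store.cart.append(item)
    return {"cart_size": len(store.cart)}

TOOLS = {"search_products": search_products, "add_to_cart": add_to_cart}

def dispatch(store: Store, call: dict):
    """Route one tool call to the matching function and return its result."""
    return TOOLS[call["name"]](store, **call["arguments"])
```

For example, `dispatch(Store(), {"name": "search_products", "arguments": {"query": "jacket"}})` returns the matching item names, and an `add_to_cart` call mutates the cart the UI would render.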
Related
- See also: OpenAI, GPT Realtime Translate, GPT Realtime Whisper, Sierra, Voice Agents, Live Voice Models, Voice Agent Preambles, Voice-to-Action Interfaces, Production Voice Agent Harness