GPT Realtime 2

GPT Realtime 2 is an OpenAI realtime audio model for voice agents that can follow instructions, reason, preserve conversational context, call tools, operate application state, and update users while work is happening in the background.

Key facts

Type: Realtime audio / voice-agent model
Maker: OpenAI
First seen in wiki: OpenAI's May 7, 2026 API audio-model demo ^[src-051]
Core capability: OpenAI describes GPT Realtime 2 as bringing intelligent reasoning to voice agents that can follow instructions and take actions ^[src-051].
Tool behavior: The demo highlights reasoning, parallel tool calling, CRM updating, calendar lookup, and connected-system actions ^[src-051].
Conversation behavior: The model can stay in the conversation, listen without interrupting, preserve context, and resume only when prompted ^[src-051].
Context window: OpenAI describes GPT Realtime 2 as having a 128k context window, roughly enough for an hour of audio context ^[src-083].
Reasoning behavior: The Build Hour highlights preambles, prompt adherence, parallel tool calls, domain vocabulary understanding, and voice expressiveness as practical improvements over earlier realtime voice models ^[src-083].
Product pattern: OpenAI demonstrates GPT Realtime 2 driving an e-commerce UI and an analytics dashboard through tools, turning voice into action rather than only conversation ^[src-083].
Desktop-control pattern: Pat Simmons demonstrates GPT Realtime 2 as the voice layer for a local desktop agent that uses push-to-talk capture, browser tools, Obsidian MCP access, and accessibility-tree control for complex applications ^[src-104].
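
As a rough consistency check on the context-window figure above, the short sketch below derives the audio token rate implied by "128k tokens for about an hour". It assumes "128k" means 128,000 tokens and "an hour" means 3,600 seconds; the resulting rate is back-of-envelope arithmetic, not a tokenization rate published by OpenAI, and the helper names are hypothetical.

```python
# Back-of-envelope arithmetic only; both constants are assumptions
# read off the "128k context, roughly an hour of audio" description.
CONTEXT_TOKENS = 128_000   # assumption: "128k" read as 128,000 tokens
AUDIO_SECONDS = 60 * 60    # assumption: "an hour" read as 3,600 seconds


def implied_tokens_per_second(context_tokens: int, audio_seconds: int) -> float:
    """Average token rate implied if the whole window held only audio."""
    return context_tokens / audio_seconds


def seconds_that_fit(context_tokens: int, tokens_per_second: float,
                     reserved_tokens: int = 0) -> float:
    """Seconds of audio that fit after reserving tokens for instructions and tool output."""
    usable = max(context_tokens - reserved_tokens, 0)
    return usable / tokens_per_second


rate = implied_tokens_per_second(CONTEXT_TOKENS, AUDIO_SECONDS)
print(f"implied rate: {rate:.1f} tokens/s")

# Reserving 8,000 tokens (a hypothetical budget) for prompts and tool results
# leaves a little over 56 minutes of audio under the same assumptions.
print(f"audio budget: {seconds_that_fit(CONTEXT_TOKENS, rate, 8_000) / 60:.2f} min")
```

The practical point is that instructions, tool schemas, and tool results share the same window, so a long session with heavy tool use would hold less than an hour of audio.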

What it does

In the demo, GPT Realtime 2 powers a personal voice assistant that checks a calendar, identifies an upcoming customer meeting, waits silently while the presenters discuss the model, and then resumes when addressed ^[src-051].

OpenAI also demonstrates task execution: the model acknowledges that it will pull context and update the CRM, then returns a brief with customer context and next-step blockers. This positions realtime audio agents as interfaces for dashboards, SaaS tools, connected devices, and other systems that require action, not only conversation ^[src-051].

In the later Build Hour, GPT Realtime 2 is shown as a [[voice-to-action-interfaces|voice-to-action]] model. It can search products, inspect reviews, check weather, add items to a cart, filter analytics dashboards, kick off investigations, and decide when to speak versus when to keep working silently ^[src-083].
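
The parallel tool calling described above can be pictured, on the application side, as running several requested tools concurrently and returning their results together. The sketch below is a minimal illustration under stated assumptions: the tool names, argument shapes, and the `run_tool_calls` helper are hypothetical stand-ins, and it does not reproduce OpenAI's Realtime API event format.

```python
import asyncio

# Hypothetical stub tools; a real app would call its own product search and weather services.
async def search_products(query: str) -> dict:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return {"tool": "search_products", "query": query, "results": ["rain jacket"]}


async def check_weather(city: str) -> dict:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return {"tool": "check_weather", "city": city, "forecast": "rain"}


# Registry mapping tool names (assumed) to their implementations.
TOOLS = {"search_products": search_products, "check_weather": check_weather}


async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run all requested tool calls concurrently; results keep the request order."""
    tasks = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    return await asyncio.gather(*tasks)


# Two tool calls as the model might request them in a single turn (shape is assumed).
calls = [
    {"name": "search_products", "arguments": {"query": "jacket"}},
    {"name": "check_weather", "arguments": {"city": "Paris"}},
]
results = asyncio.run(run_tool_calls(calls))
for result in results:
    print(result)
```

Because the calls overlap, total wait time is close to the slowest tool rather than the sum of all tools, which matters for keeping a spoken conversation responsive.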

Simmons's walkthrough moves the pattern from hosted demos into local computer use. GPT Realtime 2 listens to a push-to-talk command, calls small tools, and delegates application control to browser automation, MCP/API integrations, or Agent Desktop-style accessibility-tree actions depending on the target app ^[src-104].
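
The delegation step above amounts to a routing decision per target app. The following sketch shows one way such a router could look; the app lists, the `Backend` names, and `choose_backend` are illustrative assumptions rather than Simmons's actual implementation, with only the Obsidian-via-MCP pairing taken from the walkthrough.

```python
from enum import Enum


class Backend(Enum):
    BROWSER = "browser_automation"
    MCP = "mcp_or_api"
    ACCESSIBILITY = "accessibility_tree"


# Hypothetical app lists; only Obsidian-via-MCP is mentioned in the source.
MCP_APPS = {"obsidian"}
BROWSER_APPS = {"chrome", "firefox"}


def choose_backend(target_app: str) -> Backend:
    """Pick a control path for an app, falling back to accessibility-tree control."""
    app = target_app.strip().lower()
    if app in MCP_APPS:
        return Backend.MCP            # structured integration is available
    if app in BROWSER_APPS:
        return Backend.BROWSER        # web content goes through browser tools
    return Backend.ACCESSIBILITY      # complex native apps without an integration


for name in ["Obsidian", "Chrome", "Final Cut Pro"]:
    print(name, "->", choose_backend(name).value)
```

Preferring a structured integration first and falling back to accessibility-tree control keeps the least reliable control path as the last resort.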

See also: OpenAI, GPT Realtime Translate, GPT Realtime Whisper, Sierra, Voice Agents, Live Voice Models, Voice Agent Preambles, Voice-to-Action Interfaces, Production Voice Agent Harness, Voice-Driven Desktop Agents, Agent Desktop

Source references

^[src-051] OpenAI – "We’re introducing three audio models in the API" (2026-05-07)
^[src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)
^[src-104] Pat Simmons – "GPT Realtime 2 Can Now Run Your Entire Computer (Just Your Voice)" (2026-06-17)

Robin Cartier perspective

This page is part of Robin Cartier's working AI knowledge graph: a practical research layer for production AI, recommendation systems, experimentation, GEO, and agentic web readiness.

The useful next step is to connect this concept back to applied product leadership and operating models.
