GPT Realtime Whisper
GPT Realtime Whisper is OpenAI's streaming speech-to-text model for low-latency transcription in realtime audio applications.
Key facts
- Type: Streaming speech-to-text model
- Maker: OpenAI
- First seen in wiki: OpenAI's Build Hour on GPT Realtime 2 [src-083]
- Latency: OpenAI describes the model as tunable down to roughly 200ms latency for realtime captions and voice-agent input [src-083].
- Language coverage: The session describes support for about 80 input languages [src-083].
- Role in stack: It sits between classic batch transcription and full speech-to-speech models; still transcription-first, but fast enough to drive captions, meeting notes, ambient context, and earlier tool calls [src-083].
What it does
GPT Realtime Whisper gives developers a streaming transcription layer when they need text quickly but do not necessarily need a full voice-to-voice model. OpenAI positions it for realtime captions, meeting notes, and voice-agent systems where early recognition lets the application prepare tool calls or context before the speaker has finished [src-083].
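The early-recognition pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual API: the event shape (`TranscriptEvent`), the field names, and the `weather_lookup` tool name are all hypothetical stand-ins for whatever the real streaming interface emits. The point it shows is that an application can act on partial transcripts instead of waiting for the final one.

```python
# Hypothetical sketch of consuming a partial-transcript stream and
# preparing a tool call before the utterance is complete.
# Event shape and names are illustrative assumptions, not the real API.

from dataclasses import dataclass

@dataclass
class TranscriptEvent:
    text: str        # transcript accumulated so far
    is_final: bool   # True once the utterance is complete

def stream_events():
    """Simulated partial-transcript stream (stand-in for model output)."""
    yield TranscriptEvent("what's the", False)
    yield TranscriptEvent("what's the weather in", False)
    yield TranscriptEvent("what's the weather in Paris", False)
    yield TranscriptEvent("what's the weather in Paris today", True)

def handle_stream(events):
    prepared_tool = None
    for ev in events:
        # Early recognition: as soon as a partial transcript suggests an
        # intent, pre-select the tool instead of waiting for final text.
        if prepared_tool is None and "weather" in ev.text:
            prepared_tool = "weather_lookup"
        if ev.is_final:
            return ev.text, prepared_tool
    return None, prepared_tool

final_text, tool = handle_stream(stream_events())
```

Here the tool is chosen on the second partial event, two events before the final transcript arrives; in a real voice agent that head start is what the session frames as "earlier tool calls" [src-083].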
Related
- See also: OpenAI, OpenAI Whisper, GPT Realtime 2, GPT Realtime Translate
- Concepts: Live Voice Models, Voice Agents
Source references
- [src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)