GPT Realtime Whisper

GPT Realtime Whisper is OpenAI's streaming speech-to-text model for low-latency transcription in realtime audio applications.

Key facts

  • Type: Streaming speech-to-text model
  • Maker: OpenAI
  • First seen in wiki: OpenAI's "Build Hour: GPT-Realtime-2" session [src-083]
  • Latency: OpenAI describes the model as tunable down to roughly 200 ms for realtime captions and voice-agent input [src-083].
  • Language coverage: The session describes support for about 80 input languages [src-083].
  • Role in stack: Sits between classic batch transcription and full speech-to-speech models: transcription-first, yet fast enough to drive captions, meeting notes, ambient context, and earlier tool calls [src-083].
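The transcription-only positioning above can be sketched as a session configuration: audio in, text out, no synthesized speech. This is a hedged illustration only; the field and event names below (`transcription_session.update`, `gpt-realtime-whisper`, `input_audio_format`) are assumptions modeled on typical realtime-API payloads, since the session [src-083] describes the model's role but not its wire format.

```python
# Hypothetical config for a transcription-only realtime session.
# All key names and the model id are assumptions, not a documented schema.
def transcription_session_config(language: str = "en") -> dict:
    """Build a session-update payload for streaming transcription."""
    return {
        "type": "transcription_session.update",  # assumed event name
        "session": {
            "model": "gpt-realtime-whisper",     # assumed model id
            "input_audio_format": "pcm16",       # assumed audio encoding
            # One of the ~80 input languages the session describes [src-083].
            "language": language,
        },
    }
```

In a real client, a payload like this would be sent once over the streaming connection before audio frames; the key contrast with batch transcription is that audio and text flow concurrently rather than as one upload and one response.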

What it does

GPT Realtime Whisper gives developers a streaming transcription layer for when they need text quickly but do not need a full voice-to-voice model. OpenAI positions it for realtime captions, meeting notes, and voice-agent systems, where earlier recognition lets the application prepare tool calls or gather context before the speaker has finished [src-083].
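The "act before the speaker has finished" idea reduces to handling incremental transcript events: partial text accumulates as deltas, and a completed utterance triggers downstream work early. The event names (`transcript.delta`, `transcript.completed`) and the `on_utterance` hook below are illustrative assumptions, not a documented interface; the sketch shows only the accumulation pattern.

```python
from typing import Callable

def make_transcript_handler(on_utterance: Callable[[str], None]):
    """Return a handler that accumulates streaming transcript deltas.

    on_utterance fires as soon as an utterance completes, so the app can
    start preparing a tool call or context lookup before the full turn ends.
    """
    parts: list[str] = []

    def handle(event: dict) -> str:
        kind = event.get("type")
        if kind == "transcript.delta":        # assumed event name
            parts.append(event["text"])
        elif kind == "transcript.completed":  # assumed event name
            utterance = "".join(parts)
            parts.clear()
            on_utterance(utterance)           # early hook for tool calls etc.
            return utterance
        # For deltas (and unknown events), return the running partial text,
        # which is what a live-caption view would render.
        return "".join(parts)

    return handle
```

A captions UI would render the partial text returned on each delta, while `on_utterance` drives the earlier tool calls the session describes.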

Source references

  • [src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)