GPT Realtime Translate

GPT Realtime Translate is an OpenAI realtime audio model for live speech translation, available in the OpenAI API. In OpenAI's demo it translates spoken French into English while preserving sentence shape and handling mid-sentence language switching.

Key facts

  • Type: Realtime audio translation model
  • Maker: OpenAI
  • First seen in wiki: OpenAI's May 7, 2026 YouTube demo of new audio models in the API [src-051]
  • Language coverage: The demo states that the model can translate across 70 languages in real time [src-051].
  • Input/output coverage: The later Build Hour session (May 13, 2026) gives a more specific breakdown: more than 70 input languages and 13 output languages [src-083].
  • Demo behavior: It listens while the speaker is still talking, waits for key sentence information such as the verb, and begins translating without waiting for the whole utterance to finish [src-051].
  • Positioning: OpenAI frames the model as useful for media platforms, customer support, education, and other language-barrier use cases [src-051].
  • Voice behavior: OpenAI also presents the model as preserving voice characteristics, tone, and multi-speaker dynamics in realtime translation scenarios [src-083].

What it does

The demo frames GPT Realtime Translate as a native voice interface for simultaneous or near-simultaneous translation. OpenAI shows the speaker talking in French while the model outputs English audio from the laptop, with no audio editing in the demo flow [src-051].

The model is presented as robust to multilingual speech and technical terms. The transcript says the demo can include German interruptions and terms such as GPT Realtime, OpenAI, or computer use without breaking the translation [src-051].
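The "wait for the verb, then start translating" behavior described in the demo can be illustrated with a toy chunking policy. This is only a sketch to make the idea concrete: it is not OpenAI's method and does not call the OpenAI API. The `Token` type, its `is_verb` flag, and the `chunk_on_verb` function are hypothetical names introduced for illustration, and the verb tags are supplied by hand rather than inferred.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    text: str
    is_verb: bool  # hypothetical hand-supplied tag, not something the API exposes


def chunk_on_verb(tokens: Iterable[Token]) -> Iterator[List[str]]:
    """Release buffered words for translation as soon as a verb arrives.

    Toy policy: hold incoming words until the buffer contains a verb,
    then emit the buffer instead of waiting for the sentence to end.
    """
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token.text)
        if token.is_verb:
            yield buffer
            buffer = []
    if buffer:  # flush any words that trail the last verb
        yield buffer


# Simulated stream of French words arriving one at a time.
stream = [
    Token("Je", False),
    Token("voudrais", True),
    Token("un", False),
    Token("café", False),
]
chunks = list(chunk_on_verb(stream))
```

Here the first chunk is released once "voudrais" arrives, before the rest of the sentence has been spoken, which mirrors the incremental behavior shown in the demo without claiming anything about how the model decides when to speak.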

Source references

  • [src-051] OpenAI – "We’re introducing three audio models in the API" (2026-05-07)
  • [src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)