OpenAI Whisper

Speech-to-text model family used for video transcription and realtime speech recognition. In this wiki it first appears as the transcription backend for AI video editing, and later as GPT Realtime Whisper, a low-latency streaming transcription variant.

Key facts

  • Type: Speech-to-text model
  • Maker: OpenAI (open-source)
  • Status: Active
  • Variants: OpenAI API (hosted), whisper.cpp (local — free, but RAM-intensive)
  • Output: Transcript text + word-level timestamps in milliseconds
  • Realtime variant: GPT Realtime Whisper is described by OpenAI as a streaming transcription model with tunable latency down to roughly 200ms and about 80 input languages [src-083].
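
The word-level timestamp output mentioned above can be sketched as follows. This is a minimal illustration, assuming a response shape modeled on OpenAI's `verbose_json` transcription format (words with `start`/`end` in seconds); the exact field names in a given client library or whisper.cpp build may differ.

```python
# Sketch: converting Whisper-style word timestamps (seconds) into the
# millisecond word-level timestamps described above. The response dict
# shape is an assumption based on OpenAI's verbose_json output.

def words_to_ms(response: dict) -> list[dict]:
    """Return [{word, start_ms, end_ms}, ...] from a transcription response."""
    return [
        {
            "word": w["word"],
            "start_ms": round(w["start"] * 1000),
            "end_ms": round(w["end"] * 1000),
        }
        for w in response.get("words", [])
    ]

# Hypothetical response fragment for illustration
response = {
    "text": "hello world",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.42},
        {"word": "world", "start": 0.48, "end": 0.95},
    ],
}

print(words_to_ms(response))
```

Millisecond precision matters downstream because animation triggers are aligned to individual spoken words rather than whole segments.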

Use in pipeline

video-use and HyperFrames both support Whisper as a transcription backend. The word-level timestamps it produces are passed to HyperFrames to trigger animation elements at the exact moment a word is spoken. [src-012]
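
The timestamp-to-animation handoff could look something like the sketch below. HyperFrames' actual API is not documented in this wiki, so the `animation_triggers` helper, the cue-word matching, and the word-dict shape are all hypothetical illustrations of the idea.

```python
# Sketch: mapping word-level timestamps to animation trigger times,
# as described for the Whisper -> HyperFrames handoff. The helper and
# data shapes are hypothetical; only the concept (trigger an element
# at the millisecond a cue word is spoken) comes from the source.

def animation_triggers(words: list[dict], cues: set[str]) -> dict[str, int]:
    """Map each cue word to the timestamp (ms) at which it is first spoken."""
    triggers: dict[str, int] = {}
    for w in words:
        token = w["word"].strip().lower().strip(".,!?")
        if token in cues and token not in triggers:
            triggers[token] = w["start_ms"]
    return triggers

# Hypothetical transcript fragment with millisecond word timestamps
words = [
    {"word": "Welcome", "start_ms": 0},
    {"word": "to", "start_ms": 350},
    {"word": "the", "start_ms": 480},
    {"word": "demo!", "start_ms": 600},
]
print(animation_triggers(words, {"welcome", "demo"}))
# {'welcome': 0, 'demo': 600}
```

Matching on normalized tokens (lowercased, punctuation stripped) keeps triggers robust to transcript casing and trailing punctuation.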

Related

Source references

  • [src-012] Nate Herk — Video editing & content creation cluster (2026-04-15 to 2026-04-23)
  • [src-083] OpenAI – "Build Hour: GPT-Realtime-2" (2026-05-13)