AI Video Editing Pipeline

End-to-end agentic pipeline for producing edited, motion-graphic-enhanced video from a raw recording using Claude Code as orchestrator.

Standard stages

  • Raw recording (webcam, screen capture) or HeyGen avatar video input
  • video-use — transcript generation (Whisper or ElevenLabs API) + filler word / silence / retake removal → edited MP4 + word-level timestamp JSON
  • HyperFrames or Remotion — HTML/CSS motion graphics animated to word-level timestamps (animated subtitles, info cards, lower thirds, intro sequences)
  • FFmpeg — final render to MP4
  • Claude Code orchestrates all steps — writing the HyperFrames HTML compositions, calling video-use, invoking FFmpeg, and iterating on design feedback. [012]
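The final FFmpeg render step above can be sketched as a small command builder of the kind an orchestrator might emit. This is a minimal illustration, not the pipeline's actual invocation: the filenames, the use of a transparent graphics track, and the overlay filter choice are all assumptions.

```python
# Sketch: assemble an FFmpeg command that composites a motion-graphics
# track (rendered with an alpha channel) over the edited base video.
# Filenames and codec choices are hypothetical.

def build_render_cmd(base: str, overlay: str, out: str) -> list[str]:
    """Overlay a transparent graphics layer on the edited MP4."""
    return [
        "ffmpeg", "-y",
        "-i", base,           # edited MP4 from the cut/cleanup stage
        "-i", overlay,        # graphics track with alpha (e.g. ProRes 4444)
        "-filter_complex", "[0:v][1:v]overlay=0:0[v]",
        "-map", "[v]", "-map", "0:a?",   # keep base audio if present
        "-c:v", "libx264", "-c:a", "copy",
        out,
    ]

cmd = build_render_cmd("edited.mp4", "graphics.mov", "final.mp4")
```

Building the argument list as data (rather than a shell string) keeps the command inspectable before it is passed to a subprocess runner.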

Key points

  • Each iteration improves Claude Code’s skills and design philosophy docs for that video type, converging toward drop-in-and-render automation over time [012]
  • Token consumption: ~125K–260K tokens per video editing session [012]
  • Claude Design (web app) can also produce animated video by exporting a site as standalone HTML, then instructing Claude Code to render to MP4 — but cannot natively transcribe audio [012]
  • Word-level timestamps are the critical data dependency: without them, motion graphics fire at wrong moments and feel mechanical [012]
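The timestamp dependency can be made concrete with a small sketch: given Whisper-style word-level timestamps, group consecutive words into timed caption cues that a motion-graphics layer can animate against. The input JSON shape and the 3-second chunking rule are illustrative assumptions, not the pipeline's actual format.

```python
# Sketch: convert word-level timestamps (one start/end per word) into
# caption cues so each subtitle fires exactly when its words are spoken.
# Input shape and max_dur threshold are assumed for illustration.

def chunk_words(words: list[dict], max_dur: float = 3.0) -> list[dict]:
    """Group consecutive words into cues no longer than max_dur seconds."""
    cues, current = [], []
    for w in words:
        if current and w["end"] - current[0]["start"] > max_dur:
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    return [
        {
            "text": " ".join(w["word"] for w in cue),
            "start": cue[0]["start"],
            "end": cue[-1]["end"],
        }
        for cue in cues
    ]

words = [
    {"word": "Welcome",  "start": 0.0, "end": 0.4},
    {"word": "to",       "start": 0.4, "end": 0.5},
    {"word": "the",      "start": 0.5, "end": 0.6},
    {"word": "pipeline", "start": 0.6, "end": 1.1},
    {"word": "overview", "start": 3.4, "end": 3.9},
]
cues = chunk_words(words)
# The long pause before "overview" pushes it into a second cue, so the
# graphic for that word fires at 3.4s rather than drifting mechanically.
```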

Source references

  • [012] Nate Herk — Video editing & content creation cluster (2026-04-15 to 2026-04-23)