Markdown Conversion for Token Reduction

The practice of converting source documents to plain markdown before feeding them to Claude, because the tokeniser handles clean text far more efficiently than format-heavy file types.

Reduction ratios

  Format             Token reduction
  HTML → markdown    ~90%
  PDF → markdown     65–70%
  DOCX → markdown    ~33%

A 40-page PDF can occupy the same token space as a 130-page markdown file. [011]
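The page-equivalence claim follows directly from the reduction ratios above. A minimal sketch of the arithmetic, assuming the ~70% upper end of the PDF reduction figure:

```python
# Hypothetical illustration of the page-equivalence claim: if
# PDF -> markdown removes ~70% of tokens, the same token budget
# that holds a 40-page raw PDF holds ~3.3x the pages as markdown.
pdf_pages = 40
reduction = 0.70  # PDF -> markdown token reduction (upper end of 65-70%)

markdown_equivalent_pages = pdf_pages / (1 - reduction)
print(round(markdown_equivalent_pages))  # ~133 pages
```

At the 65% lower end the same budget still holds roughly 115 markdown pages, so the "40 pages ≈ 130 pages" framing holds across the quoted range.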

Key points

  • PDFs and HTML carry layout metadata, CSS, and formatting noise that the model does not need for most tasks; only the text content matters [011]
  • Recommended conversion tool: Docling (or similar converters) for fast automated conversion [011]
  • Exception: OCR and vision tasks require the original file format [011]
  • Pairs naturally with Claude Code Memory best practices: CLAUDE.md should route to separate files rather than inlining all context [011]
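The first key point can be sketched with the standard library alone. This is not Docling itself, just a toy illustration of why stripping markup saves tokens: the text content survives while tags, CSS, and scripts are discarded. (Real converters additionally preserve structure such as headings and tables as markdown.)

```python
# Toy sketch (not Docling): extract only the text content from an
# HTML page, dropping tags plus <script>/<style> bodies entirely.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


page = ('<html><head><style>body{font:12px}</style></head>'
        '<body><h1>Title</h1><p>Only the text matters.</p></body></html>')
text = html_to_text(page)
print(text)
print(len(page), len(text))  # raw HTML is several times larger
```

Even on this tiny page the markup outweighs the text by a factor of three or so; on real pages with full stylesheets and scripts the gap is far larger, which is where the ~90% HTML figure comes from.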

Related concepts

  • Context Rot — high-format documents accelerate context rot when ingested raw
  • Token Economics — document format is a significant token cost lever
  • Context Management — pre-processing documents is a standard context hygiene step

Source references

  • [011] Nate Herk — Claude Code power features cluster (2026-04-20 to 2026-04-27)