The practice of converting source documents to plain markdown before feeding them to Claude, exploiting the tokeniser’s efficiency on clean text versus format-heavy file types.
Reduction ratios
| Format | Token reduction |
|---|---|
| HTML → markdown | ~90% |
| PDF → markdown | 65–70% |
| DOCX → markdown | ~33% |
A 40-page PDF can occupy the same token space as a 130-page markdown file. [011]
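The page equivalence follows directly from the reduction ratio: a 65–70% reduction means markdown needs only 30–35% of the PDF's tokens, so the same token budget holds roughly three times as many markdown pages. A quick sanity check (illustrative arithmetic only, no real tokeniser involved):

```python
# A 65-70% token reduction means markdown uses only 30-35% of the
# PDF's tokens, so the same budget fits roughly 3x as many pages.
def equivalent_md_pages(pdf_pages: int, reduction: float) -> float:
    """Markdown pages that fit in the token budget of `pdf_pages` of PDF."""
    return pdf_pages / (1.0 - reduction)

low = equivalent_md_pages(40, 0.65)   # ~114 pages
high = equivalent_md_pages(40, 0.70)  # ~133 pages
```

Both ends of the range bracket the ~130-page figure cited above.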
Key points
- PDFs and HTML carry layout metadata, CSS, and formatting noise the model does not need for most tasks — only the text content matters [011]
- Recommended conversion tool: Docling (and similar converters) for fast automated conversion [011]
- Exception: OCR and vision tasks require the original file format [011]
- Pairs naturally with Claude Code Memory best practices: CLAUDE.md should route to separate files rather than inlining all context [011]
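The first point can be illustrated with a minimal standard-library sketch: strip tags, attributes, scripts, and styles from an HTML page and keep only the text payload. Real converters such as Docling do far more (heading and table reconstruction, reading-order recovery); this only shows where the token savings come from.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Keep only text content; drop tags, attributes, scripts, and styles."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> blocks

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Everything outside script/style that is not pure whitespace
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

page = ('<html><head><style>p{color:red}</style></head>'
        '<body><p class="x">Only the text matters.</p></body></html>')
text = html_to_text(page)  # the payload is a small fraction of the raw HTML
```

Here the extracted text is a fraction of the raw markup's length, which is the whole effect the table above quantifies.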
Related concepts
- Context Rot — high-format documents accelerate context rot when ingested raw
- Token Economics — document format is a significant token cost lever
- Context Management — pre-processing documents is a standard context hygiene step
Source references
- [011] Nate Herk — Claude Code power features cluster (2026-04-20 to 2026-04-27)