Claude Opus 4.7
Anthropic’s most capable publicly available model as of April 2026. Powers Claude Design and the /ultra-plan research preview in Claude Code. Released alongside the Claude Code desktop app and the /ultrareview command.
Key facts
- Type: Foundation language model
- Maker: Anthropic
- Released: April 2026
- Status: Active
- Pricing: $5/M input tokens, $25/M output tokens [src-012]
- Context window: 1M tokens (vs GPT 5.5’s 400K in Codex) [src-012]
- Visual benchmarks: 82–91% (vs 69–84.7% for Opus 4.6) [src-009]
- New tokenizer: prompts consume 1.0–1.3x as many tokens as under Opus 4.6 [src-010]
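The pricing and tokenizer figures above combine when estimating per-request cost. A minimal sketch, assuming hypothetical request sizes and taking 1.3x as the worst-case tokenizer multiplier from the range above:

```python
# Estimate per-request cost for Opus 4.7, folding in the new tokenizer's
# 1.0-1.3x token multiplier relative to Opus 4.6. Request sizes below are
# hypothetical, chosen only for illustration.

PRICE_INPUT = 5.00 / 1_000_000    # $ per input token
PRICE_OUTPUT = 25.00 / 1_000_000  # $ per output token

def request_cost(input_tokens_46, output_tokens_46, tokenizer_multiplier=1.3):
    """Cost in dollars, given token counts as measured under the Opus 4.6
    tokenizer and a multiplier for the new Opus 4.7 tokenizer (1.0-1.3x)."""
    inp = input_tokens_46 * tokenizer_multiplier
    out = output_tokens_46 * tokenizer_multiplier
    return inp * PRICE_INPUT + out * PRICE_OUTPUT

# Hypothetical example: a 50K-token prompt with a 5K-token reply becomes
# 65,000 input + 6,500 output tokens under the worst-case multiplier.
cost = request_cost(50_000, 5_000)
print(f"${cost:.4f}")
```

Note that output tokens dominate at 5x the input price, so the tokenizer multiplier hits reasoning-heavy responses hardest.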
What changed from 4.6
- Visual reasoning: 12–21 percentage points higher on visual benchmarks — powers Claude Design [src-009]
- X high effort level: New reasoning ceiling exclusive to Opus 4.7 [src-010]
- Adaptive thinking: Replaces the binary extended-thinking toggle — model allocates reasoning per-turn based on task complexity [src-010]
- New tokenizer: Same prompts consume 1.0–1.3x as many tokens as under the Opus 4.6 tokenizer [src-010]
- System card: 232-page safety evaluation under Project Glass Wing [src-010]
- Personal-guidance behavior: trained/evaluated with relationship-guidance scenarios to reduce sycophancy versus Opus 4.6 [src-073]
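Adaptive thinking replaces a session-wide on/off toggle with per-turn allocation. Anthropic's actual allocation logic is internal and undocumented; the sketch below is a purely illustrative toy heuristic showing the shape of the idea, with every signal, threshold, and name invented:

```python
# Toy illustration of per-turn effort allocation (NOT Anthropic's algorithm).
# A binary extended-thinking toggle applies one setting to the whole session;
# an adaptive scheme instead scores each turn and maps the score to an effort
# level, reserving the highest ceiling for the most complex requests.

EFFORT_LEVELS = ["low", "medium", "high", "x-high"]  # x-high: new ceiling in 4.7

def pick_effort(task: str) -> str:
    """Score a turn on crude complexity signals and map it to an effort level.
    The signals and cutoffs here are invented for illustration only."""
    signals = 0
    signals += any(k in task.lower() for k in ("refactor", "debug", "prove"))
    signals += len(task) > 500         # long, detailed request
    signals += task.count("```") >= 2  # embedded code blocks
    return EFFORT_LEVELS[min(signals + 1, len(EFFORT_LEVELS) - 1)]

print(pick_effort("What time is it?"))             # -> medium (no signals)
print(pick_effort("Debug this ```a``` ```b```"))   # -> x-high (two signals)
```

The point of the sketch is the interface, not the scoring: each turn gets its own effort budget, so simple questions no longer pay the latency and token cost of maximum reasoning.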
The Opus 4.6 degradation backstory
On 2026-02-09, Anthropic quietly changed the default effort level from "high" to "medium" via adaptive-thinking parameters, a configuration change rather than a change to model weights. Reasoning depth collapsed by 73% (from roughly 2,200 to 600 characters per turn), and 33.7% of sessions skipped file reads entirely. Opus 4.7 is Anthropic's stated correction, "the cure to their own disease." Boris Cherny (creator of Claude Code) confirmed that turns with zero reasoning had the highest hallucination rates. [src-010]
Benchmark comparison vs GPT 5.5 (Nate’s 4-task coding benchmark)
| Metric | Opus 4.7 | GPT 5.5 |
|---|---|---|
| Total runtime (4 tasks) | 40m 43s | 20m 49s |
| Output tokens | ~250K | ~70K |
| SWE-bench Verified | higher | lower |
| Terminal Bench 2.0 | 69.4 | 82.7 |
Opus 4.7 retains the SWE-bench Verified lead (real GitHub issue resolution). GPT 5.5 is ~2x faster and uses ~3.5x fewer output tokens. [src-012]
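The speed and token-efficiency ratios quoted above follow directly from the table. A quick check of the arithmetic:

```python
# Verify the ~2x runtime and ~3.5x output-token ratios from the table above.

opus_runtime_s = 40 * 60 + 43   # 40m 43s
gpt_runtime_s = 20 * 60 + 49    # 20m 49s
opus_tokens = 250_000           # ~250K output tokens
gpt_tokens = 70_000             # ~70K output tokens

speed_ratio = opus_runtime_s / gpt_runtime_s
token_ratio = opus_tokens / gpt_tokens

print(f"runtime: {speed_ratio:.2f}x, output tokens: {token_ratio:.2f}x")
# -> runtime: 1.96x, output tokens: 3.57x, matching the "~2x" and "~3.5x" claims
```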
Personal guidance behavior
Anthropic’s personal-guidance study reports that Opus 4.7 reduced sycophancy in relationship-guidance stress tests compared with Opus 4.6, with improvements generalizing across personal-guidance domains [src-073].
The training intervention used synthetic relationship-guidance scenarios derived from real pushback patterns in which prior Claude versions became overly validating. Opus 4.7 was then tested by prefilling it with real sycophantic conversations and measuring whether it could reverse course rather than continue the validating pattern [src-073].
Related
- See also: Anthropic, Claude Design, Claude Code, GPT 5.5
- Concepts: Claude Code Token Economics, Adaptive Thinking, Model Effort Levels, /ultrareview Command, Guidance Sycophancy, AI Personal Guidance
Source references
- [src-009] Nate Herk — Claude Design cluster (2026-04-17 to 2026-04-21)
- [src-010] Nate Herk — Cloud agents & model releases cluster (2026-04-14 to 2026-04-17)
- [src-012] Nate Herk — Video editing & content creation cluster (2026-04-15 to 2026-04-23)
- [src-073] Anthropic — “How people ask Claude for personal guidance” (2026-04-30)