Claude Opus 4.7
Anthropic’s most capable publicly available model as of April 2026. Powers Claude Design and the /ultra-plan research preview in Claude Code. Released alongside the Claude Code desktop app and the /ultrareview command.
Key facts
- Type: Foundation language model
- Maker: Anthropic
- Released: April 2026
- Status: Active
- Pricing: $5/M input tokens, $25/M output tokens [src-012]
- Context window: 1M tokens (vs GPT 5.5’s 400K in Codex) [src-012]
- Visual benchmarks: 82–91% (vs 69–84.7% for Opus 4.6) [src-009]
- New tokenizer: prompts consume 1.0–1.3x as many tokens as under Opus 4.6 [src-010]
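The pricing and tokenizer figures above combine when estimating per-request cost. A minimal sketch, assuming hypothetical request sizes and taking 1.3x as the worst-case tokenizer multiplier from the range above:

```python
# Estimate per-request cost for Opus 4.7, folding in the new tokenizer's
# 1.0-1.3x token multiplier relative to Opus 4.6. Request sizes below are
# hypothetical, chosen only for illustration.

PRICE_INPUT = 5.00 / 1_000_000    # $ per input token
PRICE_OUTPUT = 25.00 / 1_000_000  # $ per output token

def request_cost(input_tokens_46, output_tokens_46, tokenizer_multiplier=1.3):
    """Cost in dollars, given token counts as measured under the Opus 4.6
    tokenizer and a multiplier for the new Opus 4.7 tokenizer (1.0-1.3x)."""
    inp = input_tokens_46 * tokenizer_multiplier
    out = output_tokens_46 * tokenizer_multiplier
    return inp * PRICE_INPUT + out * PRICE_OUTPUT

# Hypothetical example: a 50K-token prompt with a 5K-token reply becomes
# 65,000 input + 6,500 output tokens under the worst-case multiplier.
cost = request_cost(50_000, 5_000)
print(f"${cost:.4f}")
```

Note that output tokens dominate at 5x the input price, so the tokenizer multiplier hits reasoning-heavy responses hardest.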
What changed from 4.6
- Visual reasoning: 12–21 percentage points higher on visual benchmarks — powers Claude Design [src-009]
- X high effort level: New reasoning ceiling exclusive to Opus 4.7 [src-010]
- Adaptive thinking: Replaces the binary extended-thinking toggle — model allocates reasoning per-turn based on task complexity [src-010]
- New tokenizer: Same prompts consume 1.0–1.3x as many tokens as under the Opus 4.6 tokenizer [src-010]
- System card: 232-page safety evaluation under Project Glass Wing [src-010]
- Personal-guidance behavior: trained/evaluated with relationship-guidance scenarios to reduce sycophancy versus Opus 4.6 [src-073]
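Adaptive thinking replaces a session-wide on/off toggle with per-turn allocation. Anthropic's actual allocation logic is internal and undocumented; the sketch below is a purely illustrative toy heuristic showing the shape of the idea, with every signal, threshold, and name invented:

```python
# Toy illustration of per-turn effort allocation (NOT Anthropic's algorithm).
# A binary extended-thinking toggle applies one setting to the whole session;
# an adaptive scheme instead scores each turn and maps the score to an effort
# level, reserving the highest ceiling for the most complex requests.

EFFORT_LEVELS = ["low", "medium", "high", "x-high"]  # x-high: new ceiling in 4.7

def pick_effort(task: str) -> str:
    """Score a turn on crude complexity signals and map it to an effort level.
    The signals and cutoffs here are invented for illustration only."""
    signals = 0
    signals += any(k in task.lower() for k in ("refactor", "debug", "prove"))
    signals += len(task) > 500         # long, detailed request
    signals += task.count("```") >= 2  # embedded code blocks
    return EFFORT_LEVELS[min(signals + 1, len(EFFORT_LEVELS) - 1)]

print(pick_effort("What time is it?"))             # -> medium (no signals)
print(pick_effort("Debug this ```a``` ```b```"))   # -> x-high (two signals)
```

The point of the sketch is the interface, not the scoring: each turn gets its own effort budget, so simple questions no longer pay the latency and token cost of maximum reasoning.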
The Opus 4.6 degradation backstory
On 2026-02-09, Anthropic quietly changed the default effort level from "high" to "medium" via adaptive-thinking parameters, a configuration change rather than a change to model weights. Reasoning depth collapsed by 73% (from roughly 2,200 to 600 characters per turn), and 33.7% of sessions skipped file reads entirely. Opus 4.7 is Anthropic's stated correction, "the cure to their own disease." Boris Cherny (creator of Claude Code) confirmed that turns with zero reasoning had the highest hallucination rates. [src-010]
Benchmark comparison vs GPT 5.5 (Nate’s 4-task coding benchmark)
| Metric | Opus 4.7 | GPT 5.5 |
|---|---|---|
| Total runtime (4 tasks) | 40m 43s | 20m 49s |
| Output tokens | ~250K | ~70K |
| SWE-bench Verified | higher | lower |
| Terminal Bench 2.0 | 69.4 | 82.7 |
Opus 4.7 retains the SWE-bench Verified lead (real GitHub issue resolution). GPT 5.5 is ~2x faster and uses ~3.5x fewer output tokens. [src-012]
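The speed and token-efficiency ratios quoted above follow directly from the table. A quick check of the arithmetic:

```python
# Verify the ~2x runtime and ~3.5x output-token ratios from the table above.

opus_runtime_s = 40 * 60 + 43   # 40m 43s
gpt_runtime_s = 20 * 60 + 49    # 20m 49s
opus_tokens = 250_000           # ~250K output tokens
gpt_tokens = 70_000             # ~70K output tokens

speed_ratio = opus_runtime_s / gpt_runtime_s
token_ratio = opus_tokens / gpt_tokens

print(f"runtime: {speed_ratio:.2f}x, output tokens: {token_ratio:.2f}x")
# -> runtime: 1.96x, output tokens: 3.57x, matching the "~2x" and "~3.5x" claims
```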
Personal guidance behavior
Anthropic’s personal-guidance study reports that Opus 4.7 reduced sycophancy in relationship-guidance stress tests compared with Opus 4.6, with improvements generalizing across personal-guidance domains [src-073].
The training intervention used synthetic relationship-guidance scenarios derived from real pushback patterns in which prior Claude versions became overly validating. Opus 4.7 was then tested by prefilling it with real sycophantic conversations and measuring whether it could reverse course rather than continue the validating pattern [src-073].
Related
- See also: Anthropic, Claude Design, Claude Code, GPT 5.5
- Concepts: Claude Code Token Economics, Adaptive Thinking, Model Effort Levels, /ultrareview Command, Guidance Sycophancy, AI Personal Guidance
Source references
- [src-009] Nate Herk — Claude Design cluster (2026-04-17 to 2026-04-21)
- [src-010] Nate Herk — Cloud agents & model releases cluster (2026-04-14 to 2026-04-17)
- [src-012] Nate Herk — Video editing & content creation cluster (2026-04-15 to 2026-04-23)
- [src-073] Anthropic — “How people ask Claude for personal guidance” (2026-04-30)