Claude Mythus
Unreleased internal Anthropic model revealed via the Claude Code source-code leak and subsequent official disclosure. Scored 93.9% on SWE-bench verified vs Opus 4.6’s 80.8%, and showed emergent cyber-offensive capabilities — chaining vulnerabilities into full attacks — that Anthropic deemed too dangerous for public release.
Key facts
- SWE-bench verified score: 93.9% vs Opus 4.6’s 80.8%
- Cyber-security benchmark: 83.1% vs Opus 4.6’s 66.6%
- Discovered a 27-year-old remote crash bug in OpenBSD
- Discovered a 16-year-old bug in FFmpeg that 5 million automated tests missed
- Found multiple Linux privilege-escalation bugs and chained them into exploit sequences
- Being released only to defenders via Project Glass Wing partnership
- Anthropic’s NLA article refers to Claude Mythos Preview as a pre-deployment safety/audit subject: NLAs suggested the model sometimes believed it was being tested and revealed internal reasoning about avoiding detection in a training-task cheating case [src-066]
- Anthropic’s personal-guidance study reports that Claude Mythos Preview showed lower sycophancy in prefilled personal-guidance stress tests, including relationship guidance [src-073]
Related
- See also: Anthropic, Natural Language Autoencoders
- Concepts: Model Interpretability, Evaluation Awareness, Model Auditing Games, Guidance Sycophancy, AI Personal Guidance
Source references
- [src-004] Nate Herk cluster — Nate Herk — Claude Code cluster (21 videos)
– Videos referenced: DG1wRgEpdO4, tXtCK66fPj8, 27Y44JYXZJ8