Claude Mythos Just Scored 93.9% on SWE-bench. Anthropic Won't Let Anyone Use It.

Anthropic just built what might be the most capable AI model in the world. Then they decided not to release it.
Claude Mythos Preview, announced April 7, 2026, scored 93.9% on SWE-bench Verified, 77.8% on SWE-bench Pro, and 97.6% on USAMO 2026. Every one of those numbers represents a double-digit lead over both Claude Opus 4.6 and GPT-5.4. On BenchLM's provisional leaderboard, Mythos ranks #1 out of 106 models with a score of 99 out of 100.
And Anthropic's response to building this thing? "We do not plan to make Claude Mythos Preview generally available."
It's a fascinating decision, and the reasoning behind it says a lot about where AI is heading in the second half of 2026.
The Cybersecurity Problem (Or Breakthrough, Depending on How You Look at It)
Mythos isn't just good at coding benchmarks. It's terrifyingly good at finding security vulnerabilities. During internal testing, the model autonomously discovered and exploited zero-day vulnerabilities in every major operating system and every major web browser. Not theoretical weaknesses. Actual exploitable flaws that nobody knew about.
Some of what it found is wild. A 27-year-old flaw in OpenBSD that allows remote system crashes. A 16-year-old FFmpeg vulnerability that automated security tools had missed despite hitting the affected code 5 million times during testing. Linux kernel vulnerabilities that could be chained together for privilege escalation.
Thousands of high-severity vulnerabilities, found by an AI model running autonomously. That's the kind of capability that's simultaneously incredibly valuable and genuinely dangerous. If Mythos can find these flaws, so could a less safety-conscious model built by someone with worse intentions.
Project Glasswing: Anthropic's Answer
Rather than releasing Mythos publicly and hoping for the best, Anthropic launched Project Glasswing, a controlled initiative to use the model's security capabilities defensively. Eleven launch partners got access: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. About 40 organizations total are in the program, all of them maintaining critical software infrastructure.
Anthropic committed $100 million in Mythos usage credits to the initiative, plus $2.5 million to the Linux Foundation's Alpha-Omega and OpenSSF programs, and $1.5 million to the Apache Software Foundation. Within 90 days, they'll publish a public report on vulnerabilities fixed and improvements made.
It's worth noting that several of these partners overlap with Anthropic's broader infrastructure push. Broadcom, Google, and NVIDIA are all involved, and we recently covered how Anthropic's chip deal with Broadcom and Google is reshaping AI infrastructure. Glasswing adds a security dimension to those relationships.
The pricing for authorized users is steep: $25 per million input tokens and $125 per million output tokens. That's significantly more expensive than current Claude models, which makes sense given the capabilities and the restricted access.
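To make those rates concrete, here's a minimal sketch of the per-request arithmetic. The $25/$125 per-million-token rates come from the article; the token counts in the example are hypothetical, chosen only to illustrate how quickly costs add up at this tier.

```python
# Rough cost estimate at the Glasswing pricing quoted above.
# Rates are from the article; the token counts below are
# hypothetical, just to show the arithmetic.

INPUT_RATE = 25.0    # USD per million input tokens
OUTPUT_RATE = 125.0  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted preview rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE \
         + (output_tokens / 1_000_000) * OUTPUT_RATE

# e.g. a large code-audit request: 200k tokens in, 40k tokens out
cost = estimate_cost(200_000, 40_000)
print(f"${cost:.2f}")  # -> $10.00 for a single request
```

At those rates, a team running thousands of autonomous security-analysis requests per day would be spending real money, which is presumably part of the point of the restricted tier.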
What the Benchmarks Actually Mean
Let's put those numbers in context. SWE-bench Verified tests whether an AI model can solve real software engineering problems from open-source GitHub repositories. Mythos scoring 93.9% means it can solve almost every real-world coding task thrown at it. For comparison, Opus 4.6 scores 80.8%, which was already considered state-of-the-art.
Terminal-Bench 2.0, which tests autonomous computer use and command-line problem-solving, shows a similar gap: 82% for Mythos versus 65.4% for Opus 4.6. And on CyberGym, a benchmark specifically for vulnerability reproduction, Mythos hit 83.1% compared to Opus 4.6's 66.6%.
What this means practically: Mythos can write code, find bugs, fix issues, and navigate complex systems at a level that surpasses all but the most skilled human engineers. It's not a marginal improvement. It's a step change, exactly as Anthropic described it.
Why This Matters for Businesses Using AI
You might be thinking "cool model, but I can't use it, so why should I care?" Fair question. A few reasons:
First, the capabilities previewed in Mythos will eventually trickle down. Anthropic has said that safeguards developed during the Glasswing program will be built into upcoming Claude Opus models before broader deployment. The security-hardened, more capable models coming later this year will benefit directly from what Anthropic learns during this controlled rollout.
Second, the security implications are relevant to anyone running a business online. If an AI model can find thousands of zero-day vulnerabilities in major software, those same vulnerabilities exist in the tools you're using right now. The Glasswing program is actively patching them, which makes the entire software ecosystem safer.
Third, it signals where AI capability is heading. If Mythos can autonomously discover and exploit complex security vulnerabilities, the next generation of AI tools for customer support, sales, and operations will be substantially more capable than what's available today. The AI chatbots handling your customer conversations in late 2026 will be meaningfully smarter than the ones running right now.
For businesses already using AI for customer-facing work, this trajectory is worth paying attention to. We've covered what separates good AI chatbot implementations from bad ones, and the core principle stays the same regardless of how capable the underlying model gets: the implementation matters as much as the technology.
The Controlled Release Precedent
Anthropic's decision to restrict Mythos access sets an interesting precedent. Most AI companies race to release their most powerful models as fast as possible. OpenAI, Google, and Meta have all treated broad availability as a competitive advantage. Anthropic is making the opposite bet: that demonstrating responsible deployment of dangerous capabilities builds more long-term trust (and business) than winning the benchmark leaderboard.
Whether you agree with that approach or not, the transparency is refreshing. Publishing benchmark scores, naming partner organizations, committing to a 90-day public disclosure timeline, and explaining exactly why the model isn't being released publicly is more openness than we usually get from frontier AI labs.
What Comes Next
The practical question is when Mythos-class capabilities become available in models you can actually use. Anthropic's language suggests the next Claude Opus release will incorporate learnings from Glasswing, with safety guardrails designed to prevent misuse of the model's security capabilities while preserving its coding and reasoning improvements.
For anyone building with AI right now, the takeaway is simple: the models are getting dramatically better, fast. The gap between Opus 4.6 and Mythos is the kind of jump that changes what's possible. Customer support bots that can genuinely reason through complex issues. AI agents that can autonomously handle multi-step workflows. Code assistants that catch security vulnerabilities before they ship.
If you haven't started exploring what AI can do for your business, the window where you're "early" is closing quickly. Give Converzoy a try and see what today's AI can already handle for your customer conversations. Tomorrow's models will only make it better.