
Anthropic’s Mythos Breach: A Costly Lesson in Humiliation

Summary

– Anthropic’s highly capable cybersecurity AI model, Claude Mythos, was accessed by unauthorized users through an educated guess about its online location, using data from a prior breach of contractor Mercor.
– The breach exploited a standard, predictable hacking technique that security experts say Anthropic should have anticipated and prepared for, given its emphasis on safety and known prior compromises.
– Anthropic failed to detect the unauthorized access promptly, as it was uncovered by a reporter, raising questions about its monitoring capabilities despite its ability to log and track model use.
– The incident is embarrassing for Anthropic because it built its brand on rigorous AI safety, yet suffered a basic security failure, and its hype of Mythos as a dangerous tool made it a prime target.
– The unauthorized group did not use Mythos for malicious cybersecurity tasks, which Bloomberg frames as a lucky break given the model’s claimed ability to find vulnerabilities in major systems.

Anthropic’s carefully managed unveiling of Claude Mythos has taken a deeply embarrassing turn. After weeks of insisting the AI model is so advanced at cybersecurity that it poses a grave danger if released publicly, the system appears to have fallen into the wrong hands anyway.

According to Bloomberg, a “small group of unauthorized users” has been accessing Mythos since the very day Anthropic announced it would be offered to a select group of companies for testing. The model’s existence was first revealed through a leak. Now, Anthropic says it is investigating. It is a remarkably poor look for a company that has staked its reputation on AI safety while boasting about the cybersecurity prowess of its latest creation.

From a technical perspective, the Mythos breach is almost laughably simple. Bloomberg reports that the group gained access by making “an educated guess about the model’s online location.” They pieced together clues from a breach at Mercor, a company that supplies AI training data, combined with insider knowledge from a member who had done contract work evaluating Anthropic models. This was not a sophisticated exploit or a wholesale theft of the model. It was a lucky guess paired with leaked credentials.

Security vulnerabilities are inevitable, and it was Mercor, not Anthropic, that exposed the information used to guess Mythos’ location. Pia Hüsch, a research fellow at the British think tank Royal United Services Institute (RUSI), told me that no company is ever completely secure and that humans are often the weakest link. She added that it “does initially seem a bit lucky” there were no serious consequences.

But this isn’t just bad luck. Educated guesses are a standard hacking technique, and the Mercor breach was already public knowledge before Mythos was released. Security researcher Lukasz Olejnik described the incident to me as an “entirely imaginable” failure, one the cybersecurity industry has been dealing with for the last 20 years. Anthropic should have anticipated this and prepared accordingly, especially knowing its own information had been compromised.

Anthropic also appears to have had the tools to spot the breach. The company can “log and track model use,” Olejnik said, which should make it possible to stop unauthorized access, particularly since the Mythos rollout was supposed to be highly restricted. Evidently, Anthropic was not monitoring closely enough. Given how dangerous the company claims the model is, it is reasonable to ask why.
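For a restricted rollout with a known allow-list of testers, the monitoring Olejnik describes can be as simple as diffing usage logs against that list. This is a hedged sketch under assumed names; the real schema of Anthropic's internal logs is not public.

```python
from collections import Counter

# Hypothetical allow-list: the real roster of approved testers is unknown.
AUTHORIZED_KEYS = {"partner-alpha", "partner-beta"}

def flag_unauthorized(records: list[tuple[str, str]]) -> dict[str, int]:
    """Given (api_key, endpoint) log records, return call counts for every
    key absent from the allow-list -- the kind of basic check that would
    have surfaced a 'small group of unauthorized users' on day one."""
    return dict(Counter(key for key, _ in records if key not in AUTHORIZED_KEYS))
```

Nothing about this is exotic; it is the baseline alerting one would expect around a model its maker calls dangerous.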

By Bloomberg’s account, the group was not using Mythos for cybersecurity tasks. They mostly wanted to mess around with the new model, and doing anything more aggressive could have tipped Anthropic off. If Anthropic’s messaging is to be believed, that is a lucky break. The company has framed Mythos as a “watershed moment for security,” claiming it found vulnerabilities in “every major operating system and web browser,” and said its release must be coordinated to allow time to “reinforce the world’s cyber defenses.”

Anthropic has a habit of using dramatic, alarming language that can be difficult to interrogate cleanly, including flirting with the idea that its Claude model might be conscious. Even so, early reports from parties with access suggest Mythos is genuinely adept at cybersecurity. Mozilla CTO Bobby Holley said it found hundreds of bugs in Firefox 150 and may finally give defenders a chance at decisively beating attackers. Unsurprisingly, governments and financial institutions around the world have been eager to get their hands on it. The NSA and other US agencies reportedly have access, despite Anthropic having been designated a supply chain risk, though the rollout appears to have bypassed CISA so far.

The fact that the breach was uncovered by a reporter rather than Anthropic raises an obvious question: is this an isolated incident? It “really illustrates how wide the circle of people who may be able to do this is, even if they don’t have super technically sophisticated means,” Hüsch said. Anthropic will likely comb through its supply chain to plug gaps, but there is a wide range of actors who would want access to a model like this, some with a great deal of money behind them. There is no reason to assume anyone else who gained access would be as restrained as the group Bloomberg reported on.

Anthropic has, to some extent, shot itself in the foot. The company has built its identity around taking AI safety more seriously than its rivals, creating sky-high expectations for model security that jar with its apparent carelessness. The fact that Mythos was exposed through such a basic and predictable failure only underscores that. Worse still, by hyping Mythos as an unusually powerful tool too dangerous for public release, Anthropic turned it into an obvious target, whether for malicious actors or hackers simply looking for a challenge.

This is not even the first awkward security incident around Mythos. The model’s existence was accidentally revealed before release through an “unsecured data trove” on a central system containing content for its website. Now, that model has been secretly accessed via a wholly predictable vulnerability Anthropic did not think to patch. Perfection is impossible, but for a company that has anointed itself the vanguard of AI safety, such a basic misstep is hard to justify, even with some of the bad luck it has had.

To Hüsch, the whole episode can be summed up in one word: humiliation. “Anthropic claims to be at the absolute forefront of all these technologies, but also positions itself as the responsible actor in all of this,” she said. “The fact that this has now been accessed through unauthorized means so quickly, and through such an unsophisticated attempt, is really a humiliation for them.”

(Source: The Verge)
