
Anthropic bans its Fable 5 model from discussing dangerous topics

Originally published on: June 9, 2026
Summary

– Anthropic publicly released Claude Fable 5, its first “Mythos-class” model, which the company says surpasses previous Opus models, but with safeguards to block queries on cybersecurity, biology, and chemistry.
– Fable 5 runs on the same underlying model as Mythos 5, but while access to Mythos 5 is restricted, the public Fable 5 routes sensitive queries to Claude Opus 4.8 and warns users when it does so.
– The safeguards are tuned to be “stricter than ideal,” producing false positives in fewer than five percent of test sessions, a trade-off Anthropic accepts to keep malicious actors from receiving harmful assistance.
– Fable 5’s topic-based safeguards use classifiers to detect banned subjects and jailbreak attempts, with external red-team testing finding no universal jailbreaks in over 1,000 hours.
– Anthropic is particularly concerned about Mythos 5’s ability to perform “agentic hacking,” though UK testing found Mythos Preview performed similarly to OpenAI’s GPT-5.5 on cyber challenges.

Anthropic officially unveiled Claude Fable 5 on Tuesday, marking the release of its first “Mythos-class” model, which the company claims outperforms its previous frontier Opus models across the board. However, the launch comes with strict safety guardrails designed to block the model from answering questions about cybersecurity, biology, and chemistry, fields where Anthropic has expressed concern that its technology could “uplift” bad actors.

According to Anthropic, Fable 5 is built on the “same underlying model” as Mythos 5, which is emerging today from its months-long “Mythos Preview” phase. But while Mythos 5 is accessible only to a “small group of cyberdefenders” deemed trustworthy through the existing Project Glasswing program, the publicly available Fable 5 takes a different approach. When users ask about certain sensitive topics, the model redirects those queries to the older Claude Opus 4.8 and alerts the user that this redirection is happening.

Among the many benchmark improvements Anthropic claims for Fable 5, the jump in cybersecurity performance stands out as particularly significant. The company has deliberately calibrated the new safeguards to be “stricter than ideal,” meaning the system may occasionally reject “harmless requests” in ways that could frustrate regular users. Still, Anthropic says such false positives occur in fewer than five percent of all test sessions, and it argues the trade-off is worth it to prevent Mythos from assisting malicious actors in “causing serious harm that they couldn’t have received from other sources.”

Fable 5’s topic-based restrictions rely on a system of classifiers designed to broadly detect prohibited prompt subjects and any attempted jailbreaks. After more than 1,000 hours of red-team testing with a bug bounty program, Anthropic reports that external teams failed to find any universal jailbreaks for Fable 5. The new model also resisted automated jailbreak attempts far more effectively than earlier Claude Opus models.
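To make the described behavior concrete, here is a minimal Python sketch of classifier-gated routing: a prompt is checked for restricted subjects, and a flagged prompt goes to an older model along with a notice to the user. Everything in it is an assumption for illustration. The keyword lists, the model labels, and the function names are hypothetical, and simple keyword matching stands in for the trained classifiers the article mentions. It is not Anthropic's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for real topic classifiers.
RESTRICTED_KEYWORDS = {
    "cybersecurity": ("exploit", "malware"),
    "biology": ("pathogen",),
    "chemistry": ("nerve agent",),
}

# Placeholder labels, not real API model identifiers.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


@dataclass
class RoutingDecision:
    model: str
    notice: Optional[str]  # message shown to the user when redirected


def classify_topic(prompt: str) -> Optional[str]:
    """Return the first restricted topic whose keyword appears, else None."""
    text = prompt.lower()
    for topic, keywords in RESTRICTED_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the fallback model and tell the user."""
    topic = classify_topic(prompt)
    if topic is None:
        return RoutingDecision(PRIMARY_MODEL, None)
    notice = f"This request touches on {topic} and was redirected to {FALLBACK_MODEL}."
    return RoutingDecision(FALLBACK_MODEL, notice)
```

In this toy version, a harmless prompt such as "How can I exploit a sale to save money?" is also redirected because it contains the word "exploit". That is a crude analogue of the false positives a deliberately strict filter accepts.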

Anthropic has expressed particular concern about Mythos 5’s ability to perform agentic hacking, executing multi-step cyberattacks with much greater ease than previous models. Yet testing from the UK’s AI Security Institute in recent months found that Mythos Preview performed similarly to OpenAI’s GPT-5.5 on a series of Capture the Flag challenges, suggesting that Mythos’ capabilities do not represent “a breakthrough specific to one model.”

(Source: Ars Technica)
