AI & TechArtificial IntelligenceBusinessNewswireTechnology

Anthropic’s Claude Opus 4.8 boosts honesty 4x, Mythos next

Originally published on: May 29, 2026
▼ Summary

– Anthropic released Claude Opus 4.8, an upgraded AI model that is four times less likely to let code flaws pass unremarked and shows improved honesty and reliability.
– Opus 4.8 shows benchmark gains, including a rise from 64.3% to 69.2% on agentic coding, and has lower rates of misaligned behavior than its predecessor.
– Early testers, including Cognition, Cursor, Harvey, and Databricks, reported practical improvements in tool use, code quality, legal benchmarks, and cost efficiency.
– Anthropic teased Mythos-class models, which have already found over 10,000 critical software vulnerabilities through Project Glasswing, and will require stronger cyber safeguards before general release.
– Anthropic announced a $65 billion Series H round at a $965 billion post-money valuation and continues to expand internationally, including new offices in Milan and Seoul.

Anthropic has officially launched Claude Opus 4.8, a significant upgrade to its flagship artificial intelligence model that the company claims is far more honest, more dependable in agentic tasks, and markedly better at identifying its own errors. The model is available now at the same pricing as its predecessor , $5 per million input tokens and $25 per million output tokens , and is being rolled out across all Anthropic products, including claude.ai, Claude Code, and the API.

The standout feature is a major boost in honesty and self-correction. According to Anthropic, Opus 4.8 is roughly four times less likely than Opus 4.7 to let flaws in its own code pass without comment. Early testers have observed that the model is more willing to flag uncertainties about its work and significantly less prone to making unsupported claims, a persistent challenge across AI systems that often project confidence even when it is unwarranted.

Benchmark improvements are broad and consistent. On agentic coding, measured by Terminal-Bench 2.1, scores jumped from 64.3% to 69.2%. Multidisciplinary reasoning with tools improved from 54.7% to 57.9%. Agentic computer use inched up from 82.8% to 83.4%, and knowledge work scores rose from 1,753 to 1,890. Anthropic’s internal alignment assessments also show that Opus 4.8 achieves new highs in prosocial traits, including supporting user autonomy and acting in the user’s best interest. Rates of misaligned behavior, such as deception or cooperation with misuse, are substantially lower than in Opus 4.7 and comparable to Claude Mythos Preview, the company’s best-aligned model to date.

Early adopters are reporting real-world gains. Cognition, the company behind the AI coding agent Devin, noted that Opus 4.8 uses tools cleanly and resolves comment-verbosity and tool-calling issues that plagued Opus 4.7. Cursor, the AI-powered code editor, saw improvements across every effort level on its CursorBench evaluation. Harvey, which builds AI for legal work, reported that Opus 4.8 achieved the highest score ever recorded on its Legal Agent Benchmark and is the first model to break 10% overall on the all-pass standard. Databricks found that Opus 4.8 handles deeper multistep questions faster in its Genie AI agent, at a 61% cheaper token cost than Opus 4.7. Thomson Reuters noted meaningful improvements in consistency and reasoning quality for its CoCounsel Legal product. Hebbia, which focuses on financial document analysis, reported better citation precision and greater token efficiency on retrieval tasks.

Anthropic is also introducing several new features alongside the model. A new effort control in claude.ai and Cowork lets users choose how much computation Claude applies to a response, allowing them to trade speed for quality. Claude Code gains a dynamic workflows feature that enables it to plan work and run hundreds of parallel subagents in a single session, making codebase-scale migrations across hundreds of thousands of lines of code feasible. For developers, the Messages API now accepts system entries inside the messages array, so instructions can be updated mid-task without breaking the prompt cache. Fast mode for Opus 4.8, which runs at 2.5 times the speed, is now three times cheaper than it was for previous models.

The bigger news may be what comes next. Anthropic has announced plans to release a new class of model with higher intelligence than Opus, built on the Claude Mythos architecture. A small group of organizations is already using Claude Mythos Preview through Project Glasswing, a cybersecurity initiative. Anthropic and roughly 50 partners , including Apple, Google, Microsoft, and Amazon Web Services , have used Mythos Preview to discover more than 10,000 high- or critical-severity vulnerabilities across critical software infrastructure. Mythos-class models require stronger cyber safeguards before general release, Anthropic said, but the company expects to make them available to all customers in the coming weeks. The model sits a full capability tier above Opus 4.7 and can autonomously find zero-day vulnerabilities and create exploits for them, which explains both the excitement and the caution surrounding its deployment.

Anthropic’s valuation continues to climb. The company announced a $65 billion Series H round at a $965 billion post-money valuation on the same day as the Opus 4.8 launch, up from the $380 billion valuation at which it closed its $30 billion Series G in February. Revenue has grown from roughly $1 billion at the end of 2024 to an estimated $30 billion annualized run rate in 2026, driven by enterprise adoption of Claude. Anthropic also opened a new office in Milan on May 28, its sixth in Europe, and appointed KiYoung Choi as Representative Director of Korea ahead of a Seoul office opening, reflecting growing demand for Claude in enterprise markets outside the United States.

The competitive landscape is intense. Opus 4.8 enters a market where the pace of model releases has accelerated sharply. OpenAI launched GPT-5.5 as its first fully retrained base model since GPT-4.5, and GPT-5.4 set new records on professional benchmarks earlier this year. Google has invested up to $40 billion in Anthropic but continues to develop its own Gemini models. The frontier AI market has consolidated into a three-way race between Anthropic, OpenAI, and Google, with each company releasing incremental model upgrades at an increasing pace.

For Anthropic, the distinction it is trying to draw with Opus 4.8 is not raw capability but reliability and honesty. A model that catches its own mistakes, flags its uncertainties, and follows instructions consistently is more useful in agentic workflows where AI systems operate with limited human oversight. Whether that positioning holds as Mythos-class models arrive, promising higher intelligence with new safety constraints, will determine whether Anthropic can maintain its lead in the enterprise market it has worked to dominate.

(Source: The Next Web)

Topics

claude opus 4.8 98% ai honesty 93% mythos-class models 92% agentic coding 90% cybersecurity vulnerabilities 88% benchmark performance 87% anthropic valuation 86% enterprise ai adoption 85% project glasswing 84% ai safety alignment 83%