AI & TechArtificial IntelligenceBigTech CompaniesNewswireTechnology

Opus 4.8 Misalignment Rates Match Claude Mythos Preview

▼ Summary

– Anthropic’s Claude Opus 4.8 offers faster thinking modes at one-third the cost of its predecessor and emphasizes honesty and safety, with low misalignment rates.
– OpenAI’s GPT-5.5 Instant produces 52.5% fewer hallucinated claims than its predecessor, reducing misinformation risk for everyday users like those asking health questions.
– Nvidia’s Nemotron 3 Nano Omni unifies multimodal input (visual, audio, text) into a single system, streamlining agent workflows and reducing token use.
– OpenAI’s GPT-5.5 earned a ZDNET Expert Score of 93/100, showing improvements in agentic coding and factual accuracy, with rapid release cycles accelerating development.
– Anthropic’s Claude Mythos, considered too powerful for public release due to cybersecurity risks, led to Project Glasswing, a collaborative effort with rivals to secure critical software.

AI labs are shipping new models at an unprecedented pace, but not every release represents a true leap forward, even when the marketing suggests otherwise. The real picture emerges when you compare models side by side: where do competitors fall short, and which models offer genuine breakthroughs versus those simply catching up to industry standards? Our Model Release Tracker cuts through the noise, helping you evaluate where each model stands and whether it merits your attention. While we don’t test every update on this list, we provide essential details and, where applicable, hands-on expert evaluations with an Expert Score. For a deeper understanding of our methodology, see our AI testing breakdown. Below, we highlight the most significant model launches of 2026 so far, updating this list as notable new models arrive.

Claude Opus 4.8 (Anthropic, May 28, 2026) replaces Opus 4.7 at the same price point. Anthropic claims it offers faster thinking modes for one-third the cost of its predecessor. True to form, this model prioritizes coding abilities, outperforming Opus 4.7 on two coding benchmarks, though it does not fully surpass OpenAI’s GPT 5.5. The company also notes it “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest,” though these definitions remain imprecise. This release underscores Anthropic’s deepening commitment to model safety and interpretability. Opus 4.7 already boasted a 92% honesty rate with reduced sycophancy and hallucinations. Now, Anthropic claims Opus 4.8 shows “substantially” lower misalignment rates than 4.7, comparing its alignment directly to that of Mythos Preview , a high bar that signals an increasingly stringent safety standard.

GPT-5.5 Instant (OpenAI, May 5, 2026) is a lighter version of the recently released GPT-5.5, described by OpenAI as less verbose than its predecessor, GPT-5.3 Instant. It also promises fewer hallucinations and improved factuality, with OpenAI reporting “52.5% fewer hallucinated claims than GPT‑5.3 Instant on high-stakes prompts covering areas like medicine, law, and finance.” This model replaces GPT-5.3 as the default in ChatGPT. For a model most people use for quick queries, a major reduction in hallucinations could mean less misinformation spreading among the masses, especially critical for everyday health questions. (Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging copyright infringement in training and operating its AI systems.)

Nemotron 3 Nano Omni (Nvidia, April 28, 2026) is the latest addition to Nvidia’s open Nemotron family. It provides agents with multimodal input, allowing them to “perceive and reason across visual, audio, and textual inputs within a single shared perception‑to‑action loop,” according to Nvidia. This unifies multiple capabilities into one system, streamlining workflows that typically require separate models for speech, vision, and text. That fragmentation slows down tasks, undermines context, and increases inference costs. Nvidia’s approach, if effective, would reduce token use and save money. You can try it on Hugging Face.

GPT-5.5 (OpenAI, April 23, 2026) earned an Expert Score of 93/100 from ZDNET tester David Gewirtz, who gave it an A- grade. While he notes it “can be reductively described as better and faster than GPT-5.4,” which is the bare minimum for a new model, it showed notable improvements in agentic coding, concept identification, scientific research, and factual accuracy. The quick turnaround from GPT-5.4 to GPT-5.5 , less than two months , highlights how rapidly agentic coding is accelerating OpenAI’s release cycle. As David Gewirtz explains, the company, like other frontier labs using AI to build AI, is shipping updates at an exponentially increasing rate.

ChatGPT Images 2 (OpenAI, April 23, 2026) arrived soon after OpenAI sunsetted Sora, its generative video model and social platform. David Gewirtz got an early look and was impressed, describing it as fun, a huge leap, and actually useful for work, though no formal Expert Score was assigned. This release is notable because OpenAI appeared to be pivoting away from consumer-focused AI products after discontinuing Sora, having been outpaced by Anthropic in enterprise contracts. That OpenAI still launched Images 2 suggests it sees image generation as relevant to enterprise AI, especially following Anthropic’s Claude Design.

Claude Opus 4.7 (Anthropic, April 16, 2026) arrived quickly after Opus 4.6, boasting new highs in honesty, reduced sycophancy, and fewer hallucinations. It also powers the new Claude Security tool, released shortly after the model itself. However, it is not Mythos, as some speculated. Hallucinations and honesty remain among the most difficult challenges for even the best models. For Anthropic to claim significant gains in these areas is a major achievement for a lab that prioritizes safety.

Claude Mythos (Preview) (Anthropic, April 7, 2026) is not publicly available. Anthropic created a media stir by positioning this new general-purpose model as too powerful to release as usual. The company was particularly alarmed by its striking capability at computer security tasks, calling it a step change from earlier models. In response, Anthropic launched Project Glasswing, a collaborative effort with rivals like Google, Nvidia, and Microsoft, as well as security authorities like Palo Alto Networks, “to help secure the world’s most critical software, and to prepare the industry for the practices we all will need to adopt to keep ahead of cyberattackers.” If Anthropic’s guidance is correct, cybersecurity systems as they stand may not be prepared for the rapidly evolving frontier of model capabilities. Mythos may be the first of many such models once other labs achieve similar breakthroughs. For now, just weeks into its release, Mythos is already helping catch software bugs in droves.

GPT-5.4 (OpenAI, March 5, 2026) was framed by OpenAI as specifically designed for professional work. Released barely three months after GPT-5.2, the company’s own testing (which should be taken with a grain of salt until third-party verification) shows it matches or outperforms human professionals 83% of the time. As AI companies increasingly focus on gaining enterprise trust and contracts, they need models that can handle complex work tasks with minimal risk, delay, or cost. Any advancement showing prowess in professional workflows has a better chance of being taken seriously by companies struggling to adopt AI, though seamless integration is not guaranteed.

Claude Opus 4.6 (Anthropic, Feb. 5, 2026) quickly redefined the standard for autonomous agentic work, especially in coding , no surprise given Anthropic’s expertise in that area. It also demonstrated improvement in complex, longer-running tasks. Its ability to handle tasks better on its own means you can reliably offload more of your workflow to it, something agentic offerings usually struggle with.

GPT-5.3-Codex (OpenAI, Feb. 5, 2026) is a coding model that OpenAI says helped build and debug itself. It can be interrupted and redirected mid-task, a huge advantage for developers working on complex or shifting projects with lots of trial-and-error. It also boasts run times of over a day and a better grasp on user intent. OpenAI is trying to catch up to Anthropic’s lead in agentic coding , notably, it released 5.3 Codex on the same day as Anthropic launched Opus 4.6. While ZDNET experts often prefer Claude Code for vibe coding, OpenAI’s rumored shift toward enterprise clients could eventually close that gap.

(Source: ZDNet)

Topics

model release tracker 95% ai model benchmarks 92% model safety 90% agentic coding 88% enterprise ai 85% cybersecurity threats 82% open source models 78% Multimodal AI 75% hallucination reduction 73% inference costs 70%