
Claude’s new model admits when it makes mistakes

Summary

– Anthropic is releasing Claude Opus 4.8 on Thursday, emphasizing the model’s focus on “honesty.”
– The company trains all its models to avoid making unsupported claims, addressing a general AI problem of jumping to conclusions.
– Early testers found Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims.
– In Anthropic’s evaluations, Opus 4.8 is about four times less likely than its predecessor to make unsupported claims.

Anthropic is rolling out Claude Opus 4.8 this Thursday, and the company is putting a spotlight on a feature that many users might find refreshing: the model’s ability to admit when it is unsure.

Anthropic explains that it trains “all [its] models to be honest, for instance, to avoid making claims that they can’t support.” However, the company acknowledges a persistent challenge in the AI field: models often “jump to conclusions, confidently presenting their work as making progress despite thin evidence.” This tendency, sometimes called hallucination or overconfidence, has been a major hurdle for building trust in AI systems.

Early feedback from testers suggests Opus 4.8 represents a meaningful step forward. According to Anthropic, the new model “is more likely to flag uncertainties about its work and less likely to make unsupported claims.” In Anthropic’s internal evaluations, Opus 4.8 was about four times less likely than its predecessor to make unsupported claims. That kind of improvement could have significant implications for users who rely on AI for research, analysis, or decision-making where accuracy is paramount. By being more transparent about its own limitations, the model aims to build a more reliable and collaborative partnership with its human users.

(Source: The Verge)
