Anthropic’s New Opus 4.5: More Power, Lower Cost

▼ Summary
– Anthropic released Opus 4.5, its flagship model, with improved coding performance and user experience enhancements to better compete with OpenAI’s models.
– The consumer app version of Claude now avoids abruptly ending long conversations by improving memory management within a single session.
– Instead of hard-stopping at the context limit, Claude now summarizes key points from earlier parts of the conversation to maintain coherence.
– Developers using Anthropic’s API can apply similar context management and compaction techniques for better performance.
– Opus 4.5 achieved 80.9% accuracy on the SWE-Bench Verified benchmark, leading competitors in coding, though it still trails GPT-5.1 on the MMMU visual reasoning benchmark.
Anthropic has officially launched its new flagship model, Opus 4.5, delivering notable gains in coding capability and overall user experience that strengthen its position against OpenAI’s competing frontier models. The release pairs practical usability improvements with stronger results on key benchmarks.
A significant upgrade for everyday users involves Claude’s handling of long conversations. Previously, the model would abruptly end a discussion once the 200,000-token context limit was reached, even if the user still had session or weekly usage allowance remaining. Other large language models typically handle this by gradually trimming older messages, which often produces disjointed, forgetful responses. Claude instead summarizes earlier conversation segments behind the scenes, preserving crucial information while discarding less relevant detail so that extended dialogues stay coherent. The enhancement applies not only to Opus 4.5 but to all current Claude models in the web, mobile, and desktop apps, and developers using Anthropic’s API can implement similar functionality via its context management and compaction features.
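As a rough illustration of the underlying idea (not Anthropic’s actual implementation), the sketch below hand-rolls a client-side equivalent with the official anthropic Python SDK: when the running history nears the context limit, older turns are replaced by a model-written summary. The model id, the threshold, the number of turns kept verbatim, and the characters-per-token estimate are all illustrative assumptions.

```python
import anthropic

MODEL = "claude-opus-4-5"        # assumed model id for Opus 4.5
COMPACT_THRESHOLD = 150_000      # compact well before the 200k-token limit (assumed margin)
KEEP_RECENT = 4                  # recent turns to keep verbatim; even, so the kept
                                 # slice starts with an assistant turn and roles alternate

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment


def rough_token_count(messages):
    # Crude ~4-characters-per-token estimate; a real implementation would
    # use proper token counting instead of this stand-in.
    return sum(len(str(m["content"])) for m in messages) // 4


def compact(messages):
    """Replace older turns with a model-written summary, keeping recent ones."""
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Summarize this conversation, preserving facts, "
                       "decisions, and open questions needed later:\n\n" + transcript,
        }],
    )
    summary_text = summary.content[0].text
    # Prepend the summary as a single user turn so role alternation holds.
    return [{"role": "user",
             "content": "Summary of the earlier conversation:\n" + summary_text}] + recent


def send(messages, user_input):
    """Append a user turn, compacting the history first if it has grown too large."""
    messages.append({"role": "user", "content": user_input})
    if rough_token_count(messages) > COMPACT_THRESHOLD:
        messages[:] = compact(messages)
    reply = client.messages.create(model=MODEL, max_tokens=2048, messages=messages)
    messages.append({"role": "assistant", "content": reply.content[0].text})
    return reply.content[0].text
```

Keeping the most recent turns verbatim while summarizing the rest mirrors the behavior described above: crucial details from earlier in the session survive in compressed form, while the immediate back-and-forth stays intact.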
When it comes to raw performance, Opus 4.5 sets a new standard in coding accuracy. It is the first model to exceed 80 percent on the SWE-Bench Verified benchmark, scoring 80.9 percent. This places it ahead of OpenAI’s GPT-5.1-Codex-Max at 77.9 percent and Google’s Gemini 3 Pro at 76.2 percent. The model excels particularly in agentic coding and agentic tool use evaluations, demonstrating robust capabilities for complex, multi-step programming tasks. However, it still trails GPT-5.1 on the MMMU visual reasoning benchmark, leaving room for improvement in that area.
(Source: Ars Technica)