
Claude Opus 4.7 Outperforms GPT-5.4, Gemini 3.1 Pro

Summary

– Anthropic has released Claude Opus 4.7, its top generally available model, priced at $5/$25 per million tokens and available across its plans and major cloud platforms.
– It achieves a leading 64.3% score on the SWE-bench Pro software engineering benchmark, outperforming key competitors like GPT-5.4.
– The model shows major improvements in agentic workflows, with a 14% boost in multi-step reasoning and tool errors cut to a third of the previous rate, plus new multi-agent coordination.
– It features significantly enhanced vision, processing images at over three times the resolution of prior Claude models for detailed document analysis.
– The release supports Anthropic’s strong commercial position, with the company running at a $30 billion annualized revenue rate and in early IPO talks.

Anthropic has launched Claude Opus 4.7, its most advanced publicly available AI model, delivering benchmark-leading performance in software engineering and complex reasoning. This release significantly widens the performance gap with rivals OpenAI and Google on the practical tasks that drive enterprise adoption. The model is now accessible across all Claude subscription tiers and major cloud platforms, offering superior capability at the same price point as its predecessor.

The new model establishes a clear lead in software engineering benchmarks, a critical area for developer tools. On SWE-bench Pro, which evaluates a model’s ability to fix real-world coding issues, Opus 4.7 achieved a score of 64.3%. This surpasses GPT-5.4’s 57.7% and Gemini 3.1 Pro’s 54.2%, while also marking a substantial jump from the previous Opus 4.6 score of 53.4%. In the CursorBench test for autonomous coding, the model scored 70%, up from 58%. These gains are particularly significant given that Claude Code, Anthropic’s coding product, already generates $2.5 billion in annualized revenue, highlighting the commercial importance of this domain.

On tests of pure reasoning, such as the graduate-level GPQA Diamond benchmark, the top models have effectively reached parity. Opus 4.7 scored 94.2%, virtually identical to the scores from GPT-5.4 Pro and Gemini 3.1 Pro. This convergence suggests the competitive battlefield is shifting from raw knowledge to applied performance on complex, multi-step tasks.

The most impactful upgrade in Opus 4.7 may be its agentic reasoning, which standard benchmarks do not fully capture. Anthropic reports a 14% improvement in executing complex, multi-step workflows compared to Opus 4.6, while using fewer computational resources and producing only a third as many tool errors. A key advancement is the model’s new ability to pass “implicit-need tests,” in which it must infer the necessary tools or actions without explicit instruction.

Furthermore, the model introduces multi-agent coordination, allowing it to orchestrate parallel AI workstreams instead of handling tasks one after another. This feature is engineered for enterprise environments where simultaneous processes like code review and document analysis are common. Anthropic also claims the model can maintain focus over workflows lasting several hours, addressing a frequent criticism that AI models lose coherence on extended tasks. Enhanced resilience is another focus, with the new version designed to recover from tool failures that would have halted earlier models.
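The payoff of parallel workstreams over sequential handling can be sketched generically. The snippet below is an illustration of the coordination pattern, not Anthropic's actual API: the `run_agent` stub and the task names are hypothetical stand-ins for sub-agent calls.

```python
import concurrent.futures
import time

def run_agent(task: str) -> str:
    """Stand-in for a sub-agent call; a real system would invoke a model API here."""
    time.sleep(0.1)  # simulate model latency
    return f"{task}: done"

tasks = ["code review", "document analysis", "test generation"]

# Sequential: total latency is the sum of each sub-task's latency.
start = time.perf_counter()
sequential = [run_agent(t) for t in tasks]
seq_elapsed = time.perf_counter() - start

# Parallel: sub-agents run as concurrent workstreams, so total latency
# approaches the slowest single task rather than the sum.
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as pool:
    parallel = list(pool.map(run_agent, tasks))
par_elapsed = time.perf_counter() - start

print(sequential == parallel)     # same results either way
print(par_elapsed < seq_elapsed)  # parallel finishes sooner
```

The orchestration value lies in the second print: with three 100 ms sub-tasks, the sequential path takes roughly 300 ms while the parallel path takes roughly 100 ms, which is the difference enterprise workflows care about.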

For visual processing, Opus 4.7 can handle images at a resolution over three times higher than previous Claude models, supporting detailed analysis of scanned contracts, technical diagrams, and financial documents. Its context window remains at one million tokens. While this is half the capacity of Gemini 3.1 Pro, Opus 4.7 tied for the top score on a suite of long-context research evaluations, with testers noting it delivered the most consistent performance.

The model also adheres to instructions more precisely, a change that may require prompt adjustments from users. This trade-off reduces creative ambiguity but also minimizes the hallucination and off-task behavior that complicate enterprise deployments.

Claude Opus 4.7 is priced at $5 per million input tokens and $25 per million output tokens, matching the cost of Opus 4.6. It is available on Claude Pro, Team, and Enterprise plans, and through APIs on Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. Cost-saving options include prompt caching and a Batch API offering a 50% discount. While Gemini 3.1 Pro is cheaper, Opus 4.7’s commanding lead on software engineering and agentic tasks may justify the premium for performance-sensitive customers. The release also incorporates new cyber safeguards to automatically block requests related to prohibited cybersecurity uses.
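The published rates make per-job costs straightforward to estimate. Below is a back-of-envelope sketch using the article's figures ($5/$25 per million tokens, 50% Batch API discount); the 2M-input / 500K-output workload is a hypothetical example.

```python
# Reported Opus 4.7 rates, in dollars per million tokens.
PRICE_IN = 5.00
PRICE_OUT = 25.00
BATCH_DISCOUNT = 0.50  # Batch API: 50% off, per the article

def estimate_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Dollar cost of a job at the published per-million-token rates."""
    cost = (input_tokens / 1_000_000) * PRICE_IN + (output_tokens / 1_000_000) * PRICE_OUT
    return cost * (1 - BATCH_DISCOUNT) if batch else cost

# Hypothetical job: 2M input tokens, 500K output tokens.
print(estimate_cost(2_000_000, 500_000))              # 22.5
print(estimate_cost(2_000_000, 500_000, batch=True))  # 11.25
```

At these rates a fairly heavy job lands in the low tens of dollars, and batching halves it, which is the premium-versus-Gemini calculus the article describes.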

This model arrives as Anthropic’s commercial momentum is undeniable. The company is operating at a $30 billion annualized revenue rate and has fielded investor interest valuing it near $800 billion. Opus 4.7 is the product that must justify that staggering valuation. It does not win every benchmark, but it delivers meaningful improvements where it counts most for paying customers: in coding, agentic reasoning, vision, and task resilience. For now, on the complex workloads that generate real revenue, Anthropic appears to have the most capable model available.

(Source: The Next Web)
