SWE-bench evaluation

Claude Opus 4 Breaks Records: Outperforms OpenAI in AI Coding Marathon

May 23, 2025

Anthropic's Claude Opus 4 and Sonnet 4 AI models set new benchmarks in professional environments, with Opus maintaining focus on…