Topic: swe-bench

November 25, 2025
Anthropic's New Opus 4.5: More Power, Lower Cost
Anthropic has launched Opus 4.5, its new flagship model, with enhanced coding capabilities and user experience, strengthening its position against competitors like OpenAI. The model introduces intelligent context management by summarizing earlier conversation segments, ensuring smoother and more ...
July 24, 2025
First AI Coding Challenge Results Reveal Major Flaws
The K Prize AI coding competition revealed major gaps in AI capabilities, with the winning entry scoring only 7.5% accuracy, highlighting AI's struggles with real-world programming challenges. Unlike traditional benchmarks, the K Prize prevents data contamination by using only post-deadline GitHu...