Topic: swe-bench

  • Anthropic's New Opus 4.5: More Power, Lower Cost

    Anthropic has launched Opus 4.5, its new flagship model, with improved coding capabilities and a better user experience, strengthening its position against competitors like OpenAI. The model introduces intelligent context management by summarizing earlier conversation segments, ensuring smoother and more ...

  • First AI Coding Challenge Results Reveal Major Flaws

    In the K Prize AI coding competition, the winning entry scored only 7.5%, highlighting how AI still struggles with real-world programming challenges. Unlike traditional benchmarks, the K Prize prevents data contamination by using only post-deadline GitHu...
