AI Benchmarking

GPT-5.4 Shatters Professional Benchmark Records

March 7, 2026

OpenAI has launched GPT-5.4, a powerful frontier model for professional work, available in standard, specialized "Thinking," and high-performance "Pro" configurations.…

AI & Tech

Google’s Gemini Pro Shatters Benchmark Records Again

February 22, 2026

Google has launched a preview of its advanced large language model, "Gemini Pro 3.1", which is positioned as a major…

Artificial Intelligence

LMArena Hits $1.7B Valuation Just Four Months After Launch

January 7, 2026

LMArena achieved a $1.7 billion valuation after a $150 million Series A round, reflecting intense market demand for independent AI…

Artificial Intelligence

GPT-5.2 vs. Gemini 3: Can It Finally Surpass the Competition?

December 13, 2025

OpenAI has released GPT-5.2, a model designed for professional knowledge work, claiming it is their most capable yet for enhancing…

AI & Tech

OpenAI Launches GPT-5.2 in Response to Google’s ‘Code Red’

December 11, 2025

OpenAI has launched GPT-5.2, its most advanced model, offered in three versions (Instant, Thinking, Pro) to cater to different professional…

Artificial Intelligence

GPT-5 Matches Human Performance in Diverse Jobs, Says OpenAI

September 26, 2025

OpenAI's GDPval benchmark evaluates AI performance against human professionals in key economic sectors, showing models like GPT-5 and Claude Opus…

AI & Tech

Google Launches Gemini AI for Advanced Parallel Reasoning

August 1, 2025

Google launched Gemini 2.5 Deep Think, its most advanced AI reasoning model, capable of solving complex problems by evaluating multiple…

AI & Tech

Mathematicians Battle AI in Secret Showdown

July 13, 2025

But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday,…

Artificial Intelligence

QwenLong-L1 Outperforms LLMs in Long-Context Reasoning

May 31, 2025

Alibaba's QwenLong-L1 framework enables large language models to analyze lengthy documents (hundreds of thousands of tokens) with high accuracy, addressing…

Artificial Intelligence