Topic: benchmark performance

China's Zhipu AI Debuts Powerful GLM-4.5 Model in Open-Source Push

July 30, 2025

Z.ai (formerly Zhipu) launched the advanced GLM-4.5 open-source language model, optimized for intelligent agent applications, strengthening China's position in generative AI. The model comes in two versions—a 355B-parameter flagship and a 106B-parameter streamlined variant—ranking third globally ...

Nous Research Launches Hermes 4 AI, Outperforming ChatGPT Without Restrictions

August 29, 2025

Hermes 4 is a family of open-source large language models that challenges proprietary AI systems by offering comparable performance with fewer content restrictions and greater user control. It introduces a hybrid reasoning feature for transparency in problem-solving and achieves top-tier results,...

OpenAI's Codex Gets Major Upgrade with New GPT-5 Model

September 15, 2025

OpenAI has launched an upgraded GPT-5-Codex model for its AI coding assistant, Codex, which improves performance on complex programming tasks by dynamically adjusting processing time from seconds to hours. The updated model is available to all ChatGPT Plus, Pro, Business, Edu, and Enterprise subs...

China's DeepSeek Enters AI Agent Race Against OpenAI and Microsoft

September 7, 2025

DeepSeek plans to launch an advanced AI agent called R2 by late 2025, positioning it as a major competitor to firms like OpenAI and Microsoft in the global AI landscape. The R2 model builds on the success of R1, which matched or exceeded U.S. competitors' performance at a lower cost, and focuses ...

ByteDance's New Seed-OSS-36B Model Boasts 512K Token Context

August 21, 2025

ByteDance's Seed-OSS-36B is an open-source language model with a 512,000 token context window, double that of many competitors, and is available under the permissive Apache-2.0 license for both commercial and research use. The model family includes three variants: a base model with and without sy...

Google Launches Gemini 2.5 Deep Think for AI Ultra Users

August 1, 2025

Google's latest AI model, Gemini 2.5 Deep Think, is now available exclusively to premium subscribers on the $250 AI Ultra plan, offering advanced problem-solving capabilities with high computational demands. The model uses extended "thinking time" and parallel analysis to refine hypotheses, excel...

Google Gemini's AI Image Model Gets a 'Bananas' Upgrade

August 27, 2025

Google has launched Gemini 2.5 Flash Image, an upgraded AI model for precise photo editing via natural language, available to all users and developers. The update is a strategic move to compete with OpenAI and other tech giants in AI image generation, aiming to attract more users to Google's ecos...

Google's AI Agent Mimics Human Writing for Better Research

August 7, 2025

Google's TTD-DR AI research agent mimics human writing techniques, using iterative drafting to outperform competitors in accuracy and coherence for business insights. Unlike rigid AI systems, TTD-DR employs diffusion mechanisms and continuous refinement, combining draft improvement with self-evol...

Elon Musk's xAI Unveils Grok 4 with $300/Month Subscription

July 10, 2025

Elon Musk's xAI launched Grok 4 and Grok 4 Heavy, its most advanced AI models, alongside a $300/month SuperGrok Heavy subscription, positioning them as competitors to ChatGPT and Gemini. Musk claims Grok 4 surpasses PhD-level expertise but has occasional common-sense lapses, while...

Claude Sonnet 4.5: Anthropic's Most Powerful AI for Coding

September 29, 2025

Anthropic has launched Claude Sonnet 4.5, its most advanced AI model for software development, which excels at creating production-ready applications and is available at the same pricing as its predecessor. The model leads in key coding benchmarks and demonstrated autonomous coding for extended p...

Liquid AI's LFM2-VL Model Brings Fast, Vision-Capable AI to Smartphones

August 13, 2025

Liquid AI has introduced LFM2-VL, a next-gen multimodal AI model optimized for smartphones and wearables, offering high speed and low resource usage while handling text and visual inputs. The model uses a unique Linear Input-Varying (LIV) approach and modular design, doubling GPU speeds and maint...

Salesforce CoAct-1 Agents Write Code to Boost Task Efficiency

August 13, 2025

Salesforce’s CoAct-1 system combines code execution and GUI navigation to streamline complex workflows, outperforming traditional automation with faster, more accurate results. The system uses three specialized agents—Orchestrator, Programmer, and GUI Operator—to divide tasks efficiently, leverag...

AI2's Compact Model Outshines Google & Meta in Performance

May 2, 2025

AI2's Olmo 2 1B, a 1-billion-parameter AI model, outperforms similar-sized models from Google, Meta, and Alibaba across benchmarks while being lightweight enough for everyday devices. The model is transparent and accessible, released under Apache 2.0 with full training data and code, enab...

QwenLong-L1 Outperforms LLMs in Long-Context Reasoning

May 31, 2025

Alibaba's QwenLong-L1 framework enables large language models to analyze lengthy documents (hundreds of thousands of tokens) with high accuracy, addressing a key limitation in current AI systems. The framework uses a multi-stage reinforcement learning approach, including supervised fine-tuning an...