Topic: benchmark performance
China's Zhipu AI Debuts Powerful GLM-4.5 Model in Open-Source Push
Z.ai (formerly Zhipu) launched the advanced GLM-4.5 open-source language model, optimized for intelligent agent applications, strengthening China's position in generative AI. The model comes in two versions—a 355B-parameter flagship and a 106B-parameter streamlined variant—ranking third globally ...
Nous Research Launches Hermes 4 AI, Outperforming ChatGPT Without Restrictions
Hermes 4 is a family of open-source large language models that challenges proprietary AI systems by offering comparable performance with fewer content restrictions and greater user control. It introduces a hybrid reasoning feature for transparency in problem-solving and achieves top-tier results,...
OpenAI's Codex Gets Major Upgrade with New GPT-5 Model
OpenAI has launched an upgraded GPT-5-Codex model for its AI coding assistant, Codex, which improves performance on complex programming tasks by dynamically adjusting processing time from seconds to hours. The updated model is available to all ChatGPT Plus, Pro, Business, Edu, and Enterprise subs...
China's DeepSeek Enters AI Agent Race Against OpenAI and Microsoft
DeepSeek plans to launch an advanced AI agent called R2 by late 2025, positioning it as a major competitor to firms like OpenAI and Microsoft in the global AI landscape. The R2 model builds on the success of R1, which matched or exceeded U.S. competitors' performance at a lower cost, and focuses ...
ByteDance's New Seed-OSS-36B Model Boasts 512K Token Context
ByteDance's Seed-OSS-36B is an open-source language model with a 512,000-token context window, double that of many competitors, and is available under the permissive Apache-2.0 license for both commercial and research use. The model family includes three variants: a base model with and without sy...
Google Launches Gemini 2.5 Deep Think for AI Ultra Users
Google's latest AI model, Gemini 2.5 Deep Think, is now available exclusively to premium subscribers on the $250 AI Ultra plan, offering advanced problem-solving capabilities with high computational demands. The model uses extended "thinking time" and parallel analysis to refine hypotheses, excel...
Google Gemini's AI Image Model Gets a 'Bananas' Upgrade
Google has launched Gemini 2.5 Flash Image, an upgraded AI model for precise photo editing via natural language, available to all users and developers. The update is a strategic move to compete with OpenAI and other tech giants in AI image generation, aiming to attract more users to Google's ecos...
Google's AI Agent Mimics Human Writing for Better Research
Google's TTD-DR AI research agent mimics human writing techniques, using iterative drafting to outperform competitors in accuracy and coherence for business insights. Unlike rigid AI systems, TTD-DR employs diffusion mechanisms and continuous refinement, combining draft improvement with self-evol...
Elon Musk's xAI Unveils Grok 4 with $300/Month Subscription
Elon Musk's xAI launched Grok 4 and Grok 4 Heavy, its most advanced AI models, alongside a $300/month SuperGrok Heavy subscription, positioning them as competitors to ChatGPT and Gemini. Musk claims Grok 4 surpasses PhD-level expertise but has occasional common-sense lapses, while...
Claude Sonnet 4.5: Anthropic's Most Powerful AI for Coding
Anthropic has launched Claude Sonnet 4.5, its most advanced AI model for software development, which excels at creating production-ready applications and is available at the same pricing as its predecessor. The model leads in key coding benchmarks and demonstrated autonomous coding for extended p...
Liquid AI's LFM2-VL Model Brings Fast, Vision-Capable AI to Smartphones
Liquid AI has introduced LFM2-VL, a next-gen multimodal AI model optimized for smartphones and wearables, offering high speed and low resource usage while handling text and visual inputs. The model uses a unique Linear Input-Varying (LIV) approach and modular design, doubling GPU speeds and maint...
Salesforce CoAct-1 Agents Write Code to Boost Task Efficiency
Salesforce’s CoAct-1 system combines code execution and GUI navigation to streamline complex workflows, outperforming traditional automation with faster, more accurate results. The system uses three specialized agents—Orchestrator, Programmer, and GUI Operator—to divide tasks efficiently, leverag...
AI2's Compact Model Outshines Google & Meta in Performance
AI2's Olmo 2 1B, a 1-billion-parameter AI model, outperforms similar-sized models from Google, Meta, and Alibaba across benchmarks while being lightweight enough for everyday devices. The model is transparent and accessible, released under Apache 2.0 with full training data and code, enab...
QwenLong-L1 Outperforms LLMs in Long-Context Reasoning
Alibaba's QwenLong-L1 framework enables large language models to analyze lengthy documents (hundreds of thousands of tokens) with high accuracy, addressing a key limitation in current AI systems. The framework uses a multi-stage reinforcement learning approach, including supervised fine-tuning an...