Topic: benchmark performance
Gemini 3 Leads the AI Race - For Now
Google's Gemini 3 has achieved record-breaking performance and adoption, quickly topping AI leaderboards and attracting over one million users in its first day, setting a new industry benchmark. The model demonstrates a clear lead in key areas like coding, mathematics, and visual comprehension, w...
OpenAI's 'Code Red' to Boost ChatGPT as Google Rivalry Heats Up
OpenAI has declared a "code red" to urgently improve ChatGPT's user experience, focusing on personalization, speed, and reliability in response to competitive pressure. The company is postponing other initiatives like advertising integration and specialized AI agents to concentrate resources on e...
China's Zhipu AI Debuts Powerful GLM-4.5 Model in Open-Source Push
Z.ai (formerly Zhipu) launched the advanced GLM-4.5 open-source language model, optimized for intelligent agent applications, strengthening China's position in generative AI. The model comes in two versions—a 355B-parameter flagship and a 106B-parameter streamlined variant—ranking third globally ...
Nous Research Launches Hermes 4 AI, Outperforming ChatGPT Without Restrictions
Hermes 4 is a family of open-source large language models that challenges proprietary AI systems by offering comparable performance with fewer content restrictions and greater user control. It introduces a hybrid reasoning feature for transparency in problem-solving and achieves top-tier results,...
Anthropic's New Opus 4.5: More Power, Lower Cost
Anthropic has launched Opus 4.5, its new flagship model, with enhanced coding capabilities and user experience, strengthening its position against competitors like OpenAI. The model introduces intelligent context management by summarizing earlier conversation segments, ensuring smoother and more ...
China's Free AI Model Outperforms GPT-5 and Sonnet 4.5
Moonshot's new open-source AI model, Kimi K2 Thinking, claims to outperform top proprietary models like GPT-5 and Claude Sonnet 4.5 on key benchmarks including reasoning and information retrieval. The model is freely available, trained for just $4.6 million, and uses a Mixture-of-Experts architec...
OpenAI's Codex Gets Major Upgrade with New GPT-5 Model
OpenAI has launched an upgraded GPT-5-Codex model for its AI coding assistant, Codex, which improves performance on complex programming tasks by dynamically adjusting processing time from seconds to hours. The updated model is available to all ChatGPT Plus, Pro, Business, Edu, and Enterprise subs...
China's DeepSeek Enters AI Agent Race Against OpenAI and Microsoft
DeepSeek plans to launch an advanced AI agent called R2 by late 2025, positioning it as a major competitor to firms like OpenAI and Microsoft in the global AI landscape. The R2 model builds on the success of R1, which matched or exceeded U.S. competitors' performance at a lower cost, and focuses ...
ByteDance's New Seed-OSS-36B Model Boasts 512K Token Context
ByteDance's Seed-OSS-36B is an open-source language model with a 512,000 token context window, double that of many competitors, and is available under the permissive Apache-2.0 license for both commercial and research use. The model family includes three variants: a base model with and without sy...
Google Launches Gemini 2.5 Deep Think for AI Ultra Users
Google's latest AI model, Gemini 2.5 Deep Think, is now available exclusively to premium subscribers on the $250 AI Ultra plan, offering advanced problem-solving capabilities with high computational demands. The model uses extended "thinking time" and parallel analysis to refine hypotheses, excel...
Open-Source AI Coding Model Rivals Proprietary Options
Mistral AI has launched Devstral 2, a powerful open-source AI coding model that achieves a 72.2% score on the SWE-bench benchmark, positioning it as a strong competitor to proprietary tools. The release includes the Mistral Vibe command-line tool for project-wide AI assistance and a smaller, loca...
Google Gemini's AI Image Model Gets a 'Bananas' Upgrade
Google has launched Gemini 2.5 Flash Image, an upgraded AI model for precise photo editing via natural language, available to all users and developers. The update is a strategic move to compete with OpenAI and other tech giants in AI image generation, aiming to attract more users to Google's ecos...
Google's Gemini 3 Breaks Records With New Coding App
Google has launched Gemini 3, its most advanced foundation model, available via the Gemini app and AI search, positioning it as a strong competitor in the AI market following recent releases from OpenAI and Anthropic. Gemini 3 showcases significant improvements in reasoning, achieving record scor...
Google's AI Agent Mimics Human Writing for Better Research
Google's TTD-DR AI research agent mimics human writing techniques, using iterative drafting to outperform competitors in accuracy and coherence for business insights. Unlike rigid AI systems, TTD-DR employs diffusion mechanisms and continuous refinement, combining draft improvement with self-evol...
Elon Musk's xAI Unveils Grok 4 with $300/Month Subscription
Elon Musk's xAI launched Grok 4 and Grok 4 Heavy, its most advanced AI models, alongside a $300/month SuperGrok Heavy subscription, positioning them as competitors to ChatGPT and Gemini. Musk claims Grok 4 surpasses PhD-level expertise but has occasional common-sense lapses, while...
GPT-5.1 Upgrade: How ChatGPT Gets Warmer and Smarter
GPT-5.1 introduces two new models—Instant for quick, warm everyday interactions and Thinking for complex, in-depth analysis—enhancing conversational quality and adaptability. The update expands customization with selectable personality tones, adjustable response characteristics, and improved unde...
Claude Sonnet 4.5: Anthropic's Most Powerful AI for Coding
Anthropic has launched Claude Sonnet 4.5, its most advanced AI model for software development, which excels at creating production-ready applications and is available at the same pricing as its predecessor. The model leads in key coding benchmarks and demonstrated autonomous coding for extended p...
Apple's New AI Model Sees, Creates and Edits Images
Apple has introduced UniGen 1.5, a unified AI model that combines image understanding, generation, and editing into a single system, moving beyond separate specialized models. The model's key innovations include an "Edit Instruction Alignment" training phase for better interpreting edit commands ...
Gemini's Upgraded Deep Research & Agent Now Available
Google has released an upgraded Gemini Deep Research agent, powered by the Gemini 3 Pro model, which autonomously handles complex, long-form research tasks with improved accuracy and enhanced web search capabilities. The agent demonstrates superior performance on demanding benchmarks, outperformi...
OpenAI Launches GPT-5.2 in Response to Google's 'Code Red'
OpenAI has launched GPT-5.2, its most advanced model, offered in three versions (Instant, Thinking, Pro) to cater to different professional needs like speed, complex tasks, and accuracy. The model demonstrates enhanced capabilities in coding, math, vision, and long-context reasoning, with the Thi...
Google Launches Gemini 3 AI Model and Antigravity IDE
Google has launched the Gemini 3 Pro AI model and the Antigravity IDE, expanding its AI ecosystem and reinforcing its AI-first strategy with immediate availability for developers and users. Gemini 3 Pro features enhanced reasoning and multimodal understanding, achieving a top ELO score of 1,501 o...
Liquid AI's LFM2-VL Model Brings Fast, Vision-Capable AI to Smartphones
Liquid AI has introduced LFM2-VL, a next-gen multimodal AI model optimized for smartphones and wearables, offering high speed and low resource usage while handling text and visual inputs. The model uses a unique Linear Input-Varying (LIV) approach and modular design, doubling GPU speeds and maint...
Salesforce CoAct-1 Agents Write Code to Boost Task Efficiency
Salesforce’s CoAct-1 system combines code execution and GUI navigation to streamline complex workflows, outperforming traditional automation with faster, more accurate results. The system uses three specialized agents—Orchestrator, Programmer, and GUI Operator—to divide tasks efficiently, leverag...
AI2's Compact Model Outshines Google & Meta in Performance
AI2's Olmo 2 1B, a 1-billion-parameter AI model, outperforms similar-sized models from Google, Meta, and Alibaba across benchmarks while being lightweight enough for everyday devices. The model is transparent and accessible, released under Apache 2.0 with full training data and code, enab...
OpenAI's New Coding Model Skips Nvidia, Uses Compact Chips
OpenAI has launched GPT-5.3-Codex-Spark, a new coding model that runs on Cerebras chips and achieves a speed of over 1,000 tokens per second, a fifteen-fold increase over its predecessor. The model is optimized for raw speed and text-only coding tasks, featuring a 128,000-token context window...
Claude's AI Moment: Can It Last?
Anthropic's Claude Code platform has gained significant traction among developers, driven by the powerful Opus 4.5 model which enables reliable, autonomous handling of complex coding tasks with minimal supervision. The platform's success reflects tangible progress in AI agents, with enterprises a...
OpenAI Halts ChatGPT Ads to Counter Google's Gemini
OpenAI has paused plans to introduce ads into ChatGPT, prioritizing product quality and user retention over new revenue streams due to competitive pressure from Google's Gemini. Google's native multimodal Gemini model is outperforming ChatGPT in key areas like reasoning and speed, leveraging its ...
Runway AI Raises $315M at $5.3B Valuation for Advanced World Models
Runway AI raised $315 million, doubling its valuation to $5.3 billion, with funds targeted at developing advanced "world models" for simulating and predicting real-world environments. The company is expanding beyond its creative industry roots, seeing growing adoption in gaming and robotics, and ...
ASRock X870 Taichi Creator Review: Power for Creative Pros
The ASRock X870 Taichi Creator is a high-performance AM5 motherboard priced under $320, designed for creative professionals with robust power delivery, dual PCIe 5.0 slots, and extensive connectivity including 10 GbE and multiple USB ports. It offers exceptional connectivity with dual LAN ports (...
QwenLong-L1 Outperforms LLMs in Long-Context Reasoning
Alibaba's QwenLong-L1 framework enables large language models to analyze lengthy documents (hundreds of thousands of tokens) with high accuracy, addressing a key limitation in current AI systems. The framework uses a multi-stage reinforcement learning approach, including supervised fine-tuning an...