Topic: benchmark performance

August 27, 2025

Google Gemini's AI Image Model Gets a 'Bananas' Upgrade

Google has launched Gemini 2.5 Flash Image, an upgraded AI model for precise photo editing via natural language, available to all users and developers. The update is a strategic move to compete with OpenAI and other tech giants in AI image generation, aiming to attract more users to Google's ecos...

August 21, 2025

ByteDance's New Seed-OSS-36B Model Boasts 512K Token Context

ByteDance's Seed-OSS-36B is an open-source language model with a 512,000-token context window, double that of many competitors, and is available under the permissive Apache-2.0 license for both commercial and research use. The model family includes three variants: a base model with and without sy...

August 13, 2025

Liquid AI's LFM2-VL Model Brings Fast, Vision-Capable AI to Smartphones

Liquid AI has introduced LFM2-VL, a next-gen multimodal AI model optimized for smartphones and wearables, offering high speed and low resource usage while handling text and visual inputs. The model uses a unique Linear Input-Varying (LIV) approach and modular design, doubling GPU speeds and maint...

August 13, 2025

Salesforce CoAct-1 Agents Write Code to Boost Task Efficiency

Salesforce’s CoAct-1 system combines code execution and GUI navigation to streamline complex workflows, outperforming traditional automation with faster, more accurate results. The system uses three specialized agents—Orchestrator, Programmer, and GUI Operator—to divide tasks efficiently, leverag...

August 7, 2025

Google's AI Agent Mimics Human Writing for Better Research

Google's TTD-DR AI research agent mimics human writing techniques, using iterative drafting to outperform competitors in accuracy and coherence for business insights. Unlike rigid AI systems, TTD-DR employs diffusion mechanisms and continuous refinement, combining draft improvement with self-evol...

August 1, 2025

Google Launches Gemini 2.5 Deep Think for AI Ultra Users

Google's latest AI model, Gemini 2.5 Deep Think, is now available exclusively to premium subscribers on the $250-per-month AI Ultra plan. It offers advanced problem-solving capabilities at a high computational cost. The model uses extended "thinking time" and parallel analysis to refine hypotheses, excel...

July 30, 2025

China's Zhipu AI Debuts Powerful GLM-4.5 Model in Open-Source Push

Z.ai (formerly Zhipu) launched the advanced GLM-4.5 open-source language model, optimized for intelligent agent applications, strengthening China's position in generative AI. The model comes in two versions—a 355B-parameter flagship and a 106B-parameter streamlined variant—ranking third globally ...

July 10, 2025

Elon Musk's xAI Unveils Grok 4 with $300/Month Subscription

Elon Musk's xAI launched Grok 4 and Grok 4 Heavy, its most advanced AI models, alongside a $300/month SuperGrok Heavy subscription, positioning them as competitors to ChatGPT and Gemini. Musk claims Grok 4 surpasses PhD-level expertise but has occasional common-sense lapses, while...

May 31, 2025

QwenLong-L1 Outperforms LLMs in Long-Context Reasoning

Alibaba's QwenLong-L1 framework enables large language models to analyze lengthy documents (hundreds of thousands of tokens) with high accuracy, addressing a key limitation in current AI systems. The framework uses a multi-stage reinforcement learning approach, including supervised fine-tuning an...

May 2, 2025

AI2's Compact Model Outshines Google & Meta in Performance

AI2's Olmo 2 1B, a 1-billion-parameter AI model, outperforms similar-sized models from Google, Meta, and Alibaba across benchmarks while being lightweight enough for everyday devices. The model is transparent and accessible, released under Apache 2.0 with full training data and code, enab...