Topic: benchmark performance
-
Google Gemini's AI Image Model Gets a 'Bananas' Upgrade
Google has launched Gemini 2.5 Flash Image, an upgraded AI model for precise photo editing via natural language, available to all users and developers. The update is a strategic move to compete with OpenAI and other tech giants in AI image generation, aiming to attract more users to Google's ecos...
-
ByteDance's New Seed-OSS-36B Model Boasts 512K Token Context
ByteDance's Seed-OSS-36B is an open-source language model with a 512,000 token context window, double that of many competitors, and is available under the permissive Apache-2.0 license for both commercial and research use. The model family includes three variants: a base model with and without sy...
-
Liquid AI's LFM2-VL Model Brings Fast, Vision-Capable AI to Smartphones
Liquid AI has introduced LFM2-VL, a next-gen multimodal AI model optimized for smartphones and wearables, offering high speed and low resource usage while handling text and visual inputs. The model uses a unique Linear Input-Varying (LIV) approach and modular design, doubling GPU inference speed and maint...
-
Salesforce CoAct-1 Agents Write Code to Boost Task Efficiency
Salesforce’s CoAct-1 system combines code execution and GUI navigation to streamline complex workflows, outperforming traditional automation with faster, more accurate results. The system uses three specialized agents—Orchestrator, Programmer, and GUI Operator—to divide tasks efficiently, leverag...
-
Google's AI Agent Mimics Human Writing for Better Research
Google's TTD-DR AI research agent mimics human writing techniques, using iterative drafting to outperform competitors in accuracy and coherence for business insights. Unlike rigid AI systems, TTD-DR employs diffusion mechanisms and continuous refinement, combining draft improvement with self-evol...
-
Google Launches Gemini 2.5 Deep Think for AI Ultra Users
Google's latest AI model, Gemini 2.5 Deep Think, is now available exclusively to premium subscribers on the $250/month AI Ultra plan, offering advanced problem-solving capabilities with high computational demands. The model uses extended "thinking time" and parallel analysis to refine hypotheses, excel...
-
China's Zhipu AI Debuts Powerful GLM-4.5 Model in Open-Source Push
Z.ai (formerly Zhipu) launched the advanced GLM-4.5 open-source language model, optimized for intelligent agent applications, strengthening China's position in generative AI. The model comes in two versions—a 355B-parameter flagship and a 106B-parameter streamlined variant—ranking third globally ...
-
Elon Musk's xAI Unveils Grok 4 with $300/Month Subscription
Elon Musk's xAI launched Grok 4 and Grok 4 Heavy, its most advanced AI models, alongside a $300/month SuperGrok Heavy subscription, positioning them as competitors to ChatGPT and Gemini. Musk claims Grok 4 surpasses PhD-level expertise but has occasional common-sense lapses, while...
-
QwenLong-L1 Outperforms LLMs in Long-Context Reasoning
Alibaba's QwenLong-L1 framework enables large language models to analyze lengthy documents (hundreds of thousands of tokens) with high accuracy, addressing a key limitation in current AI systems. The framework uses a multi-stage reinforcement learning approach, including supervised fine-tuning an...
-
AI2's Compact Model Outshines Google & Meta in Performance
AI2's Olmo 2 1B, a 1-billion-parameter AI model, outperforms similar-sized models from Google, Meta, and Alibaba across benchmarks while being lightweight enough for everyday devices. The model is transparent and accessible, released under Apache 2.0 with full training data and code, enab...