
Google Launches Gemini 3 AI Model and Antigravity IDE

Summary

– Google has released Gemini 3 Pro with enhanced visual outputs and improved factuality, marking a step toward artificial general intelligence.
– The model demonstrates expanded reasoning abilities and superior understanding of text, images, and video, leading the LMArena leaderboard with an ELO score of 1,501.
– Gemini 3 achieved a record 72.1% on the SimpleQA Verified test, showing progress in accuracy despite occasional errors in general knowledge.
– It set new benchmarks in advanced tasks, scoring 37.5% on Humanity’s Last Exam without tools and excelling in math and coding challenges.
– Google introduced Antigravity, a new AI-first integrated development environment, alongside Gemini 3’s release.

Google continues to aggressively expand its Gemini AI ecosystem, introducing the powerful new Gemini 3 Pro model alongside a groundbreaking development tool called Antigravity IDE. This latest release builds on the widespread integration of Gemini 2.5 across Google’s product suite, including Search and Gmail, signaling the company’s deepening commitment to an AI-first strategy. Both the new model and the development environment are available starting today, offering developers and users more advanced capabilities.

Gemini 3 Pro is the first member of the Gemini 3 family, which Google positions as a significant milestone on the path toward artificial general intelligence (AGI). This iteration features substantially enhanced simulated reasoning and a more sophisticated grasp of multimodal inputs such as text, images, and video. Early reception appears positive: the model has already claimed the top position on the LMArena leaderboard with an ELO score of 1,501, surpassing its predecessor, Gemini 2.5 Pro, by a full 50 points.
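To put that 50-point lead in perspective, the standard Elo formula converts a rating difference into an expected head-to-head win rate. A minimal sketch, using the ratings reported above; note the win-rate mapping is the generic Elo model, not a per-matchup figure published by LMArena:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win rate of model A over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Gemini 3 Pro (1,501) vs. Gemini 2.5 Pro (1,451, per the 50-point gap)
print(f"{expected_score(1501, 1451):.1%}")  # ≈ 57.1%
```

In other words, a 50-point Elo edge corresponds to winning roughly 57 percent of pairwise comparisons, a modest but consistent advantage.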

Addressing the persistent challenge of factual accuracy in generative AI, Google reports that Gemini 3 marks considerable progress. The model achieved a record-breaking score of 72.1 percent on the 1,000-question SimpleQA Verified test. While that means even the most advanced large language model still errs on nearly 30 percent of general-knowledge queries, it nonetheless reflects meaningful improvement. On the notoriously difficult Humanity’s Last Exam, which assesses PhD-level knowledge and reasoning, Gemini 3 set another benchmark by scoring 37.5 percent without employing any external tools.

Mathematical problem-solving and software engineering are also central to Gemini 3’s upgraded skill set. The model set a new high on the MathArena Apex evaluation with a score of 23.4 percent and attained a 1,487 ELO rating in the WebDev Arena. Perhaps most notably for developers, Gemini 3 achieved a remarkable 76.2 percent success rate on the SWE-bench Verified test, which rigorously measures a model’s proficiency in generating functional code. This performance underscores its potential as a powerful assistant for programming tasks.

(Source: Ars Technica)

Topics: Gemini 3, AI rollout, factuality improvements, benchmark performance, artificial general intelligence, coding capabilities, multimodal understanding, mathematical reasoning, integrated development environment, model testing