
Google’s New AI Browses the Web Like a Human

Summary

– Google has introduced Gemini 2.5 Computer Use, an AI model that navigates and interacts with web interfaces using visual understanding to perform tasks like form submissions.
– The model is designed for UI testing and navigating human-centric interfaces where APIs are unavailable, building on prior use in AI Mode and Project Mariner.
– This announcement follows OpenAI’s ChatGPT updates and Anthropic’s earlier release of a similar “computer use” AI model, highlighting competitive developments in AI agents.
– Google’s tool operates only within a browser environment, not a full computer system, and supports 13 actions such as typing and dragging elements.
– Gemini 2.5 Computer Use is accessible to developers via Google AI Studio and Vertex AI, with a public demo available on Browserbase for tasks like playing games or browsing websites.

Google has unveiled a new Gemini AI model that browses the web the way a person does, letting AI agents operate interfaces originally built for people. The model, named Gemini 2.5 Computer Use, uses visual understanding and reasoning to interpret a user’s request, then carries out actions such as completing and submitting online forms. This allows it to automate interactions with websites that lack direct API access.

The technology proves especially useful for user interface testing or navigating digital environments intended for human users. Previous iterations have already powered agent-driven functions in AI Mode and Project Mariner, a research initiative where AI independently handles browser-based assignments, such as populating a shopping cart from an ingredient list.

This announcement arrives shortly after OpenAI introduced new ChatGPT applications at its annual developer conference, where it also emphasized ChatGPT Agent, its own tool for managing intricate multi-step tasks. Separately, Anthropic launched a Claude AI model last year featuring similar computer-use functionality.

Google has released demonstration videos of the tool in action, played back at three times normal speed. According to the company, the model outperforms leading competitors on several web and mobile benchmarks. Unlike alternatives from OpenAI and Anthropic, however, Google’s model currently works only within a browser rather than across a full computer operating system. The company says the model is not yet optimized for desktop OS-level control and currently supports 13 actions, including launching a browser, entering text, and dragging items.

Gemini 2.5 Computer Use is now accessible to developers via Google AI Studio and Vertex AI platforms. A public demo on Browserbase allows observers to watch the AI perform specific tasks, such as playing the game 2048 or scanning Hacker News for trending discussions.

(Source: The Verge)
