Gemini 2.5: Advanced Web & Android Use Now in Preview

Summary
– Google has released Gemini 2.5 Computer Use in public preview, a specialized model that interacts with graphical user interfaces like browsers and websites through a looped process.
– The model analyzes user requests, screenshots, and action history to generate UI actions such as clicking, typing, scrolling, and drag/drop, which are then executed by client-side code.
– The model is optimized primarily for web browsers and shows promise for mobile UI control; it is not yet tuned for desktop OS-level control, but it outperforms competing offerings on browser-control benchmarks at low latency.
– Built on Gemini 2.5 Pro’s visual reasoning, this model powers Project Mariner and AI Mode’s agentic features and is used internally for UI testing and third-party workflow automation tools.
– Developers can access the model via the Gemini API in Google AI Studio and Vertex AI, with a demo available through Browserbase.

Google has launched a public preview of its Gemini 2.5 Computer Use model, a specialized system designed to interact directly with web browsers and Android interfaces. The model, which also underpins the agentic capabilities of Project Mariner and AI Mode, lets developers automate control of graphical user interfaces through a continuous observe-and-act loop.
The model operates by receiving a user request, a screenshot of the current environment, and a log of recent actions. It analyzes this information and generates a response, typically a function call representing a specific UI interaction like clicking, typing, or scrolling. Client-side code then executes the action, after which a fresh screenshot and updated URL are sent back to the model, restarting the cycle until the assigned task is fully completed.
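The loop described above can be sketched in a few lines of Python. This is a minimal simulation, not the real Gemini API: the `StubModel` and `StubBrowser` classes and the action names are placeholders standing in for the model's function calls and the client-side executor.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the real model and client. The actual Gemini API
# returns function calls whose exact names and schemas may differ.

@dataclass
class Action:
    name: str                 # e.g. "click", "type", "done"
    args: dict = field(default_factory=dict)

class StubModel:
    """Plays the model's role: maps (request, screenshot, history) to the next action."""
    def __init__(self, scripted_actions):
        self.scripted = iter(scripted_actions)

    def next_action(self, request, screenshot, history):
        return next(self.scripted, Action("done"))

class StubBrowser:
    """Plays the client's role: executes an action and returns a fresh screenshot."""
    def __init__(self):
        self.log = []

    def execute(self, action):
        self.log.append(action.name)
        return f"screenshot_after_{action.name}"  # placeholder for real pixels

def agent_loop(model, browser, request, max_steps=10):
    """The cycle from the article: generate an action, execute it client-side,
    feed the new screenshot and action log back to the model, repeat until done."""
    screenshot, history = "initial_screenshot", []
    for _ in range(max_steps):
        action = model.next_action(request, screenshot, history)
        if action.name == "done":
            break
        screenshot = browser.execute(action)   # client-side execution
        history.append(action)                 # action log fed back to the model
    return history

model = StubModel([Action("click", {"x": 100, "y": 200}),
                   Action("type", {"text": "hello"})])
browser = StubBrowser()
steps = agent_loop(model, browser, "Fill in the search box")
print([a.name for a in steps])  # ['click', 'type']
```

The `max_steps` cap mirrors a practical safeguard: an agent loop should always have a termination bound in case the model never signals completion.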
Supported actions extend beyond basic clicks and keystrokes. The system can navigate forward or backward in browser history, perform web searches, go to specified URLs, hover the cursor, use keyboard shortcuts, scroll pages, and even handle drag-and-drop operations.
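On the client side, each function call the model emits has to be routed to a handler. A simple dispatcher over the action vocabulary listed above might look like the sketch below; the action names are illustrative, since the real function-call identifiers in the Gemini API may differ.

```python
# Illustrative dispatcher for the action set described in the article.
# Handler names are placeholders, not the Gemini API's actual function calls.

def make_dispatcher():
    executed = []  # records (action_name, kwargs) for inspection

    def record(name):
        def handler(**kwargs):
            executed.append((name, kwargs))  # a real client would drive the browser here
        return handler

    handlers = {
        name: record(name)
        for name in (
            "click", "type", "scroll", "hover",
            "key_combination", "navigate_back", "navigate_forward",
            "search", "go_to_url", "drag_and_drop",
        )
    }

    def dispatch(name, **kwargs):
        if name not in handlers:
            raise ValueError(f"unsupported action: {name}")
        handlers[name](**kwargs)

    return dispatch, executed

dispatch, executed = make_dispatcher()
dispatch("go_to_url", url="https://example.com")
dispatch("drag_and_drop", from_xy=(10, 10), to_xy=(200, 300))
print([name for name, _ in executed])  # ['go_to_url', 'drag_and_drop']
```

Rejecting unknown action names, as `dispatch` does, is a useful safety property: the client only ever performs operations it explicitly supports.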
Google illustrated the model’s practical applications with two real-world examples. In one scenario, the AI was instructed to visit a pet care signup page, extract details for pets with California residency, add them as guests in a spa CRM system, and schedule a follow-up appointment with a specialist. In another example, the AI organized a cluttered digital sticky note board by moving notes into appropriate categories.
While primarily optimized for web browsers, the model also shows strong potential for mobile UI control, as evidenced by its results on Google's AndroidWorld benchmark. It is not yet tailored for desktop operating-system control, but it demonstrates leading performance in browser automation at low latency, outperforming comparable offerings from Anthropic (Claude) and OpenAI on web and mobile control benchmarks.
Built on the visual comprehension and reasoning foundation of Gemini 2.5 Pro, this model forms the core of Project Mariner and powers the agentic features in AI Mode. Internally, Google has used it to accelerate UI testing and software development. An early access program is now available for third-party developers focused on building intelligent assistants and workflow automation tools.
The Gemini 2.5 Computer Use model is accessible in public preview through the Gemini API on Google AI Studio and Vertex AI. Developers can experiment with the technology in a demo environment hosted by Browserbase.
(Source: 9to5Google)