Google has unveiled a preview of its next-generation AI model, Gemini 2.5 Computer Use, a system that not only understands text and images but can now interact with web interfaces much like a human user. Through actions such as clicking, scrolling, typing, and dragging, the AI can complete tasks without relying on API integrations — filling out forms, submitting data, or searching for information directly on web pages. With this innovation, AI evolves beyond simply answering questions — it takes action.
According to Google, the Gemini 2.5 Computer Use model possesses advanced visual comprehension and reasoning capabilities, allowing it to observe web content and execute user commands with precision. This enables seamless interaction with interfaces like websites and web apps, even in the absence of dedicated APIs. The potential applications are wide-ranging, encompassing UI testing, workflow automation, data collection, and enterprise tool integration.
Currently, the model supports 13 core commands, including opening web pages, entering text, clicking buttons, and dragging elements. While it does not yet provide full desktop-level control, Google reports that the system outperforms comparable models in numerous web and mobile-based benchmark tests.
Gemini 2.5 Computer Use builds upon Google’s earlier research initiative, Project Mariner, which demonstrated how AI could autonomously perform complex tasks in a browser — such as automatically adding grocery items to a shopping cart based on a list of ingredients.
The model is available in public preview through the Gemini API and is accessible to developers in Google AI Studio and Vertex AI, allowing immediate experimentation and deployment.
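For developers, working with the model follows a simple observe-and-act loop: the client sends a goal plus a screenshot of the current page, the model replies with one of its supported UI actions, and the client executes that action in a real browser and returns a fresh screenshot for the next step. The sketch below illustrates that loop in Python; the google-genai SDK usage, the preview model identifier, and the execute_action helper are assumptions drawn from the public preview rather than details confirmed in this announcement.

```python
# Minimal sketch of the observe-and-act loop behind Gemini 2.5 Computer Use.
# Assumptions (not confirmed by this article): the google-genai Python SDK,
# the preview model name below, and an execute_action() helper that you
# would implement yourself with a browser driver such as Playwright.
from google import genai
from google.genai import types

client = genai.Client()  # expects GEMINI_API_KEY in the environment

# Enable the computer-use tool, restricted to a browser environment.
config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER
        )
    )]
)


def execute_action(name: str, args: dict) -> tuple[bytes, str]:
    """Hypothetical helper: map a proposed action (click, type, drag, ...)
    onto real browser calls and return (new screenshot PNG, current URL)."""
    raise NotImplementedError("Wire this up to Playwright or similar.")


def run_task(goal: str, screenshot: bytes, max_steps: int = 10) -> str | None:
    """Ask the model to pursue `goal`, executing each UI action it proposes."""
    contents = [types.Content(role="user", parts=[
        types.Part.from_text(text=goal),
        types.Part.from_bytes(data=screenshot, mime_type="image/png"),
    ])]
    for _ in range(max_steps):
        response = client.models.generate_content(
            model="gemini-2.5-computer-use-preview-10-2025",  # assumed name
            contents=contents,
            config=config,
        )
        part = response.candidates[0].content.parts[0]
        if part.function_call is None:
            return response.text  # no more actions: the model is done
        # Execute the proposed action in a real browser, then feed the
        # resulting screenshot and URL back so the model can plan the next step.
        screenshot, url = execute_action(part.function_call.name,
                                         dict(part.function_call.args))
        contents.append(response.candidates[0].content)
        contents.append(types.Content(role="user", parts=[
            types.Part.from_function_response(
                name=part.function_call.name,
                response={"url": url},
            ),
            types.Part.from_bytes(data=screenshot, mime_type="image/png"),
        ]))
    return None
```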
This announcement comes on the heels of OpenAI's DevDay, where the company introduced apps that run inside ChatGPT along with new agent-building tools, emphasizing autonomous, multi-step task execution. Anthropic, for its part, had earlier added computer-use capabilities to its Claude models.
Unlike competing models that can control an entire operating environment, Google's model is deliberately constrained to browser-level interactions, a design choice that prioritizes security and controllability. Despite these boundaries, Google claims the model "outperforms other mainstream alternatives" in real-world testing and says it will continue to evolve to support broader interactive and functional scenarios.
The launch of Gemini 2.5 Computer Use marks a profound shift in the trajectory of generative AI: from mere language comprehension to actionable intelligence. In this new paradigm, developers will be able not only to ask AI for answers but also to have it execute operations directly.
With the model standing at the intersection of human interaction and automation, Google's vision is clear: to transform AI from a passive assistant into an active, capable, and truly "hands-on" digital agent.