Google Joins the CUA Race with Gemini 2.5 Computer Use: Letting AI Operate the Browser Directly
机器之心·2025-10-08 03:18

Core Insights
- Google DeepMind has launched the Gemini 2.5 Computer Use model, which allows AI to directly control user browsers, similar to OpenAI's Computer-Using Agent (CUA) [1][25]
- The model demonstrates state-of-the-art (SOTA) performance in various benchmarks, outperforming competitors in several tasks [6][25]

Benchmark Performance
- Gemini 2.5 Computer Use achieved notable scores in benchmark tests [7], such as:
  - Online-Mind2Web: 69.0% accuracy
  - Measured by Browserbase: 65.7% accuracy
  - WebVoyager: 88.9% accuracy (self-reported)
  - AndroidWorld: 69.7% accuracy

Speed and Accuracy
- The model exhibits high accuracy and speed in completing tasks, effectively gathering information and organizing notes [5][9]
- However, it struggles with more complex tasks, indicating limitations in its current capabilities [9][11]

User Interaction and Workflow
- Users can access the model's capabilities through the Gemini API in Google AI Studio and Vertex AI, with a demo environment available for testing [13]
- The model operates in a loop, analyzing user inputs and generating UI action function calls, with safety mechanisms in place to confirm actions [19][21]

Safety Mechanisms
- Google has integrated safety measures during the training phase to mitigate risks associated with AI controlling computers, including user misuse and unexpected model behavior [23][26]
- Developers are provided with options to prevent the model from executing potentially harmful actions [24][26]

Industry Implications
- The introduction of Gemini 2.5 Computer Use signals a competitive shift in the AI agent landscape, with major tech companies vying to redefine human-computer interaction [25]
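
The loop described under User Interaction and Workflow, together with the confirmation step and the developer-side option to block actions described under Safety Mechanisms, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the Gemini API: `Action`, `run_agent_loop`, `propose`, `execute`, `confirm`, and the action names are all hypothetical, the model is replaced by a scripted stub, and treating each observation as a screenshot description is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """A UI action proposed by the model (hypothetical shape, not the real API)."""
    name: str
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False  # set when the action is flagged as risky


def run_agent_loop(
    propose: Callable[[str, str, list], Optional[Action]],
    execute: Callable[[Action], str],
    confirm: Callable[[Action], bool],
    goal: str,
    excluded: frozenset = frozenset(),
    max_steps: int = 10,
) -> list:
    """Run the propose -> check -> execute cycle and return a step history."""
    history = []
    observation = "initial screenshot"  # assumed form of the first observation
    for _ in range(max_steps):
        action = propose(goal, observation, history)
        if action is None:  # the model signals that the task is finished
            break
        if action.name in excluded:
            # Developer-configured block list: never execute these actions.
            history.append((action.name, "blocked"))
            observation = f"{action.name} is excluded by the developer"
        elif action.needs_confirmation and not confirm(action):
            # Risky action that the end user refused to approve.
            history.append((action.name, "declined"))
            observation = f"user declined {action.name}"
        else:
            # Execute and feed the new state back into the next iteration.
            observation = execute(action)
            history.append((action.name, "done"))
    return history


def demo() -> list:
    # Scripted stand-in for the model: one action per loop iteration.
    script = [
        Action("navigate", {"url": "https://example.com"}),
        Action("click_at", {"x": 10, "y": 20}, needs_confirmation=True),
        Action("drag_and_drop"),
    ]

    def propose(goal, observation, history):
        return script[len(history)] if len(history) < len(script) else None

    def execute(action):
        return f"screenshot after {action.name}"

    def confirm(action):
        return False  # simulate a user who declines every risky action

    return run_agent_loop(
        propose, execute, confirm,
        goal="collect notes",
        excluded=frozenset({"drag_and_drop"}),
    )


if __name__ == "__main__":
    for step in demo():
        print(step)
```

In this sketch a blocked or declined action is reported back to the model as the next observation instead of ending the run, so the model could propose an alternative. That is one possible design choice, not a description of how Google's implementation behaves.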