Core Insights
- Google DeepMind has launched the Gemini 2.5 Computer Use model, which allows AI to directly control user browsers, similar to OpenAI's Computer-Using Agent (CUA) [1][16][17]

Performance and Capabilities
- Gemini 2.5 Computer Use has achieved state-of-the-art (SOTA) performance across various benchmarks, outperforming competitors like Claude Sonnet and OpenAI's CUA in several categories [5][16]
- The model's accuracy in completing simple tasks is high, but it struggles with more complex tasks, indicating room for improvement [8][16]

User Interaction and Workflow
- The model operates through a loop process that includes user requests, environment screenshots, and action history, generating UI action function calls as responses [11][13]
- Users can test the model in a demo environment hosted by Browserbase, with limitations on session duration and user intervention [8]

Safety Mechanisms
- Google has integrated safety mechanisms into the model to address risks associated with direct computer control, including malicious use and unintended actions [14][15]
- Developers are provided with safety control options to prevent the model from executing potentially harmful operations [15]

Industry Context
- The introduction of Gemini 2.5 Computer Use signifies a competitive shift in the AI agent landscape, with major tech companies racing to redefine human-computer interaction [16][17]
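
The loop described under User Interaction and Workflow, together with the developer-side safety controls, can be sketched roughly as follows. This is a minimal illustration, not Google's implementation or the Gemini API: `Action`, `FakeBrowser`, `scripted_model`, `run_agent`, the action names, and the `RISKY_ACTIONS` set are all hypothetical stand-ins, and the model call is replaced by a fixed script so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """A UI action expressed as a function call: a name plus arguments."""
    name: str
    args: dict = field(default_factory=dict)


# Hypothetical action names treated as risky; not taken from the source.
RISKY_ACTIONS = {"submit_payment", "delete_account"}


class FakeBrowser:
    """Stand-in environment that records actions instead of driving a browser."""

    def __init__(self) -> None:
        self.executed: List[Action] = []

    def screenshot(self) -> bytes:
        # A real environment would return image bytes of the current page.
        return f"screen-after-{len(self.executed)}-actions".encode()

    def execute(self, action: Action) -> None:
        self.executed.append(action)


def scripted_model(request: str, screenshot: bytes,
                   history: List[Action]) -> Optional[Action]:
    """Placeholder for the model call: next action, or None when finished."""
    plan = [
        Action("click_at", {"x": 120, "y": 48}),
        Action("type_text", {"text": request}),
        Action("submit_payment"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(request: str, env: FakeBrowser,
              model: Callable[[str, bytes, List[Action]], Optional[Action]],
              confirm: Callable[[Action], bool],
              max_steps: int = 10) -> List[Action]:
    """Send request + screenshot + history to the model, then run its action."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = model(request, env.screenshot(), history)
        if action is None:
            break  # the model signals that the task is complete
        if action.name in RISKY_ACTIONS and not confirm(action):
            break  # developer-side gate: stop before a risky action runs
        env.execute(action)
        history.append(action)
    return history


if __name__ == "__main__":
    env = FakeBrowser()
    done = run_agent("blue kettle", env, scripted_model, confirm=lambda a: False)
    print([a.name for a in done])  # ['click_at', 'type_text']
```

Because `confirm` declines every risky action here, the run stops before `submit_payment`; passing a callback that asks the user instead would let the step proceed only with explicit approval.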
Google Joins the CUA Battle with the Release of Gemini 2.5 Computer Use: Letting AI Operate the Browser Directly