最新智能体自动操作手机电脑，10个榜单开源SOTA全拿下｜通义实验室

Core Viewpoint - The article discusses the launch of the Mobile-Agent-v3 framework by Tongyi Lab, which achieves state-of-the-art (SOTA) performance in automating tasks on mobile and desktop platforms, showcasing its ability to perform complex tasks through a multi-agent system [2][9]. Group 1: Framework and Capabilities - The Mobile-Agent-v3 framework can independently execute complex tasks with a single command and seamlessly switch roles within a multi-agent framework [3][9]. - It has achieved SOTA performance across ten major GUI benchmarks, demonstrating both foundational capabilities and reasoning generalization [9][11]. Group 2: Data Production and Model Training - The framework relies on a robust cloud infrastructure built on Alibaba Cloud, enabling large-scale parallel task execution and data collection [11][13]. - A self-evolving data production chain automates data collection and model optimization, creating a feedback loop for continuous improvement [13][15]. - The model is trained using high-quality trajectory data, which is generated through a combination of historical task data and large-scale pre-trained language models [22][23]. Group 3: Task Execution and Understanding - The framework emphasizes precise interface element localization, allowing the AI to understand the graphical interface effectively [18][19]. - It incorporates complex task planning, enabling the AI to strategize before executing tasks, enhancing its ability to handle long-term and cross-application tasks [21][22]. - The model understands the causal relationship between actions and interface changes, which is crucial for effective task execution [24][25]. Group 4: Reinforcement Learning and Performance - The Mobile-Agent team employs reinforcement learning (RL) to enhance the model's decision-making capabilities through real-time interactions [28][29]. - An innovative TRPO algorithm addresses the challenges of sparse and delayed reward signals in GUI tasks, significantly improving learning efficiency [31][36]. - The framework has shown a performance increase of nearly 8 percentage points in dynamic environments, indicating its self-evolution potential [36][40]. Group 5: Multi-Agent Collaboration - The Mobile-Agent-v3 framework supports multi-agent collaboration, allowing different agents to handle various aspects of task execution, planning, reflection, and memory [33][34]. - This collaborative approach creates a closed-loop enhancement pipeline, improving the overall efficiency and effectiveness of task execution [34][35]. - The framework's design enables AI to act with purpose, adjust based on feedback, and retain critical information for future tasks [35][36].