Core Insights
- The article discusses the shift in mobile agents from automating atomic tasks to managing complex long-horizon tasks, and highlights how current systems struggle with composite tasks that require multi-application interaction and information synthesis [4][6][10].

Group 1: Current State of Mobile Agents
- Multimodal large language models (MLLMs) have shown promising results on single-screen actions and short-chain tasks, indicating initial maturity in edge task automation [4].
- Existing mobile GUI agents show significant capability gaps on complex long-horizon tasks and struggle to generalize from atomic to composite tasks [6][10].

Group 2: Proposed Solutions
- Researchers introduced UI-Nexus, a dynamic evaluation benchmark for complex long-horizon tasks that spans 50 applications and comprises 100 task templates averaging 14.05 optimal steps [7][21].
- They also proposed AGENT-NEXUS, a multi-agent task scheduling system that handles instruction distribution, information transfer, and process management without modifying the underlying agent models [7][19].

Group 3: Task Complexity and Types
- The article categorizes composite tasks into three types based on subtask dependencies: Independent Combination, Context Transition, and Deep Dive, each presenting distinct challenges for mobile agents [11][13][21].
- An analysis of error cases found that mobile agents often fail because of poor progress management and information handling, which leads to issues such as context overflow and information transfer failures [16][32].

Group 4: Experimental Findings
- Across the mobile agents tested, task completion rates were below 50%; AGENT-NEXUS improved completion rates by 24% to 40% while increasing inference costs by only about 8% [27][30].
- Agents performed significantly better when given manually split atomic instructions; UI-TARS in particular raised its completion rate from 11% to 60% [29].

Group 5: Future Outlook
- The article envisions a new generation of AI operating systems that can efficiently coordinate and manage complex task demands, turning mobile devices into intelligent personal assistants [34][36].
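
The scheduling idea summarized under Group 2 (distributing instructions, transferring information between subtasks, and tracking progress without touching the underlying agent) can be illustrated with a small sketch. This is not the AGENT-NEXUS implementation or its API: `Subtask`, `run_composite`, and `fake_executor` are hypothetical names, and the executor is a stand-in for a real GUI agent. The sketch only assumes that a composite task has already been split into atomic instructions, some of which need a result produced by an earlier one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical executor interface: receives one atomic instruction and
# returns whatever text the agent extracted or reported for it.
Executor = Callable[[str], str]


@dataclass
class Subtask:
    name: str
    template: str  # may contain {placeholders} named after earlier subtasks
    needs: List[str] = field(default_factory=list)


def run_composite(subtasks: List[Subtask], executor: Executor) -> Dict[str, str]:
    """Dispatch subtasks in order, passing earlier results into later instructions."""
    memory: Dict[str, str] = {}  # process state: results of finished subtasks
    for task in subtasks:
        missing = [n for n in task.needs if n not in memory]
        if missing:
            raise ValueError(f"{task.name} depends on unfinished subtasks: {missing}")
        # Information transfer: fill the template with the results it needs.
        instruction = task.template.format(**{n: memory[n] for n in task.needs})
        # Instruction distribution: the agent itself is used unchanged.
        memory[task.name] = executor(instruction)
    return memory


def fake_executor(instruction: str) -> str:
    # Stand-in for a real mobile agent (assumption, for illustration only).
    if "look up" in instruction.lower():
        return "18:30"
    return f"done: {instruction}"


plan = [
    Subtask("find_time", "Look up the movie start time in the ticket app"),
    Subtask("set_alarm", "Set a reminder in the clock app for {find_time}",
            needs=["find_time"]),
]
results = run_composite(plan, fake_executor)
```

Here the second instruction is only built after the first has produced its result, so the executor never has to carry the full history of the composite task in its own context. Progress lives in the scheduler's `memory` dictionary instead.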
How Far Away Is a Mobile AGI Assistant? Benchmark and Scheduling System for Composite Long-Horizon Mobile Agent Tasks Released
机器之心 (Jiqizhixin) · 2025-07-26 09:32