SEAgent: Ushering in a New Era of GUI Agents That Self-Evolve from Hands-On Experience
机器之心· 2025-08-17 04:28
Core Viewpoint
- The development of current Computer-Using Agents (CUAs) relies heavily on expensive human-annotated data, which limits their application in novel or specialized software environments. To overcome this limitation, researchers from Shanghai Jiao Tong University and The Chinese University of Hong Kong proposed SEAgent, a new framework that lets agents learn and evolve autonomously through interaction with their environment, without human intervention [2][4].

Group 1: SEAgent Framework
- SEAgent's core innovations are a closed-loop autonomous-evolution framework, a deeply optimized evaluation model, and an efficient "specialist-to-generalist" integration strategy [2][5].
- SEAgent's capacity for autonomous evolution comes from three core components working in concert, forming a sustainable, self-driven learning loop [5].

Group 2: Core Components
- The Curriculum Generator acts as a "mentor," automatically generating progressively more challenging exploration tasks based on the agent's current capabilities and maintaining a "software guide" that documents new functionality discovered during exploration [9].
- The Actor-CUA, the agent itself, executes the tasks generated by the Curriculum Generator in the software environment [9].
- The World State Model serves as the "judge," evaluating the agent's performance at each step and providing the critical feedback signals for learning, thus closing the evolution loop [9][10].

Group 3: Evaluation Model
- A precise "judge" is fundamental to autonomous evolution. Existing open-source large vision-language models struggle to evaluate long sequences of agent operations, and their accuracy degrades as the operation history grows. To address this, the authors developed a more robust evaluation model, the World State Model [10].
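The mentor–actor–judge loop described above can be sketched in a few lines. This is a minimal illustration, not SEAgent's implementation: in the real system all three components are large vision-language models, whereas the function bodies here are hypothetical stand-ins that only show how the pieces feed one another.

```python
# Minimal sketch of a closed self-evolution loop in the style of SEAgent.
# All three component implementations below are illustrative placeholders.

def curriculum_generator(guide):
    """'Mentor': propose the next exploration task, growing harder as the
    software guide accumulates discovered functionality."""
    return f"task-{len(guide) + 1}"

def actor_cua(task):
    """'Actor': the agent executes the task and returns its trajectory
    of GUI actions (here, just placeholder step labels)."""
    return [f"{task}-step-{i}" for i in range(3)]

def world_state_model(trajectory):
    """'Judge': score each step of the trajectory; +1 for a step deemed
    correct, -1 otherwise (toy rule here in place of a learned model)."""
    return [1 if step.endswith(("0", "2")) else -1 for step in trajectory]

guide, experience = [], []
for _ in range(3):                      # three rounds of self-evolution
    task = curriculum_generator(guide)
    traj = actor_cua(task)
    rewards = world_state_model(traj)
    experience.append((traj, rewards))  # training signal for the actor
    guide.append(task)                  # record what was explored

print(len(experience), guide)
```

The point of the sketch is the data flow: tasks come from the curriculum, trajectories from the actor, and step-level rewards from the judge, with the collected experience then used to update the actor (e.g. via reinforcement learning).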
- The optimized World State Model significantly narrows the performance gap with commercial models such as GPT-4o, providing reliable and stable evaluation for the SEAgent framework [10].

Group 4: Specialist-to-Generalist Strategy
- In exploring how to build a "generalist" model that operates across multiple software environments, the researchers found that training a generalist directly in a multi-software setting is less effective than training specialist models in single software environments [13].
- They propose an efficient three-step "specialist-to-generalist" integration strategy: innovating the evaluation paradigm, distilling high-quality data, and cultivating specialists before transitioning to a generalist model [14][15].

Group 5: Experimental Results
- The final "generalist" agent achieved an overall success rate of 34.5%, surpassing directly trained generalist models (30.6%) and even the combined performance of all specialist models (32.2%), demonstrating the potential of the "specialist first, then generalist" approach [18].
- Rigorous ablation experiments confirm the necessity of the algorithm design: a high-quality World State Model is essential for effective learning, and exploration-based reinforcement learning (GRPO) significantly outperforms pure imitation [20].

Group 6: Author and Research Interests
- The first author, Sun Zeyi, is a joint doctoral student at Shanghai Jiao Tong University and the Shanghai Artificial Intelligence Laboratory, with multiple publications at CVPR, ICCV, and NeurIPS; his research interests include GUI agents, multimodal learning, and reinforcement learning [20].
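The GRPO algorithm mentioned in the ablations replaces a learned value critic with group-relative reward normalization: several rollouts are sampled for the same task, and each trajectory's advantage is its reward standardized against its own group. A minimal sketch of that advantage computation (the reward values are illustrative, e.g. 0/1 task success as scored by an evaluation model):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages in the GRPO style: normalize each
    trajectory's reward by the mean and std of its own rollout group,
    so no separate value network is needed. `eps` guards against a
    zero std when every rollout in the group got the same reward."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Four rollouts of the same task: two succeeded, two failed.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # successes get positive advantage, failures negative
```

These advantages then weight the policy-gradient update, so the agent is pushed toward the behaviors that outperformed its own other attempts rather than imitating a fixed demonstration.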