Long-horizon planning

312 trajectories unlock a 241% performance gain! Shanghai Jiao Tong University and SII open-source a computer agent that surpasses Claude 3.7
机器之心· 2025-05-25 03:51
Core Insights
- The article covers advances in computer agents, highlighting that PC Agent-E, trained on only 312 human-annotated operation trajectories, surpassed previous models [1][3][10].

Group 1: Model Development
- The research indicates that current large models already have the foundational capabilities needed to complete tasks on a computer. The main performance bottleneck is long-horizon planning, which a small number of high-quality trajectories can significantly improve [3][13].
- The team used a tool called PC Tracker to collect the 312 human operation trajectories, each comprising a task description, screenshots, and keyboard/mouse operations, ensuring data accuracy [4][10].
- PC Agent-E was trained from the open-source model Qwen2.5-VL-72B and achieved a 241% performance improvement over that base model, demonstrating high sample efficiency [10][11].

Group 2: Methodology Innovations
- A key innovation is the "Thought Completion" process, which adds the reasoning behind each action the human took, improving the quality of the training data [7][8].
- The "Trajectory Boost" method synthesizes additional action decisions for each step in a trajectory, capturing the inherent diversity of possible actions in computer tasks and significantly enriching the training data [8][11].
- Model performance improved significantly as the number of synthesized actions increased, validating the trajectory enhancement method [11][12].

Group 3: Performance Evaluation
- PC Agent-E was evaluated on WindowsAgentArena-V2, where it outperformed Claude 3.7 Sonnet in extended thinking mode, making it the new state-of-the-art (SOTA) among open-source computer agents on Windows [10][11].
- The research concluded that a small number of high-quality trajectories can elicit strong long-horizon planning capability in agents, reducing the need for large amounts of human-annotated data [13].
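
To make the data pipeline described above more concrete, the following is a minimal Python sketch of how a recorded trajectory might be expanded into training samples in the spirit of "Trajectory Boost". It is not the authors' implementation. The class names (`Step`, `Trajectory`, `Sample`), the field layout, the `synthesize` callback signature, and the choice to keep the history on the human path are all assumptions made for illustration; in practice the callback would presumably be backed by a model, and here it is replaced with a dummy function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One recorded human step (hypothetical layout)."""
    screenshot: str  # path or ID of the screenshot seen before acting
    thought: str     # reasoning attached to the action ("Thought Completion")
    action: str      # keyboard/mouse operation, e.g. "click(120, 48)"


@dataclass(frozen=True)
class Trajectory:
    task: str          # natural-language task description
    steps: List[Step]  # ordered human steps


@dataclass(frozen=True)
class Sample:
    """One training example: context in, a single action decision out."""
    task: str
    history: Tuple[str, ...]  # human actions taken before this step
    screenshot: str
    thought: str
    action: str
    synthetic: bool           # False for the human decision, True for synthesized ones


# Hypothetical callback: (task, history, step, k) -> k alternative (thought, action) pairs.
Synthesizer = Callable[[str, Tuple[str, ...], Step, int], List[Tuple[str, str]]]


def boost(traj: Trajectory, synthesize: Synthesizer, k: int) -> List[Sample]:
    """Emit the human decision plus k synthesized decisions for every step."""
    samples: List[Sample] = []
    history: Tuple[str, ...] = ()
    for step in traj.steps:
        # The recorded human decision is always kept.
        samples.append(Sample(traj.task, history, step.screenshot,
                              step.thought, step.action, False))
        # Alternative decisions for the same screen and history.
        for thought, action in synthesize(traj.task, history, step, k):
            samples.append(Sample(traj.task, history, step.screenshot,
                                  thought, action, True))
        # Assumption of this sketch: the history follows the human path only.
        history = history + (step.action,)
    return samples


def dummy_synthesizer(task, history, step, k):
    """Stand-in for a model call; returns k placeholder alternatives."""
    return [(f"alternative thought {i}", f"alt_{i}:{step.action}") for i in range(k)]
```

Under these assumptions, a trajectory with n steps and k synthesized decisions per step yields n × (1 + k) samples, which is one way to read the reported link between the number of synthesized actions and the amount of training signal extracted from each human trajectory.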