WebShaper

Tongyi Lab's follow-up to its hit WebAgent: a fully open-source model approach surpasses GPT-4.1 and sets the open-source SOTA
机器之心 (Jiqizhixin) · 2025-07-29 10:31
Core Insights
- The article introduces WebShaper, a new paradigm for synthesizing training data for information-seeking (IS) tasks, which achieves a state-of-the-art (SOTA) score of 60.1 on the GAIA benchmark using an open-source model [1][6][30]
- WebShaper addresses the lack of high-quality training data for GAIA and BrowseComp, and reflects a shift in how IS tasks are understood, from heuristic to formalized definitions [2][7]

Group 1: Formalization and Methodology
- WebShaper proposes a formalized model for IS tasks based on set theory, introducing the concept of Knowledge Projection (KP) to control reasoning paths and task complexity [13][14]
- The formalization allows precise control over reasoning complexity and logical structure, aligning information structure with reasoning structure to minimize errors in data synthesis [10][16]
- The process begins with pre-constructed seed tasks, which a dedicated Expander module expands into the final synthesized data, ensuring broad coverage and task correctness [18][25]

Group 2: Data Generation and Training
- The article emphasizes the importance of systematically constructing high-quality training data to enhance the information retrieval capabilities of intelligent agents [9]
- WebShaper's approach moves from an "information-driven" synthesis paradigm to a "formalization-driven" one, enabling broader task coverage and knowledge generation [15][31]
- Agents are trained with supervised fine-tuning (SFT) combined with reinforcement learning strategies; the pipeline yields 5,000 training trajectories and significant performance improvements on the GAIA benchmark [26][31]

Group 3: Performance and Comparisons
- WebShaper surpasses closed-source models, with a top score of 60.1 compared to 40.7 for GPT-4.1 and 58.2 for Claude Sonnet 4 [30]
- Solving WebShaper tasks requires more agent actions than solving tasks from baseline data, indicating that the generated tasks are more complex [32]

Group 4: Implications and Future Directions
- The formalized task synthesis approach of WebShaper can be extended to more complex tasks beyond IS, suggesting broader application in AI research [35]
- The article advocates open-source data and models as a means to achieve high performance in AI tasks, promoting a collaborative ecosystem for advancing AI research [34]
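
The set-theoretic formalization summarized under Group 1 can be illustrated with a small Python sketch. The article does not spell out the exact definition of Knowledge Projection, so everything here is an assumption made for illustration: a KP is read as "the set of entities that stand in a given relation to at least one element of a target set", and a task is read as a combination of such sets. The function names and the toy facts (`played_for`, `born_in`) are hypothetical and are not taken from WebShaper itself.

```python
from typing import Set, Tuple

# A relation is modeled as a set of (subject, object) pairs (assumption).
Relation = Set[Tuple[str, str]]


def knowledge_projection(relation: Relation, targets: Set[str]) -> Set[str]:
    """Return every subject related to at least one element of `targets`.

    This is one plausible reading of a Knowledge Projection, not the
    definition used by the WebShaper authors.
    """
    return {subject for (subject, obj) in relation if obj in targets}


# Hypothetical toy facts, invented purely for this example.
played_for: Relation = {
    ("alice", "club_a"),
    ("bob", "club_a"),
    ("carol", "club_b"),
}
born_in: Relation = {
    ("alice", "1990"),
    ("bob", "1994"),
    ("carol", "1990"),
}


def answer_set() -> Set[str]:
    """Toy task: who played for club_a AND was born in 1990?"""
    by_club = knowledge_projection(played_for, {"club_a"})
    by_year = knowledge_projection(born_in, {"1990"})
    # Intersecting projections adds a constraint; widening a target set
    # (for example {"1990", "1994"}) loosens one.
    return by_club & by_year


if __name__ == "__main__":
    print(answer_set())
```

Under this reading, task complexity is adjusted by changing how many projections are intersected or how large each target set is, which is one way a synthesizer could control the reasoning structure explicitly rather than relying on whatever information happened to be retrieved.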