Core Insights
- Tongyi DeepResearch has achieved state-of-the-art (SOTA) performance on multiple authoritative benchmarks, surpassing overseas flagship models, while releasing its models, framework, and solutions as fully open source [1][3]
- The team has developed a complete methodology for building DeepResearch agents, covering the entire pipeline from data synthesis to reinforcement learning [3][4]

Data Strategy
- The gains in model capability are attributed to a multi-stage data strategy designed to generate high-quality training data without relying on expensive manual annotation [4]
- The team introduced Agentic CPT (agentic continual pre-training) to give the model a solid foundation, backed by a systematic and scalable data synthesis scheme [5]
- A new process for generating complex question-answer data was developed; it grounds the structure of the data in content extracted from real websites [9]

Reasoning Modes
- Tongyi DeepResearch offers both a native ReAct Mode and a Heavy Mode for managing complex multi-step research tasks [10]
- ReAct Mode handles standard tasks well, and its 128K context length allows a large number of interaction rounds [11]
- Heavy Mode uses the IterResearch paradigm to break a task into a series of research rounds, which helps the model maintain cognitive focus and high-quality reasoning [12][13]

Training Innovations
- The team has reworked the training process for agent models, integrating Agentic CPT, rejection-sampling fine-tuning (RFT), and agentic reinforcement learning (RL) [15][18]
- The RL algorithm is customized from GRPO so that learning signals stay closely matched to the model's current capabilities [19]
- Training dynamics show clear learning progress: rewards rise consistently while policy entropy remains high [21]

Application Deployment
- Tongyi DeepResearch already powers several internal applications within Alibaba, including the Gaode (Amap) travel agent, which integrates complex reasoning capabilities into its services [24]
- A simulated training environment was created to address the high cost and inconsistency of developing against live web APIs [26]
- The team has developed a legal AI agent, Tongyi Farui, which provides professional legal services and is built on the same agentic architecture [27]
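
The Training Innovations points mention an RL algorithm customized from GRPO. The summary does not describe the team's customizations, so the following is only a minimal sketch of the standard group-relative advantage that GRPO is generally built on: several rollouts are sampled for the same question, and each rollout's reward is normalized against the mean and standard deviation of its group. The function name, the `eps` constant, the choice of population standard deviation, and the example rewards are illustrative assumptions, not details from the source.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (vanilla GRPO-style).

    `rewards` holds one scalar reward per rollout sampled for the same question.
    `eps` is an assumed small constant that avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one question (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

In this example the two correct rollouts receive an advantage close to +1 and the two incorrect ones close to -1. If every rollout in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason the difficulty of training questions relative to the model's current ability matters in this family of methods.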
Tongyi DeepResearch Makes a Stunning Debut! Performance on Par with OpenAI, with Models, Framework, and Solutions Fully Open-Sourced