Core Insights
- Tongyi's first deep research agent model, Tongyi DeepResearch, has been officially open-sourced. With only 30 billion total parameters (3 billion activated), it achieves state-of-the-art (SOTA) results on multiple authoritative benchmarks, surpassing many top agent models [1][5]

Model Training
- The Tongyi team has built a complete training pipeline driven by synthetic data, spanning both pre-training and post-training. The model's capabilities rest on a multi-stage data strategy designed to generate large volumes of high-quality training data without relying on costly manual annotation [3]
- The pipeline builds on the Qwen3-30B-A3B model and incorporates new reinforcement learning (RL) algorithms, tested in both validation and real training, to improve the model's efficiency and robustness. Asynchronous RL algorithms and automated data curation substantially speed up model iteration and improve generalization [3]

Model Performance
- Despite activating only 3 billion parameters, DeepResearch performs comparably to flagship models such as OpenAI's o3, DeepSeek V3.1, and Claude-4-Sonnet on authoritative agent benchmarks, including Humanity's Last Exam (HLE), BrowseComp, and GAIA [5]

Model Applications
- The model has been deployed in real-world scenarios such as "Xiao Gao Teacher", developed with Amap, which serves as an AI co-pilot for complex travel planning. In addition, Tongyi's legal research agent, built on the DeepResearch architecture, can autonomously carry out complex multi-step research tasks, mimicking the workflow of a junior lawyer [7]

DeepResearch Agent Series
- Tongyi DeepResearch belongs to a broader family of DeepResearch agent models. Since earlier this year, the team has steadily expanded the series, and previously open-sourced models such as WebWalker, WebDancer, and WebSailor have achieved industry-leading results in agent synthetic data and reinforcement learning [9]
Tongyi DeepResearch Goes Open Source
Shanghai Securities News (Shang Hai Zheng Quan Bao) · 2025-09-18 05:10