Tongyi DeepResearch Makes a Stunning Debut! Performance Rivals OpenAI; Model, Framework, and Solutions Fully Open Source

Core Insights
- The article covers the progress of Tongyi DeepResearch, which moves from basic conversational ability to full research functionality, reaches state-of-the-art (SOTA) results on multiple benchmarks, and is fully open-source [1][3].

Data Strategy
- The gains in model capability are attributed to a multi-stage data strategy designed to generate high-quality training data without relying on expensive manual annotation [5].
- The team introduced Agentic Continual Pre-training (CPT) to give the model a solid foundation, using a systematic and scalable data synthesis approach [6].
- Data generation restructures source material and constructs questions from a wide range of knowledge documents, web crawler data, and knowledge graphs, building an open-world knowledge memory anchored by entities [6].

Reasoning Modes
- Tongyi DeepResearch offers both a native ReAct Mode and a Heavy Mode for handling complex multi-step research tasks [11].
- In ReAct Mode, the model follows the standard thinking-action-observation cycle, and its 128K context length supports a large number of interaction rounds [12].
- Heavy Mode uses a new IterResearch paradigm that breaks a task into research rounds, allowing the agent to maintain cognitive focus and high-quality reasoning [13][14].

Training Methodology
- The training process integrates Agentic CPT, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL), establishing a new paradigm for agent model training [17][20].
- The team customized its RL algorithm on top of GRPO so that learning signals stay aligned with the model's current capabilities, and added strategies to improve training stability [21].
- Training dynamics show a clear learning effect: reward rises consistently, indicating effective exploration and adaptation [23].
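
To make the GRPO reference under Training Methodology more concrete, the sketch below shows the group-relative advantage that standard GRPO computes: several rollouts are sampled for the same question, and each rollout's reward is normalized against the mean and standard deviation of its own group. This is only the textbook baseline, not the team's customized algorithm, which the article does not detail. The function name `group_relative_advantages` and the `eps` parameter are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts (standard GRPO baseline).

    Assumption: `rewards` holds one scalar reward per rollout, and all
    rollouts answer the same prompt. This is not the customized variant
    described in the article.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one question: two succeed (reward 1), two fail (reward 0)
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With two successes and two failures, the successful rollouts receive an advantage of roughly +1 and the failed ones roughly -1. When every rollout in a group gets the same reward, all advantages are zero and the group contributes no learning signal, which is one reason keeping task difficulty matched to the model's current ability matters.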

Application Deployment
- Tongyi DeepResearch already powers several internal applications at Alibaba. The work includes a simulated training environment that reduces development cost and speeds up iteration [27].
- The team developed a stable and efficient tool sandbox to ensure reliable tool calls during agent training and evaluation [27].
- A collaboration with the Gaode app focuses on improving complex queries in navigation and local services, showing a practical application of agent capabilities [28].

Legal Intelligence
- Tongyi FaRui is a legal intelligence agent that provides professional legal services such as legal Q&A, case law retrieval, and document drafting, built on an innovative agent architecture [30].
- Reported metrics show Tongyi FaRui outperforming other models on the quality of key answer points, case citations, and legal references [31].

Research Contributions
- The Tongyi DeepResearch team has published technical reports on a continuing basis, contributing to the open-source community and advancing the field of deep research agents [33].