Agentic Deep Research
A new paradigm for Agentic Deep Research: another breakthrough in reasoning ability and greater trustworthiness, from the Ant security team
机器之心· 2025-08-27 08:36
Core Viewpoint
- The article discusses the limitations of current LLMs on complex tasks and introduces the Agentic Deep Research system, which aims to enhance AI capabilities through autonomous reasoning and information integration [2][4].
Summary by Sections
Introduction to Agentic Deep Research
- The Agentic Deep Research system uses LLMs to reason autonomously, call search engines, and integrate information iteratively in order to provide comprehensive and accurate solutions [2].
Limitations of Current Systems
- Two main limitations are identified: Gradient Conflicts, where an incorrect final answer penalizes the entire reasoning process, and Reward Sparsity, where feedback is limited to sparse signals based solely on the final answer [4].
Atom-Searcher Framework
- The Atom-Searcher framework combines supervised fine-tuning (SFT) with reinforcement learning driven by fine-grained rewards to enhance the Agentic Deep Research system [8].
- It introduces the Atomic Thought reasoning paradigm, which breaks reasoning down into finer functional units, improving the clarity and depth of the reasoning process [12].
Atomic Thought Reward Construction
- The Atomic Thought framework reduces redundancy in reasoning outputs and provides clear supervision anchors for the Reasoning Reward Model (RRM), which yields fine-grained Atomic Thought Rewards (ATR) [13].
Reward Aggregation Strategy
- A curriculum-learning-inspired reward aggregation strategy is proposed to alleviate gradient conflicts: ATR is combined with the outcome-based reward, and the combination is adjusted dynamically as training progresses [14].
Reinforcement Learning Training
- Training uses the mixed reward with the GRPO algorithm, together with a Loss Masking strategy that maintains stability by excluding non-trainable tokens from the loss calculation [15].
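The reward aggregation and training steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not give the aggregation schedule, so the linear decay of the ATR weight (and its starting value of 0.5) is an assumption, and the names `atr_weight`, `aggregate_reward`, `group_relative_advantages`, and `masked_mean_loss` are hypothetical. The advantage computation follows the usual GRPO idea of normalizing rewards within a group of rollouts for the same question.

```python
from statistics import mean, pstdev


def atr_weight(step, total_steps, start=0.5):
    """ATR weight at a given training step.

    ASSUMPTION: a linear decay from `start` to 0; the summary only says the
    combination tracks training progress, not how.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start * (1.0 - progress)


def aggregate_reward(atr, outcome, step, total_steps):
    """Mix the process-level reward (ATR) with the outcome-based reward."""
    w = atr_weight(step, total_steps)
    return w * atr + (1.0 - w) * outcome


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_mean_loss(token_losses, loss_mask):
    """Average per-token losses, skipping tokens whose mask entry is 0.

    Masked positions stand for the non-trainable tokens mentioned above.
    """
    kept = [loss for loss, keep in zip(token_losses, loss_mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Four hypothetical rollouts for one question: (ATR, outcome reward).
    rollouts = [(0.8, 1.0), (0.6, 0.0), (0.9, 1.0), (0.2, 0.0)]
    rewards = [aggregate_reward(a, o, step=100, total_steps=1000) for a, o in rollouts]
    print(group_relative_advantages(rewards))
    print(masked_mean_loss([1.0, 9.0, 3.0], [1, 0, 1]))
```

Early in training the ATR term carries more weight, so a rollout with sound intermediate reasoning but a wrong final answer is not penalized as heavily; as the weight decays, the outcome reward dominates.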
Experimental Results
- Atom-Searcher shows significant performance improvements over the baseline DeepResearcher, achieving an 8.5% increase on In-Domain benchmarks and a 2.5% increase on Out-of-Domain benchmarks [17][18].
Ablation Studies
- The contribution of the Atomic Thought paradigm and ATR is validated, demonstrating their effectiveness in providing supervision and enhancing performance compared to traditional reasoning methods [19].
Case Analysis
- A comparative analysis illustrates Atom-Searcher’s advantages, such as generating Atomic Thoughts that reflect human-like cognitive behavior and triggering more search calls for richer external information [20].