Stop search agents from "waiting idly": a Renmin University team uses diffusion models to achieve "dual-tasking", thinking while search results are pending, for a 15% speedup with no loss in performance
量子位 · 2026-03-02 09:09
Core Viewpoint
- The article discusses the limitations of traditional search agents and introduces diffusion large language models (dLLMs) as a way to improve search efficiency by allowing reasoning and action to proceed in parallel during the search process [1][8][28].

Group 1: Limitations of Traditional Search Agents
- Traditional search agents operate strictly serially: the model must sit idle until search results return before it can continue reasoning [8].
- Under current frameworks such as ReAct, this serial waiting accounts for a large share of end-to-end latency [9].
- Autoregressive models cannot reason in parallel with a pending search call, which caps their efficiency on search tasks [10][16].

Group 2: Introduction of Diffusion Large Language Models (dLLM)
- A dLLM can "dual-task", thinking about next steps while waiting for search results [5][11].
- Unlike autoregressive models, dLLMs generate tokens non-sequentially, so they can produce the most important parts of the output first [12][13].
- Initial tests of off-the-shelf dLLMs as search agents performed poorly, indicating that the architecture has potential but requires targeted training to be effective [14][16].

Group 3: Training Methodology for dLLM
- Training consists of two phases: Agentic SFT (Supervised Fine-Tuning) and Agentic VRPO (Variance-Reduced Preference Optimization) [18][20].
- The first phase generates high-quality search trajectories and trains the model to produce thoughts and tool calls without conditioning on the search results [19].
- The second phase refines the model's reasoning paths through preference learning, improving accuracy across various datasets [20].

Group 4: P-ReAct for Enhanced Efficiency
- P-ReAct is introduced as a training-free method to accelerate reasoning and tool calling [21][22].
- The method pre-fills boundary markers and raises the decoding confidence of the tool-call region, so the model prioritizes emitting tool calls [23][24].
- P-ReAct yields significant improvements in response time and accuracy, demonstrating the effectiveness of dLLMs on search tasks [25][26].

Group 5: Performance and Implications
- dLLM-Searcher achieved an average accuracy of 57.0% across multiple benchmark datasets, surpassing traditional methods and showing strong generalization [25][27].
- The results indicate that dLLMs can match or exceed the reasoning ability of autoregressive models while exploiting their distinct structural advantages [28].
- This advance opens new avenues for optimizing search-agent efficiency and suggests a shift in how search tasks may be approached in the future [29].
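The serial-waiting argument can be made concrete with a toy latency model. Everything below is illustrative: the per-round durations and function names are invented, not from the paper, and the real system overlaps token generation with retrieval rather than evaluating a formula. The sketch only shows why overlapping thought with a pending search call shrinks wall-clock time per round.

```python
def serial_round(think_ms: int, search_ms: int) -> int:
    # Traditional ReAct loop: generate a thought, issue the search,
    # then idle until the result arrives.
    return think_ms + search_ms

def overlapped_round(think_ms: int, search_ms: int) -> int:
    # dLLM-style "dual-tasking": the next thought is drafted while the
    # search request is in flight, so only the longer of the two
    # activities contributes to wall-clock time.
    return max(think_ms, search_ms)

def end_to_end(rounds, overlap: bool) -> int:
    step = overlapped_round if overlap else serial_round
    return sum(step(t, s) for t, s in rounds)

# Hypothetical per-round costs in milliseconds; not numbers from the paper.
rounds = [(800, 1200), (600, 1500), (900, 1000)]
serial = end_to_end(rounds, overlap=False)    # 6000 ms
parallel = end_to_end(rounds, overlap=True)   # 3700 ms
```

In this toy setting the overlapped schedule saves far more than the paper's reported 15%; in practice the savings are bounded by how much useful thinking can actually be done before the results arrive.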
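The non-sequential decoding of Group 2 and the confidence adjustment of Group 4 can be sketched together as a toy masked-denoising loop. This is a minimal sketch under assumptions: the token strings, confidence values, step size `k`, and the `parallel_decode` helper are all invented for illustration, and a real dLLM scores positions with model logits rather than a fixed list. It only demonstrates the idea that boosting confidence inside a pre-filled tool-call span makes that span decode first.

```python
MASK = "[MASK]"

def parallel_decode(confidence, tokens, k=2, tool_span=None, boost=0.0):
    """Toy masked-denoising loop: each step reveals the k masked
    positions with the highest confidence. `tool_span` marks a
    pre-filled tool-call region whose scores receive a P-ReAct-style
    boost so that region resolves early."""
    out = [MASK] * len(tokens)
    order = []                        # positions in decode order
    masked = set(range(len(tokens)))
    while masked:
        def score(i):
            s = confidence[i]
            if tool_span and tool_span[0] <= i < tool_span[1]:
                s += boost            # prioritize the tool-call region
            return s
        # Sort by index first so ties break deterministically.
        picks = sorted(sorted(masked), key=score, reverse=True)[:k]
        for i in picks:
            out[i] = tokens[i]
            order.append(i)
            masked.discard(i)
    return out, order

tokens = ["Think", ":", "<tool>", "search(q)", "</tool>", "then", "answer"]
confidence = [0.9, 0.8, 0.3, 0.2, 0.3, 0.7, 0.6]

# Without the boost, the high-confidence prose tokens decode first.
out_p, plain = parallel_decode(confidence, tokens)
# With boundary markers pre-filled and the span boosted, the tool call
# resolves in the first steps, so it can be fired while the rest of the
# text is still being denoised.
out_b, boosted = parallel_decode(confidence, tokens, tool_span=(2, 5), boost=1.0)
```

The design point mirrored here is that the tool call becomes available early in decoding, which is what lets the search request be dispatched while the remaining thought tokens are still being filled in.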