New Research from Tongyi Lab: LLMs "Play" the Search Engine Themselves, Improving Reasoning Without a Search API

Core Insights
- The article introduces ZeroSearch, an open-source reinforcement learning framework from Alibaba's Tongyi Laboratory that improves the search capabilities of large language models (LLMs) without relying on real search engines [4][19].

Group 1: Challenges in Current Approaches
- Real search engines return documents of unpredictable quality, which introduces noise and instability into the training process [2].
- Reinforcement learning (RL) training requires frequent rollouts, and the search requests they generate lead to significant API costs that limit scalability [3].

Group 2: ZeroSearch Solution
- ZeroSearch removes all interaction with real search engines during training, which avoids API costs and makes large-scale RL training more economically feasible [19][36].
- The framework lets an LLM improve its search ability on its own, using a simulated search environment and progressive noise-resistant training [6][19].

Group 3: Training Methodology
- ZeroSearch uses lightweight fine-tuning to turn an LLM into a "search engine simulator" that can generate both useful results and noisy, distracting results from only a small amount of labeled data [7][10].
- A curriculum-based noise training approach is introduced: the simulator initially returns high-quality documents and gradually mixes in more noise, which improves training stability and effectiveness [12][14].

Group 4: Performance Metrics
- Experimental results indicate that ZeroSearch needs only a 3-billion-parameter LLM to significantly improve search capabilities while saving on API costs [5].
- ZeroSearch outperforms existing methods on both single-hop and multi-hop question-answering tasks, demonstrating stronger retrieval capabilities [25][26].

Group 5: Compatibility with RL Algorithms
- ZeroSearch is compatible with various RL algorithms, including Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), which gives flexibility in the choice of training strategy [19][20].
- GRPO shows better training stability, while PPO offers more flexibility in certain tasks, indicating that ZeroSearch can adapt to different algorithmic needs [21][34].

Group 6: Future Implications
- By addressing the cost and stability problems of current methods, ZeroSearch paves the way for future advances in intelligent retrieval systems [37].
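
The curriculum-based noise training described under Group 3 can be illustrated with a short Python sketch. The article only says that the simulator starts with high-quality documents and gradually adds noise; the exponential ramp, the parameter values (`p_start`, `p_end`, `base`), and the function names below are all assumptions made for illustration and are not taken from the article. The `simulated_search` function is a hypothetical stand-in that only decides which kind of document the simulator would be asked to produce; it does not call a model.

```python
import random


def noise_probability(step, total_steps, p_start=0.0, p_end=0.75, base=4.0):
    """Chance of returning a noisy document at a given training step.

    Assumed schedule: an exponential ramp from p_start to p_end.
    base must be greater than 1 for the ramp to be well defined.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if base <= 1.0:
        raise ValueError("base must be greater than 1")
    # Clamp training progress to the range [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # ramp is 0 at the start of training and 1 at the end.
    ramp = (base ** frac - 1.0) / (base - 1.0)
    return p_start + ramp * (p_end - p_start)


def simulated_search(query, step, total_steps, rng):
    """Hypothetical stand-in for the simulator LLM.

    It only picks the document mode; a real system would prompt the
    fine-tuned simulator to write a "useful" or "noisy" document.
    """
    noisy = rng.random() < noise_probability(step, total_steps)
    return {"query": query, "mode": "noisy" if noisy else "useful"}


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 100):
        print(step, round(noise_probability(step, 100), 3))
    print(simulated_search("who wrote Hamlet", 0, 100, rng))
```

Under this assumed schedule the policy model sees almost only useful documents early in training and an increasing share of noisy ones later, which is one way to realize the "easy first, harder later" idea the article describes.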