ZeroSearch
Self-Search Reinforcement Learning (SSRL): The Sim2Real Moment for Agentic RL
机器之心· 2025-09-02 01:27
Core Insights
- The article discusses the development and effectiveness of SSRL (Self-Search Reinforcement Learning) in improving the training efficiency and stability of search agents built on large language models (LLMs) [6][28]
- SSRL outperforms traditional methods that rely on external search engines, achieving effective transfer from simulation to real-world applications (Sim2Real) [6][28]

Group 1
- SSRL uses structured prompts and format rewards to extract world knowledge from models, improving performance across benchmarks and reducing hallucination [2][6]
- The research highlights the high cost and inefficiency of current RL training for search agents, covering both full-real and semi-real search setups [7][13]
- SSRL raises training efficiency by an estimated 5.6x while training rewards continue to rise without collapse [31][32]

Group 2
- Models trained with SSRL outperform those relying on external engines, particularly in real-world search scenarios, underscoring the importance of integrating real-world knowledge [28][31]
- Combining self-generated knowledge with real-world knowledge can further improve performance, particularly via entropy-guided search strategies [34]
- Integrating SSRL with TTRL (Test-Time Reinforcement Learning) improves generalization, with up to a 67% performance gain on certain tasks [38][39]
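The structured prompts and format rewards mentioned in Group 1 can be sketched with a toy reward function that checks whether a rollout follows a tagged think/search/answer layout. The tag names and the binary 0/1 reward below are illustrative assumptions, not the paper's exact specification:

```python
import re

def format_reward(trajectory: str) -> float:
    """Toy format reward: 1.0 if the rollout follows a
    <think> ... (<search> ... <information> ...)* <answer> ... structure,
    else 0.0. Tag names and reward values are assumptions for illustration."""
    pattern = (
        r"<think>.*?</think>\s*"
        r"(?:<search>.*?</search>\s*<information>.*?</information>\s*)*"
        r"<answer>.*?</answer>"
    )
    return 1.0 if re.fullmatch(pattern, trajectory.strip(), re.DOTALL) else 0.0

good = "<think>recall facts</think><answer>Paris</answer>"
bad = "<answer>Paris</answer><think>oops</think>"
```

A format reward like this only shapes the output structure; the task reward (answer correctness) would be added on top during RL training.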
Costs Plunge 88%! Tongyi Lab and Peking University Release ZeroSearch, Activating LLM Retrieval Without a Search Engine
机器之心· 2025-05-29 04:53
Core Insights
- The article introduces the ZeroSearch framework, which enables large language models (LLMs) to activate their search capabilities without relying on real search engines, cutting training costs by 88% while outperforming methods that depend on actual search engines [1][21]

Methodology
- ZeroSearch employs a reinforcement learning (RL) framework that uses a simulation LLM as the search engine, eliminating real-time API interactions and lowering training costs [4][6]
- A structured training template guides the model through each interaction, improving the clarity and interpretability of the reasoning process [8]
- A loss-masking technique prevents the policy model from memorizing documents generated by the simulation LLM: only tokens generated by the policy model enter the loss calculation [4][8]

Training Strategy
- Training begins with a gradual increase in difficulty, letting the model learn basic output formats and task logic before the challenge ramps up to strengthen reasoning [22][36]
- A curriculum learning strategy progressively lowers the quality of generated documents to stimulate the model's reasoning ability [13][36]

Experimental Results
- ZeroSearch delivers superior performance across datasets, averaging 40.93 on multi-hop question answering and surpassing all baseline methods [20][21]
- Generalization is robust, with performance improving as model parameters increase, indicating strong scalability [23][27]
- Compared with real search engines, ZeroSearch shows significant potential to replace them in large-scale RL applications [21][24]

Conclusion
- The ZeroSearch framework activates the search capabilities of LLMs without the need for real search engines, demonstrating strong adaptability and scalability across different RL algorithms [36]
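The loss-masking idea above, i.e., excluding tokens produced by the simulation LLM from the policy loss so the policy model does not simply memorize the generated documents, can be sketched in a few lines. This is a simplified REINFORCE-style surrogate, not the framework's exact objective; the function and argument names are assumptions:

```python
def masked_policy_loss(logprobs, advantages, policy_mask):
    """Per-token policy-gradient loss where only tokens emitted by the
    policy model (policy_mask == 1) contribute; tokens copied from the
    simulated search documents (policy_mask == 0) are zeroed out.
    A simplified REINFORCE-style surrogate for illustration."""
    per_token = [-lp * a for lp, a in zip(logprobs, advantages)]
    masked_sum = sum(t * m for t, m in zip(per_token, policy_mask))
    n = max(sum(policy_mask), 1)  # average over policy tokens only
    return masked_sum / n

# Tiny example: 5 tokens, the middle two come from the simulated engine.
logprobs = [-0.1, -0.2, -0.3, -0.4, -0.5]
adv = [1.0] * 5
mask = [1, 1, 0, 0, 1]
loss = masked_policy_loss(logprobs, adv, mask)  # ≈ 0.267
```

Without the mask, gradients would also flow through document tokens the policy never generated, encouraging the model to memorize the simulator's outputs.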
New Research from Tongyi Lab: Large Models "Play" the Search Engine Themselves, Boosting Reasoning Without a Search API
量子位· 2025-05-17 03:50
Core Insights
- The article discusses ZeroSearch, an open-source reinforcement learning framework developed by Alibaba's Tongyi Lab that enhances the search capabilities of large language models (LLMs) without relying on real search engines [4][19]

Group 1: Challenges in Current Approaches
- Real search engines return documents of unpredictable quality, introducing noise and instability into the training process [2]
- Reinforcement learning (RL) training requires frequent rollouts, incurring significant API costs that limit scalability [3]

Group 2: ZeroSearch Solution
- ZeroSearch eliminates interaction with real search engines, avoiding API costs and making large-scale RL training economically feasible [19][36]
- A simulated search environment and progressive noise-resistant training allow LLMs to become self-sufficient in search evolution [6][19]

Group 3: Training Methodology
- Lightweight fine-tuning turns an LLM into a "search engine simulator" that can generate both useful results and noise interference with minimal labeled data [7][10]
- A curriculum-based noise training approach starts from high-quality documents and gradually incorporates noise, enhancing training stability and effectiveness [12][14]

Group 4: Performance Metrics
- Experimental results indicate that a 3-billion-parameter LLM is enough to significantly improve search capability while saving on API costs [5]
- ZeroSearch outperforms existing methods on both single-hop and multi-hop question-answering tasks, demonstrating superior retrieval capabilities [25][26]

Group 5: Compatibility with RL Algorithms
- ZeroSearch is compatible with various RL algorithms, including Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), providing flexibility in training strategies [19][20]
- GRPO shows better training stability, while PPO offers more flexibility on certain tasks, indicating that ZeroSearch can adapt to different algorithmic needs [21][34]

Group 6: Future Implications
- By addressing the cost and stability issues of current methods, ZeroSearch paves the way for future advances in intelligent retrieval systems [37]
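The curriculum-based noise training described in Group 3 can be illustrated with a schedule that raises the probability of serving a noisy document as training progresses. The exponential shape and the parameter values below are assumptions for illustration; the paper's actual schedule may differ:

```python
def noise_probability(step, total_steps, p_start=0.0, p_end=0.5, base=4.0):
    """Probability that the simulated search engine returns a noisy
    (irrelevant) document at a given training step. Grows from p_start
    to p_end along an exponential curve, so early training sees mostly
    clean documents and later training faces harder retrieval conditions.
    Shape and parameter values are illustrative assumptions."""
    frac = (base ** (step / total_steps) - 1) / (base - 1)
    return p_start + frac * (p_end - p_start)

# Early steps: almost no noise; the final step reaches the full p_end.
early, late = noise_probability(10, 100), noise_probability(90, 100)
```

The convex curve keeps the task easy long enough for the policy to learn the output format before the noise ratio climbs, which is the stability benefit the article attributes to the curriculum.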
Disrupting the Google Search API: Alibaba Open-Sources the RL Framework ZeroSearch, Cutting Costs by 88% and Redefining AI Search!
AI科技大本营· 2025-05-09 09:35
Core Insights
- Alibaba's Tongyi team has launched ZeroSearch, a generative search-engine framework that operates without external search interfaces, achieving low-cost, high-performance retrieval [1][10]

Group 1: ZeroSearch Overview
- ZeroSearch allows users to run a 14-billion-parameter model on four A100 GPUs for just $70.80, providing search capabilities that rival or exceed Google [1][16]
- The framework employs a novel reinforcement learning approach to train search capabilities without interacting with real search engines, addressing issues of document quality and high API costs [2][6]

Group 2: Training Methodology
- Lightweight supervised fine-tuning converts a large model into a retrieval module capable of generating relevant and irrelevant documents based on queries [8]
- A curriculum learning strategy gradually lowers document quality to challenge the model's reasoning and retrieval abilities, enhancing its search learning path [2][8]

Group 3: Cost Efficiency and Performance
- ZeroSearch demonstrates an 80%-90% reduction in training costs compared with traditional methods, making it a genuinely low-cost, high-performance option for AI search training [10][16]
- Across experimental scenarios, ZeroSearch matches or beats models trained with real search engines: a 7-billion-parameter model matches Google's search quality, and a 14-billion-parameter version surpasses it [15][16]

Group 4: Open Source and Accessibility
- The researchers have made their code, datasets, and pre-trained models publicly available on GitHub and Hugging Face, promoting accessibility for other researchers and companies [16]
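The fine-tuned retrieval module described above amounts to prompting one LLM to produce either helpful or deliberately unhelpful documents for a query. A hypothetical prompt-construction helper might look like the following; the function name, the `noisy` switch, and the prompt wording are all assumptions, not the framework's actual implementation:

```python
def build_simulator_prompt(query: str, noisy: bool) -> str:
    """Construct a prompt asking the simulation LLM to act as a search
    engine. When noisy=True it is asked to produce a plausible-looking
    but unhelpful document, mimicking low-quality real-world results.
    Prompt wording is a hypothetical illustration."""
    style = "irrelevant or misleading" if noisy else "relevant and helpful"
    return (
        "You are a search engine. Given the query below, return a short "
        f"document that is {style} for answering it.\n"
        f"Query: {query}\nDocument:"
    )

prompt = build_simulator_prompt("Who wrote Hamlet?", noisy=False)
```

Because both clean and noisy documents come from the same controllable simulator, the training loop can dial document quality precisely, which a real search engine never allows.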
Goodbye, Expensive Google Search API! Alibaba's Open-Source RL Framework Lets Large Models Fend for Themselves, Slashing Costs 88%; Netizens: The Game Has Changed
AI前线· 2025-05-09 05:18
Core Viewpoint
- Alibaba's new technology ZeroSearch significantly reduces the cost and complexity of training AI systems for information retrieval, eliminating the need for expensive commercial search engine APIs [1][2][14]

Summary by Sections

Technology Overview
- ZeroSearch is a reinforcement learning framework that allows large language models (LLMs) to develop advanced search capabilities through simulation, outperforming models based on real search engines while incurring zero API costs [2][3]
- The technology is compatible with multiple model families, including Qwen-2.5 and LLaMA-3.2, and does not require a separate supervised warm-up phase [2][3]

Performance Metrics
- In comprehensive experiments across seven question-answering datasets, ZeroSearch's performance matched or exceeded that of models trained with real search engines [3][5]
- A 3-billion-parameter LLM can achieve search capabilities comparable to Google, while a 14-billion-parameter module can surpass Google's performance [3][5]

Cost Efficiency
- Training on approximately 64,000 queries through Google search via SerpAPI costs around $586.70, while using a 14-billion-parameter simulation LLM on four A100 GPUs costs only $70.80, an 88% reduction [7][8]

Methodology
- ZeroSearch begins with a lightweight supervised fine-tuning process that transforms an LLM into a retrieval module capable of generating relevant and irrelevant documents in response to queries [9][11]
- The system employs a curriculum-based rollout mechanism, gradually increasing the difficulty of generated documents to simulate challenging retrieval scenarios [11][12]

Implications for AI Development
- ZeroSearch represents a significant shift in AI training methods, enabling systems to improve without relying on external tools such as search engines [14][15]
- By drastically lowering the entry barrier of high API costs, the technology creates a more equitable competitive environment for small AI companies and startups [14][15]
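The quoted savings check out arithmetically: going from roughly $586.70 in SerpAPI charges to about $70.80 of GPU time for the same query volume is close to the advertised 88% reduction. Both figures are taken from the article above:

```python
serpapi_cost = 586.70  # ~64,000 Google queries via SerpAPI (from the article)
gpu_cost = 70.80       # 14B simulation LLM on four A100 GPUs (from the article)

reduction = 1 - gpu_cost / serpapi_cost
print(f"{reduction:.1%}")  # ≈ 87.9%, i.e. the reported ~88% saving
```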