Core Insights
- The article introduces the ZeroSearch framework, which activates the search capabilities of large language models (LLMs) without relying on real search engines, cutting training costs by 88% while outperforming methods that depend on actual search engines [1][21].

Methodology
- ZeroSearch uses a reinforcement learning (RL) framework in which a simulation LLM stands in for the search engine, eliminating real-time API calls and thereby lowering training costs [4][6].
- A structured training template guides the model through each interaction round, improving the clarity and interpretability of the reasoning process [8].
- A loss-masking technique keeps the policy model from memorizing documents produced by the simulation LLM: only tokens generated by the policy model itself are included in the loss calculation [4][8] (see the first sketch after this summary).

Training Strategy
- Training ramps up difficulty over time: the model first learns the basic output format and task logic against easier retrieval, and the challenge is then escalated to strengthen its reasoning capabilities [22][36].
- A curriculum learning strategy progressively degrades the quality of the generated documents, steadily raising the difficulty to stimulate the model's reasoning ability [13][36] (see the second sketch after this summary).

Experimental Results
- ZeroSearch performs strongly across datasets, achieving an average score of 40.93 on multi-hop question answering tasks and surpassing all baseline methods [20][21].
- The framework generalizes robustly, with performance improving as model size grows, indicating strong scalability [23][27].
- Compared with real search engines, ZeroSearch shows significant potential to replace them in large-scale RL applications, underscoring its effectiveness in enhancing search capabilities [21][24].

Conclusion
- The ZeroSearch framework effectively activates the search capabilities of LLMs without real search engines, and adapts well across different RL algorithms with strong scalability [36].
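A minimal sketch of the loss-masking idea described above, assuming rollouts are stored as (text, source) segments and tokenized with a Hugging Face-style tokenizer. The segment layout, function names, and the plain cross-entropy stand-in for the per-token RL objective are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def build_loss_mask(segments, tokenizer):
    """Build token ids and a 0/1 loss mask from one rollout.

    `segments` is a list of (text, source) pairs in generation order, where
    source is "policy" for text sampled by the policy model and "sim" for
    documents injected by the simulation LLM. Only policy tokens receive loss.
    (This segment representation is an assumption for illustration.)
    """
    ids, mask = [], []
    for text, source in segments:
        toks = tokenizer.encode(text, add_special_tokens=False)
        ids.extend(toks)
        mask.extend([1.0 if source == "policy" else 0.0] * len(toks))
    return torch.tensor(ids), torch.tensor(mask)

def masked_token_loss(logits, target_ids, loss_mask):
    """Token-level cross-entropy where masked positions contribute nothing;
    used here as a stand-in for the per-token RL surrogate objective."""
    per_token = F.cross_entropy(logits, target_ids, reduction="none")
    return (per_token * loss_mask).sum() / loss_mask.sum().clamp(min=1.0)
```

With this mask, gradients flow only through the reasoning, search-query, and answer tokens the policy produced, so the policy cannot simply memorize the simulated documents.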
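A second sketch, for the curriculum strategy: one plausible schedule in which the probability of asking the simulation LLM for a deliberately low-quality document grows over training. The exponential ramp, the default values, and the `sim_llm.generate(..., style=...)` interface are assumptions for illustration, not the paper's exact settings.

```python
import random

def noisy_doc_probability(step, total_steps, p_start=0.0, p_end=0.75, base=4.0):
    """Probability of requesting a low-quality document at a given step.
    Assumed exponential ramp from p_start to p_end over training."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return p_start + (base ** frac - 1.0) / (base - 1.0) * (p_end - p_start)

def sample_document(query, sim_llm, step, total_steps):
    """Ask the simulation LLM for either a useful or a deliberately noisy
    document, per the curriculum (`sim_llm.generate` and its `style`
    argument are hypothetical names)."""
    p_noisy = noisy_doc_probability(step, total_steps)
    style = "noisy" if random.random() < p_noisy else "useful"
    return sim_llm.generate(query=query, style=style)
```

Early in training almost every retrieved document is useful, so the model can focus on the output format; later, noisier documents force it to reason around unreliable evidence.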
Costs slashed by 88%! Tongyi Lab and Peking University release ZeroSearch, activating LLM retrieval capabilities without real search
机器之心 · 2025-05-29 04:53