AI Model Self-Evolution
No supervision signal needed: a self-play mechanism lets deep-search agents self-evolve
机器之心· 2025-11-15 01:37
Core Insights
- The article discusses the rising interest in search-based agents and the challenges in enhancing their capabilities to approach human-level performance [2]
- A new method called Search Self-Play (SSP) is proposed, allowing agents to evolve through self-play without the need for human annotation [5][21]
- The SSP method has shown significant improvements in various open-domain question-answering benchmarks, demonstrating its effectiveness in enhancing agent capabilities [17]

Method Overview
- The SSP framework involves a single large language model acting as both the "Proposer" and "Solver," engaging in adversarial training that dynamically increases task difficulty as the model's capabilities improve [7][10]
- The training process consists of three stages: problem generation, collaborative verification, and adversarial solving, ensuring that generated questions are solvable and unique [9][10]

Experimental Results
- The SSP method was evaluated across seven open-domain question-answering benchmarks, consistently outperforming baseline methods [16][17]
- Notably, the Qwen2.5-7B-Base model achieved an average score increase of 26.4 points, with a remarkable 40.4-point improvement on TriviaQA [17]
- The SSP approach also proved effective for instruction-tuned models, enhancing their performance by an average of 8.0 points [17]

Implications and Future Directions
- The SSP paradigm represents a shift towards self-competition among models, potentially leading to superhuman performance without human supervision [21][22]
- The article suggests that this self-play training method could become a standard in future large model training, as it allows for rapid capability enhancement beyond the limitations of human annotation [21]
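The three-stage loop described in the method overview (problem generation, collaborative verification, adversarial solving) can be sketched as a toy self-play round. This is a minimal illustration, not the paper's implementation: `propose`, `verify`, and `solve` are hypothetical stand-ins for LLM-plus-search calls, operating here on a small fact table, and the zero-sum reward split is an assumption consistent with adversarial self-play.

```python
import random

# Toy knowledge base standing in for web search results (assumption).
FACTS = {"capital of France": "Paris", "author of Hamlet": "Shakespeare"}

def propose(rng):
    """Proposer role: pick a fact and emit (question, reference answer)."""
    topic, answer = rng.choice(sorted(FACTS.items()))
    return f"What is the {topic}?", answer

def verify(question, answer):
    """Collaborative verification: check the generated question is solvable
    against the knowledge base and its reference answer is unique."""
    topic = question[len("What is the "):-1]
    return FACTS.get(topic) == answer

def solve(question, skill, rng):
    """Solver role: attempt the question; `skill` models current capability."""
    topic = question[len("What is the "):-1]
    return FACTS.get(topic) if rng.random() < skill else "unknown"

def ssp_round(rng, skill):
    """One round: generation -> verification -> adversarial solving.
    The Proposer is rewarded when a verified question defeats the Solver,
    so question difficulty rises as the Solver improves."""
    question, reference = propose(rng)
    if not verify(question, reference):
        return None  # discard unverifiable questions
    prediction = solve(question, skill, rng)
    solver_reward = 1.0 if prediction == reference else 0.0
    proposer_reward = 1.0 - solver_reward  # zero-sum adversarial objective
    return solver_reward, proposer_reward
```

Because one model plays both roles, the same policy gradient can be driven by both reward signals, which is what lets difficulty scale with capability without any human-annotated questions.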