Core Insights
- The core limitation of current web search agents is not model size but the lack of sufficiently challenging training data [1][5][6]
- The proposed method, WebExplorer, aims to create high-quality QA pairs that enable smaller models to outperform larger ones in complex search tasks [1][8][19]

Group 1: Training Data Quality
- High-quality training data is scarce, which limits the performance of existing open-source web agents in complex search tasks [5][6]
- Building highly capable web search agents fundamentally depends on improving the quality of training data [6][19]

Group 2: WebExplorer Methodology
- WebExplorer employs a two-stage approach, Model-Based Exploration followed by Iterative Query Evolution, to create challenging QA pairs [8][10]
- In the first stage the model autonomously explores the information space; the second stage increases query difficulty by removing explicit clues and introducing strategic ambiguity [10][12]

Group 3: Performance and Results
- The WebExplorer-8B model, trained on the new QA dataset, supports long-horizon reasoning with a 128K context length and up to 100 tool calls, achieving state-of-the-art performance among models of similar size [3][16]
- Query evolution made the QA pairs markedly harder: the accuracy of strong commercial models on them dropped from 86.6% to 67.1%, indicating that the evolutionary process is effective at creating complex queries [15][19]

Group 4: Generalization and Application
- WebExplorer's QA pair synthesis method generalizes effectively across benchmarks and domains, including those outside STEM fields [19]
- The approach highlights the potential for smaller models to excel in complex tasks through carefully designed data synthesis and training strategies, which is crucial for AI applications in resource-constrained environments [19]
100 Rounds of Tool Calls: An 8B Small Model Can Handle Complex Long-Horizon Search, Newly Open-Sourced by MiniMax & HKUST
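
The Iterative Query Evolution stage described under Group 2 can be illustrated with a minimal sketch. It is not WebExplorer's implementation: in the described method a model rewrites the query, whereas here a hand-written dictionary (`clue_map`) stands in for that rewriting step. The names `QAPair`, `evolve_once`, `evolve_query`, and the `max_rounds` default are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str


def evolve_once(question: str, clue_map: dict[str, str]) -> str:
    """Replace the first explicit clue still present with a vaguer description.

    Assumption: clue_map is a stand-in for the model-driven rewriting step.
    """
    for clue, vague in clue_map.items():
        if clue in question:
            return question.replace(clue, vague)
    return question  # no explicit clue left to obfuscate


def evolve_query(pair: QAPair, clue_map: dict[str, str], max_rounds: int = 5) -> QAPair:
    """Harden a question over several rounds while keeping the answer fixed."""
    question = pair.question
    for _ in range(max_rounds):  # max_rounds is an assumed budget
        harder = evolve_once(question, clue_map)
        if harder == question:
            break  # the query cannot be made vaguer with the known clues
        question = harder
    return QAPair(question, pair.answer)


# Illustrative seed question; not taken from the WebExplorer dataset.
seed = QAPair("Which club did the player born in 1987 join in 2021?", "X")
clue_map = {"1987": "the late 1980s", "2021": "the early 2020s"}
evolved = evolve_query(seed, clue_map)
print(evolved.question)
```

Each round removes one explicit clue, so the answer stays the same while the question requires more search steps to resolve, which is the effect the summary attributes to the evolution stage.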