Multi-hop reasoning
Big Tech's data moat is broken! OpenSeeker, a fully open-source Search Agent from Shanghai Jiao Tong University, arrives
机器之心 (Jiqizhixin) · 2026-03-31 12:19
Core Insights
- OpenSeeker, developed by a research team at Shanghai Jiao Tong University, is described as the first fully open-source deep search agent released with its complete training data, breaking the data monopoly held by large companies [2][28].
- The model shows that high-quality data synthesis can reach state-of-the-art (SOTA) performance without relying on extensive computational resources [2][28].

Group 1: Model Development
- OpenSeeker uses its own high-quality data synthesis approach to overcome the training-data bottleneck that has kept this capability largely inside large enterprises [6][28].
- A single round of supervised fine-tuning (SFT) on only 11.7k synthetic samples is enough for the model to achieve competitive results on several benchmarks [17][28].

Group 2: Training Methodology
- Training a deep search agent hinges on two things: creating challenging question-answer tasks and generating high-quality solution trajectories [7][8].
- OpenSeeker constructs questions from facts anchored in real web structure, so that answering them requires genuine multi-hop reasoning [9][10][11].
- A dynamic denoising trajectory synthesis method is introduced to improve extraction of core information in noisy environments [12][15].

Group 3: Performance Metrics
- OpenSeeker scored 48.4% on the BrowseComp-ZH leaderboard, surpassing Alibaba's Tongyi DeepResearch, which scored 46.7% after extensive training [17][18].
- Across benchmarks, the model scores 29.5 on BrowseComp, 48.4 on BrowseComp-ZH, 74.0 on xbench, and 59.4 on WideSearch [18].

Group 4: Data Quality and Challenges
- The synthetic data generated by OpenSeeker is significantly more difficult than existing benchmarks, averaging 46.35 tool calls per trajectory and 76.1k tokens per trajectory [25][20].
- In comparisons with data volume held equal, OpenSeeker's data shows notably higher quality than Alibaba's, keeping a clear lead across the reported metrics [20][21].

Group 5: Community Impact
- The open-source release of OpenSeeker is seen as a pivotal moment for the field, giving researchers a solid foundation for exploring next-generation search agents [24][28].
- The community response highlights the importance of data transparency and of being able to innovate without the constraints of data gatekeeping [26][29].
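
The question-construction idea in Group 2 can be made more concrete with a toy sketch. The summary only says that questions are built from real web structure; it does not describe the algorithm, so the code below is a generic illustration and not OpenSeeker's pipeline. The link graph, the page names, the relation labels, and the helpers `walk` and `build_question` are all hypothetical. The point is that a question naming only the start page and a chain of relations forces the solver to resolve each intermediate page in turn.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (relation label, target page)

# Hypothetical toy link graph; every page name and relation is a placeholder.
LINKS: Dict[str, List[Edge]] = {
    "page_A": [("founded_by", "page_B")],
    "page_B": [("studied_at", "page_C")],
    "page_C": [("located_in", "page_D")],
    "page_D": [],
}


def walk(start: str, max_hops: int) -> List[Tuple[str, str, str]]:
    """Follow the first outgoing link up to max_hops times.

    Returns (source, relation, target) triples; stops early at a dead end.
    """
    path: List[Tuple[str, str, str]] = []
    node = start
    for _ in range(max_hops):
        edges = LINKS.get(node, [])
        if not edges:
            break
        relation, target = edges[0]
        path.append((node, relation, target))
        node = target
    return path


def build_question(path: List[Tuple[str, str, str]]) -> Tuple[str, str]:
    """Turn a link path into a (question, answer) pair.

    Only the start page and the relation chain appear in the question, so
    intermediate pages must be discovered one hop at a time.
    """
    if not path:
        raise ValueError("path must contain at least one hop")
    start = path[0][0]
    chain = " -> ".join(relation for _, relation, _ in path)
    question = f"Starting from {start}, follow the links {chain}. Which page do you reach?"
    return question, path[-1][2]


question, answer = build_question(walk("page_A", max_hops=3))
print(question)
print(answer)
```

In this sketch the number of hops is a direct difficulty knob: a longer path means more lookups before the answer is reachable, which is the property the summary attributes to multi-hop tasks. A real system would also need natural-language phrasing and a check that the answer is unique, both of which are omitted here.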