DeepAgent and DeepSearch Both Top the Leaderboards, and the Answer Points to openJiuwen, an Emerging Open-Source Project
36Kr · 2026-02-12 07:06

Core Insights
- The article highlights the emergence of advanced AI agents, particularly focusing on DeepAgent and DeepSearch, which have achieved top rankings in the GAIA and BrowseComp-Plus benchmarks respectively, indicating a significant leap in AI capabilities [1][20].

Group 1: GAIA Benchmark Insights
- DeepAgent, built on the openJiuwen platform, achieved a score of 91.69%, surpassing competitors like NVIDIA's Nemotron, showcasing its superior capabilities in general agent tasks [2][10].
- GAIA is a rigorous benchmark designed to evaluate AI agents on 12 core competencies, including long-term task planning and multi-modal understanding, with a scoring system that emphasizes real-world task execution [6][4].
- The average success rate for human participants in GAIA is around 92%, while leading AI models like GPT-4 only achieve about 15%, highlighting the benchmark's challenging nature [6][10].

Group 2: DeepAgent's Capabilities
- DeepAgent's design allows it to dynamically adjust plans based on real-time feedback, ensuring task completion even in changing environments [12][13].
- It features a multi-layered context engine that maintains cognitive consistency and traceability throughout complex tasks, enhancing the reliability of its outputs [15].
- The agent employs an asynchronous tool orchestration system, enabling efficient and reliable execution of diverse tasks by coordinating various external tools [16][17].

Group 3: BrowseComp-Plus Benchmark Insights
- DeepSearch, also based on openJiuwen, achieved an accuracy of 80% in the BrowseComp-Plus benchmark, demonstrating its strength in deep search and web interaction capabilities [20][24].
- BrowseComp-Plus evaluates agents on their ability to perform multi-hop retrieval and cross-source information integration, making it a critical measure of an agent's practical capabilities [23][24].
- The benchmark employs a fixed human-validated corpus to ensure fairness and reproducibility in its assessments, avoiding biases from real-time web dynamics [23].

Group 4: Technological Foundation
- Both DeepAgent and DeepSearch leverage the openJiuwen platform, which provides a comprehensive framework for developing high-precision, high-efficiency AI agents [30][31].
- openJiuwen supports multi-agent collaboration and self-evolution, allowing agents to continuously improve their performance through a closed-loop optimization process [31][32].
- The platform has already been commercialized in various sectors, including finance and manufacturing, indicating its broad applicability and potential for future growth [31].
