DeepAgent and DeepSearch Both Top Their Leaderboards, and the Answer Points to openJiuwen, an Emerging Open-Source Project
36Kr · 2026-02-12 07:06
Core Insights
- The article highlights the emergence of advanced AI agents, focusing on DeepAgent and DeepSearch, which have achieved top rankings in the GAIA and BrowseComp-Plus benchmarks respectively, indicating a significant leap in AI capabilities [1][20].

Group 1: GAIA Benchmark Insights
- DeepAgent, built on the openJiuwen platform, achieved a score of 91.69%, surpassing competitors like NVIDIA's Nemotron and showcasing its strength in general agent tasks [2][10].
- GAIA is a rigorous benchmark designed to evaluate AI agents on 12 core competencies, including long-term task planning and multi-modal understanding, with a scoring system that emphasizes real-world task execution [6][4].
- The average success rate for human participants in GAIA is around 92%, while leading AI models like GPT-4 achieve only about 15%, highlighting the benchmark's difficulty [6][10].

Group 2: DeepAgent's Capabilities
- DeepAgent's design allows it to dynamically adjust plans based on real-time feedback, ensuring task completion even in changing environments [12][13].
- It features a multi-layered context engine that maintains cognitive consistency and traceability throughout complex tasks, enhancing the reliability of its outputs [15].
- The agent employs an asynchronous tool orchestration system, enabling efficient and reliable execution of diverse tasks by coordinating various external tools [16][17].

Group 3: BrowseComp-Plus Benchmark Insights
- DeepSearch, also based on openJiuwen, achieved an accuracy of 80% in the BrowseComp-Plus benchmark, demonstrating its strength in deep search and web interaction [20][24].
- BrowseComp-Plus evaluates agents on their ability to perform multi-hop retrieval and cross-source information integration, making it a critical measure of an agent's practical capabilities [23][24].
- The benchmark employs a fixed human-validated corpus to ensure fairness and reproducibility in its assessments, avoiding biases from real-time web dynamics [23].

Group 4: Technological Foundation
- Both DeepAgent and DeepSearch leverage the openJiuwen platform, which provides a comprehensive framework for developing high-precision, high-efficiency AI agents [30][31].
- openJiuwen supports multi-agent collaboration and self-evolution, allowing agents to continuously improve their performance through a closed-loop optimization process [31][32].
- The platform has already been commercialized in various sectors, including finance and manufacturing, indicating its broad applicability and potential for future growth [31].
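
The summary above credits DeepAgent with an asynchronous tool orchestration system but gives no implementation detail. As a rough illustration of the general idea only, the following Python sketch runs independent tool calls concurrently with a per-call timeout and a small retry budget. Nothing here is openJiuwen's API: `search_web`, `FlakyTool`, `run_tool`, `orchestrate`, and the retry and timeout values are all hypothetical names and numbers chosen for the example.

```python
import asyncio


async def search_web(query: str) -> str:
    # Hypothetical tool: stands in for a real web-search call.
    await asyncio.sleep(0.01)
    return f"results for {query}"


class FlakyTool:
    """Hypothetical tool that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def __call__(self, arg: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("transient failure")
        return f"parsed {arg}"


async def run_tool(tool, arg, retries=2, timeout=1.0):
    """Run one tool call with a timeout, retrying on failure."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return await asyncio.wait_for(tool(arg), timeout=timeout)
        except (RuntimeError, asyncio.TimeoutError) as exc:
            last_error = exc
    # Report the failure as a value so one bad tool does not abort the rest.
    return f"error: {last_error}"


async def orchestrate(calls):
    """Run independent tool calls concurrently; results keep the input order."""
    return await asyncio.gather(*(run_tool(tool, arg) for tool, arg in calls))


def demo():
    flaky = FlakyTool(failures=1)  # fails once, then succeeds on the retry
    calls = [(search_web, "GAIA benchmark"), (flaky, "report.pdf")]
    return asyncio.run(orchestrate(calls))
```

Returning an error string instead of raising is one possible design choice: it lets a planner inspect partial results and decide whether to replan, which loosely matches the "adjust plans based on real-time feedback" behavior the summary describes.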
DeepAgent and DeepSearch Both Top Their Leaderboards! The Answer Points to openJiuwen, an Emerging Open-Source Project
机器之心 (Jiqizhixin) · 2026-02-12 05:16
Core Insights
- The article highlights the emergence of advanced AI agents, focusing on Clawdbot and its evolution into OpenClaw, reflecting a global desire for more sophisticated and reliable AI systems [1].
- The year 2025 is referred to as the "Year of AI Agents," with numerous agents being developed and evaluated against rigorous benchmarks like GAIA and BrowseComp-Plus [1][2].
- DeepAgent and DeepSearch, built on the openJiuwen platform, have achieved top rankings in the GAIA and BrowseComp-Plus benchmarks, respectively, showcasing their advanced capabilities [2][25].

GAIA Benchmark Insights
- DeepAgent achieved a score of 91.69%, surpassing competitors like NVIDIA's Nemotron, indicating its strong performance in general agent capabilities [4][13].
- GAIA evaluates agents on 12 core abilities, including long-term task planning and multi-modal understanding, with a scoring system that emphasizes real-world task difficulty [8][10].
- The average success rate for human participants in GAIA is around 92%, while leading AI models like GPT-4 perform significantly lower, highlighting the challenge faced by AI agents [9].

DeepAgent's Capabilities
- DeepAgent's design allows it to dynamically adjust plans based on real-time feedback, ensuring task completion even in changing environments [17].
- It features a multi-layered context engine that maintains consistency and traceability in reasoning, which is crucial for complex tasks [19][21].
- The agent's ability to execute tasks, such as analyzing YouTube cooking videos and purchasing ingredients, demonstrates its practical application in real-world scenarios [15].

BrowseComp-Plus Benchmark Insights
- DeepSearch achieved an accuracy of 80%, leading the BrowseComp-Plus ranking, which assesses deep search and web browsing capabilities [26][29].
- The BrowseComp-Plus benchmark focuses on multi-hop retrieval and cross-source information integration, emphasizing the agent's ability to extract relevant information from vast datasets [29][30].
- The scoring mechanism is designed to ensure fairness and reproducibility, using a fixed human-validated corpus to avoid biases from real-time web dynamics [30].

DeepSearch's Capabilities
- DeepSearch employs a multi-branch reasoning approach, allowing it to explore various potential solutions simultaneously, enhancing search efficiency [35].
- It features an intelligent action exploration system that balances the depth of search with the diversity of paths taken, addressing the challenges of noise and misinformation [37][39].
- The system's design mimics human expert reasoning, enabling it to adaptively prioritize search actions based on real-time evaluations [39][40].

openJiuwen Platform Insights
- Both DeepAgent and DeepSearch leverage the openJiuwen platform, which provides a comprehensive framework for developing high-precision, controllable AI agents [41][42].
- The platform supports multi-agent collaboration and self-evolution, allowing for continuous improvement and adaptability in task execution [43].
- openJiuwen has been commercialized in various sectors, including finance and manufacturing, indicating its broad applicability and potential for industry transformation [43].

Conclusion
- The article concludes that the AI agent landscape is at a pivotal point, distinguishing between basic language-interactive agents and advanced systems capable of planning, resource scheduling, and self-repair [46].
- The success of DeepAgent and DeepSearch underscores the importance of robust architectural design in achieving high performance in stringent evaluations [46][48].
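
The summary above describes DeepSearch as exploring several branches at once while balancing search depth against the diversity of paths, without saying how. One common way to get that trade-off, shown here purely as an assumed stand-in and not as DeepSearch's actual method, is a beam search that caps how many surviving branches may end in the same action. The `Branch` type, the action names, the `score_fn` callback, and the `beam_width` and `max_per_action` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    path: tuple   # sequence of actions taken so far
    score: float  # accumulated (hypothetical) evidence score


def expand(branch, actions, score_fn):
    """Create one child branch per candidate action."""
    return [
        Branch(branch.path + (a,), branch.score + score_fn(branch.path, a))
        for a in actions
    ]


def select_diverse(candidates, beam_width, max_per_action):
    """Keep the best branches, capping how many may end in the same action."""
    kept, per_action = [], {}
    for b in sorted(candidates, key=lambda b: b.score, reverse=True):
        last = b.path[-1]
        if per_action.get(last, 0) < max_per_action:
            kept.append(b)
            per_action[last] = per_action.get(last, 0) + 1
        if len(kept) == beam_width:
            break
    return kept


def multi_branch_search(actions, score_fn, depth, beam_width=2, max_per_action=1):
    """Grow several branches in parallel for `depth` steps."""
    frontier = [Branch((), 0.0)]
    for _ in range(depth):
        candidates = [c for b in frontier for c in expand(b, actions, score_fn)]
        frontier = select_diverse(candidates, beam_width, max_per_action)
    return frontier
```

In this sketch `depth` controls how far each branch is pursued and `max_per_action` controls diversity: setting it to 1 forces every surviving branch to end in a different action, while raising it to `beam_width` reduces the procedure to ordinary beam search.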