Deep Search
DeepAgent and DeepSearch Both Top the Leaderboards! The Answer Points to openJiuwen, an Emerging Open-Source Project
Jiqizhixin (机器之心) · 2026-02-12 05:16
Core Insights
- The article highlights the emergence of advanced AI agents, particularly focusing on Clawdbot and its evolution into OpenClaw, reflecting a global desire for more sophisticated and reliable AI systems [1].
- The year 2025 is referred to as the "Year of AI Agents," with numerous agents being developed and evaluated against rigorous benchmarks like GAIA and BrowseComp-Plus [1][2].
- DeepAgent and DeepSearch, built on the openJiuwen platform, have achieved top rankings in the GAIA and BrowseComp-Plus benchmarks, respectively, showcasing their advanced capabilities [2][25].

GAIA Benchmark Insights
- DeepAgent achieved a score of 91.69%, surpassing competitors like NVIDIA's Nemotron, indicating its strong performance in general agent capabilities [4][13].
- GAIA evaluates agents on 12 core abilities, including long-term task planning and multi-modal understanding, with a scoring system that emphasizes real-world task difficulty [8][10].
- The average success rate for human participants in GAIA is around 92%, while leading AI models like GPT-4 perform significantly lower, highlighting the challenge faced by AI agents [9].

DeepAgent's Capabilities
- DeepAgent's design allows it to dynamically adjust plans based on real-time feedback, ensuring task completion even in changing environments [17].
- It features a multi-layered context engine that maintains consistency and traceability in reasoning, crucial for complex tasks [19][21].
- The agent's ability to execute tasks, such as analyzing YouTube cooking videos and purchasing ingredients, demonstrates its practical application in real-world scenarios [15].

BrowseComp-Plus Benchmark Insights
- DeepSearch achieved an accuracy of 80%, leading the BrowseComp-Plus ranking, which assesses deep search and web browsing capabilities [26][29].
- The BrowseComp-Plus benchmark focuses on multi-hop retrieval and cross-source information integration, emphasizing the agent's ability to extract relevant information from vast datasets [29][30].
- The scoring mechanism is designed to ensure fairness and reproducibility, using a fixed human-validated corpus to avoid biases from real-time web dynamics [30].

DeepSearch's Capabilities
- DeepSearch employs a multi-branch reasoning approach, allowing it to explore various potential solutions simultaneously, enhancing search efficiency [35].
- It features an intelligent action exploration system that balances the depth of search with the diversity of paths taken, addressing the challenges of noise and misinformation [37][39].
- The system's design mimics human expert reasoning, enabling it to adaptively prioritize search actions based on real-time evaluations [39][40].

openJiuwen Platform Insights
- Both DeepAgent and DeepSearch leverage the openJiuwen platform, which provides a comprehensive framework for developing high-precision, controllable AI agents [41][42].
- The platform supports multi-agent collaboration and self-evolution, allowing for continuous improvement and adaptability in task execution [43].
- openJiuwen has been commercialized in various sectors, including finance and manufacturing, indicating its broad applicability and potential for industry transformation [43].

Conclusion
- The article concludes that the AI agent landscape is at a pivotal point, distinguishing between basic language-interactive agents and advanced systems capable of planning, resource scheduling, and self-repair [46].
- The success of DeepAgent and DeepSearch underscores the importance of robust architectural design in achieving high performance in stringent evaluations [46][48].
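The summary describes DeepSearch's multi-branch reasoning only at a high level and gives no algorithm. As a rough illustration of the general idea of exploring several branches at once while pruning to the most promising ones, here is a minimal beam-search sketch. It is not openJiuwen's implementation: the function name `multi_branch_search`, its parameters, and the toy scoring function are all hypothetical.

```python
import heapq
from typing import Callable, Iterable, Optional


def multi_branch_search(
    start: str,
    expand: Callable[[str], Iterable[str]],
    score: Callable[[str], float],
    is_answer: Callable[[str], bool],
    beam_width: int = 3,
    max_depth: int = 4,
) -> Optional[str]:
    """Generic beam search (hypothetical helper, not a DeepSearch API).

    Each round expands every state on the frontier (several branches at
    once), then keeps only the `beam_width` best-scoring distinct states.
    `beam_width` trades path diversity against cost; `max_depth` bounds
    how deep the search may go.
    """
    frontier = [start]
    for _ in range(max_depth):
        candidates = []
        for state in frontier:
            for nxt in expand(state):
                if is_answer(nxt):
                    return nxt
                candidates.append(nxt)
        if not candidates:
            return None
        # Deduplicate, then prune to the top-scoring branches.
        frontier = heapq.nlargest(beam_width, set(candidates), key=score)
    return None


# Toy usage: "search" for the string "abba" one character at a time.
target = "abba"
result = multi_branch_search(
    start="",
    expand=lambda s: [s + "a", s + "b"],
    score=lambda s: sum(1 for x, y in zip(s, target) if x == y),
    is_answer=lambda s: s == target,
)
```

In a real search agent, `expand` would issue queries or follow links and `score` would be a relevance estimate; both are left abstract here because the source does not specify them.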
Topping SuperCLUE DeepSearch: openPangu-R-72B's Deep Search Capability Leaps Ahead
Jiqizhixin (机器之心) · 2025-12-05 10:17
Core Insights
- The article highlights the rapid development of large model inference and agent tool capabilities, with a focus on the recent SuperCLUE DeepSearch evaluation report, where the domestic model openPangu-R-72B ranked first in complex information retrieval tasks, showcasing the strength of domestic Ascend computing power in large model development [1][15].

Model Performance
- In the SuperCLUE DeepSearch evaluation, openPangu-R-72B achieved a score of 73.33, outperforming other models such as Gemini-3-Pro-Preview and GPT-5.1 (high), which scored 70.48 [2].
- The model excelled in various task categories, particularly in humanities and social sciences (75.47) and natural sciences (83.33) [2].

Technical Architecture
- openPangu-R-72B is based on a redesigned architecture that balances efficiency and performance, using a mixture-of-experts (MoE) model that selects 8 of 80 experts, keeping 15 billion parameters active out of a total of 74 billion [4].
- The model was trained on 24 trillion tokens and can handle long sequences of up to 128k tokens, which is crucial for deep search tasks [4].

Optimization Techniques
- The model incorporates several optimizations, including the introduction of parameterized Sink Token technology to stabilize training and enhance quantization compatibility [7].
- It employs a combination of K-Norm and Depth-Scaled Sandwich-Norm architectures to reduce computational overhead while maintaining stability and flexibility in expression [7].
- The attention architecture has been optimized for precision and efficiency, achieving a 37.5% reduction in KV cache while enhancing the model's ability to capture fine-grained semantic relationships [7][8].

DeepSearch Capabilities
- The model's success in deep search tasks is attributed to three key strategies: long-chain question answering synthesis, non-indexed information processing, and a fast-slow thinking integration approach [10].
- The long-chain QA synthesis improved the average difficulty of questions by 10% and introduced a verification agent to enhance training accuracy [12].
- The model's workflow includes a cycle of focusing on key URLs, crawling, and document QA to gather deep information beyond traditional search engine capabilities [12].

Domestic Computing Power
- The achievement of openPangu-R-72B in the SuperCLUE DeepSearch evaluation underscores the effective integration of domestic computing power with large model research and development [15].
- The model's sibling, openPangu-718B, also performed well, securing the second position in the general ranking, indicating the comprehensive capabilities of the openPangu series across different task scenarios [15].
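To make the expert-selection figures concrete (reading them as 8 of 80 experts chosen per token, with 15 billion of 74 billion parameters active), the following is a minimal sketch of generic top-k MoE routing. It is not openPangu's actual router: the softmax-over-selected-experts gating, the function name `route_token`, and the random logits are assumptions made for illustration; only the counts 80, 8, 15 billion, and 74 billion come from the summary.

```python
import math
import random
from typing import Dict, List

NUM_EXPERTS = 80  # experts per MoE layer, per the summary
TOP_K = 8         # experts selected per token, per the summary


def route_token(router_logits: List[float], k: int = TOP_K) -> Dict[int, float]:
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    Generic top-k gating (an assumption, not the model's documented
    router): pick the k largest logits, then softmax over just those so
    the gate weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: v / total for i, v in exps.items()}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
gates = route_token(logits)

# Share of parameters used per token, from the figures in the summary.
active_fraction = 15 / 74  # roughly 0.20
```

Only the selected experts run for a given token, which is why the active parameter count (about a fifth of the total here) is much smaller than the full model size.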
AI with a High "Search Quotient" Has Nearly Learned to Answer Before You Ask
Sou Hu Cai Jing· 2025-05-12 06:04
Group 1
- The article discusses the importance of selecting appropriate gifts for Mother's Day, highlighting how gift choices reflect the understanding and care for mothers [2][5].
- Various AI models were evaluated based on their ability to provide thoughtful gift suggestions, categorized into three main types: "advertising type," "perfunctory type," and "thoughtful type" [5][9][18].
- Quark AI stands out among competitors due to its advanced deep search capabilities, allowing it to analyze user queries more effectively and provide tailored recommendations [28][34][63].

Group 2
- The "advertising type" AI, represented by Kimi, failed to understand user needs and provided generic product suggestions unrelated to the mother's interests [6][9].
- The "perfunctory type," represented by Doubao, offered vague and simplistic suggestions that did not align with the specific context of Mother's Day gifting [9][18].
- The "thoughtful type," represented by Xunfei Xinghuo and Tencent Yuanbao, provided well-considered recommendations that included relevant gifts and additional advice to enhance the mother's happiness [18][34].

Group 3
- Quark's deep search feature allows it to break down user questions into specific components, leading to more precise and relevant answers compared to other AI applications [28][34][66].
- Quark's ability to anticipate follow-up questions enhances user experience, making it a more interactive and helpful tool [34][44].
- The article emphasizes that Quark's deep search capabilities are supported by reliable information sources, ensuring the accuracy and credibility of its responses [66][67].
Alibaba Quark Deep Search: Helping AI Better Understand the Essence of Every Ordinary User's Needs
Tai Mei Ti APP · 2025-05-12 00:41
Core Insights
- The article discusses the evolution of AI applications, particularly focusing on Alibaba's Quark and its new "Deep Search" feature, which aims to address the limitations of traditional search methods and enhance user experience through deep reasoning and intelligent retrieval [2][3][4].

Group 1: AI Application Development
- Quark has launched the first "Deep Search" product in China, targeting a user base of over 100 million, aiming to solve complex problems through advanced AI capabilities [2][4].
- The transition from traditional search to AI-driven search signifies a new era in the search industry, where companies must redefine their positions to meet evolving user needs [3][4].

Group 2: Deep Search Functionality
- Deep Search utilizes Alibaba's self-developed reasoning model to analyze user intent deeply, moving beyond keyword matching to provide comprehensive answers to complex queries [4][5].
- The functionality of Deep Search allows for multi-modal interactions and proactive analysis of user needs, enhancing the clarity and logic of the responses provided [4][7].

Group 3: User Experience and Innovation
- Quark's approach of "think first, then search" enables it to deliver more in-depth and comprehensive search results, reducing decision-making costs for users [7][10].
- The introduction of features like "photo intelligent processing" and the ability to create artistic images from ordinary photos showcases Quark's commitment to enhancing user experience and expanding AI capabilities [7][10].

Group 4: Market Position and Future Prospects
- Quark is positioned to become a national-level "AI super entrance" as it continues to innovate and respond to user demands, evidenced by its recent success in app store rankings [9][10].
- Future developments include the launch of "Deep Search Pro," which promises enhanced analytical capabilities and the ability to provide professional reports, further solidifying Quark's competitive edge in the AI search market [9][10].