Search Agents
Search Paradigm Revolution: The "Super Search Agent" Consensus Between Nano AI and Google
36氪 · 2025-06-12 11:27
Core Viewpoint
- The article traces the evolution of search engines into "super search" intelligent agents by 2025, emphasizing the shift from traditional keyword-based search to task-oriented engines that understand user intent and deliver actionable solutions [2][8][16].

Group 1: Evolution of Search Engines
- The concept of "super search" is moving from theory to reality, with search engines gaining both intent-understanding and task-execution capabilities [2][3].
- The AI search 1.0 era layered AI enhancements onto traditional web-page ranking, while AI search 2.0 shifted to answer engines that return direct answers [5][8].
- By 2025, AI search 3.0 enables a closed loop in which a user's stated intent leads to automatic execution and result delivery, fundamentally changing how users interact with search engines [8][16] (a minimal sketch of this loop follows this summary).

Group 2: Capabilities of Super Search
- Super search must combine five key capabilities: task planning, multi-model collaboration, high-dimensional information recognition, multi-modal output, and personalized search experiences [9][10][11][12][13].
- Current AI search engines remain at an early stage of development, with notable examples such as Nano AI and Google's AI Mode demonstrating these capabilities to varying degrees [14][18].

Group 3: Market Position and Competition
- Nano AI has emerged as a leader in the AI search engine market, achieving significant user engagement and outperforming competitors such as Perplexity as well as traditional search engines [19][21].
- Competition in the search space is shifting toward more open, agent-style product designs: Google leverages its established technology and brand, while Nano AI focuses on rapid iteration and user-centric product development [33][34].
The Latest Efficient Inference Framework for Search Agents: Throughput Tripled, Latency Cut to 1/5, Without Sacrificing Answer Quality | Nankai & UIUC Research
量子位 · 2025-05-29 01:08
Core Insights
- The article examines the efficiency problems facing AI-driven search agents, particularly those powered by large language models (LLMs), and introduces SearchAgent-X, a new framework that significantly improves performance [1][3][32].

Efficiency Bottlenecks
- The research identifies two main efficiency bottlenecks in search agents: retrieval accuracy and retrieval latency [4][8].
- The relationship between retrieval accuracy and efficiency is not monotonic: low precision forces extra rounds of retrieval, while very high precision consumes excessive computational resources [5][6][7].
- Search agents therefore benefit most from high-recall approximate search, which supports reasoning without incurring unnecessary cost [7] (a toy sketch of this recall/cost trade-off follows this summary).

Latency Issues
- Search agents are highly sensitive to retrieval latency: even small increases can snowball into large end-to-end delays, in the worst case up to 83 times [11].
- Improper scheduling and retrieval stalls are the primary causes, with measurements showing that up to 55.9% of tokens may be needlessly recomputed because of scheduling issues [13].

SearchAgent-X Framework
- SearchAgent-X relies on two acceleration mechanisms: priority-aware scheduling and non-stall retrieval [14][16].
- Priority-aware scheduling dynamically ranks concurrent requests to minimize unnecessary waiting and redundant computation [17][18] (see the scheduling sketch after this summary).
- Non-stall retrieval makes searches flexible and non-blocking, terminating retrieval early once the results are judged sufficient [19][20][22] (see the non-stall sketch after this summary).

Performance Improvements
- In practical tests, SearchAgent-X raised throughput by 1.3 to 3.4 times and cut average latency to 20% to 60% of baseline systems [27].
- Generation quality stayed on par with the baselines, and some datasets even showed slight accuracy gains attributable to the nature of approximate retrieval [28][29].

Technical Contributions
- Each optimization contributes substantially on its own: priority scheduling alone reduces end-to-end latency by 35.55% and improves cache hit rates [30].
- Non-stall retrieval further raises cache hit rates and lowers latency, underscoring the importance of minimizing waiting time in complex AI systems [31].

Future Outlook
- The article concludes that future AI systems will interact ever more frequently with external tools and knowledge bases, making today's efficiency bottlenecks pressing to resolve [32][33].
- It stresses balancing the performance of individual tools within an agent's overall workflow, so that small per-tool delays do not compound into large inefficiencies [34].