为什么Agent总是Demo猛如龙实战一条虫？

Core Viewpoint - The article discusses the limitations of AI agents in real-world applications compared to their impressive demonstrations, emphasizing that adaptability is a key factor for improvement [1]. Summary by Sections Definition and Functionality of Agents - Agents are defined as AI systems that can plan, utilize tools (such as search engines and databases), and remember information to complete complex tasks independently [3]. Adaptability Framework - The core bottleneck in current agent systems is adaptability, specifically how models adjust their behavior based on feedback signals [6]. - A 2x2 classification framework is proposed to categorize existing adaptation methods into four paradigms based on two dimensions: who is optimized (the agent or the tools) and where the feedback signal comes from (tool execution results or agent output evaluations) [7][8][9]. Four Paradigms of Adaptation - A1 Paradigm: Agents learn from feedback based on tool execution, such as whether code runs successfully [10]. - A2 Paradigm: Uses the agent's final output as the optimization signal, exemplified by models like DeepSeek-R1 that train reasoning capabilities through reinforcement learning [11]. - T1 Paradigm: Tools are pre-trained independently and then called by the agent, allowing for plug-and-play functionality [12]. - T2 Paradigm: Tools optimize themselves based on the agent's output, creating a symbiotic relationship [13]. Benefits of Classification - This classification helps developers avoid trial and error when improving AI capabilities, allowing for targeted adaptations based on specific needs [15]. - It also clarifies trade-offs: modifying AI (A1/A2) is flexible but costly, while modifying tools (T1/T2) is cheaper but limited by the AI's inherent capabilities [16]. Key Findings on Data Efficiency - The T2 paradigm demonstrates significantly higher data efficiency compared to the A2 paradigm. For instance, the Search-R1 using A2 requires approximately 170,000 training samples, while T2 only needs 2,400 samples, achieving comparable results [18][19][20]. Frontiers in Adaptability Research - The article identifies four cutting-edge directions for agent adaptability research: - Co-Adaptation: Aims for agents and tools to optimize together within the same learning cycle, presenting challenges in credit assignment [21]. - Continual Adaptation: Addresses the need for agents to continuously learn new skills without forgetting old ones in a changing environment [23]. - Safe Adaptation: Highlights concerns that large models may erode safety measures established during supervised fine-tuning, making them more vulnerable to attacks [25]. - Efficient Adaptation: Focuses on resource-constrained scenarios, discussing techniques like LoRA and FlashRL for efficient learning [27]. Additional Resources - The article mentions that a GitHub repository has been opened to continuously collect related papers and resources, serving as a guide for developers building agent systems [29].