AgentFlow
AI Online Reinforcement Learning "Learns by Doing": Stanford Team Makes a Small 7B Model's Performance Soar, Even Surpassing GPT-4o
36Kr · 2025-10-24 12:45
Core Insights
- AgentFlow introduces a new paradigm for online reinforcement learning, enhancing the reasoning capabilities of agent systems through real-time optimization and collaboration among specialized agents [1][11][14]

Performance Metrics
- AgentFlow, based on the Qwen-2.5-7B-Instruct model, shows significant improvements across benchmark tests: 14.9% in search tasks, 14.0% in agentic reasoning, 14.5% in mathematical reasoning, and 4.1% in scientific reasoning [4][19][21]
- AgentFlow outperforms larger models, including GPT-4o and Llama3.1-405B, demonstrating that effective system design can beat sheer model size [21][25]

System Architecture
- AgentFlow consists of four specialized agents: a planner for task analysis and tool selection, an executor for tool invocation, a verifier for evaluating intermediate results, and a generator for synthesizing final outputs [11][13][14]
- A shared-memory design facilitates collaboration among the agents and reduces error propagation in multi-step reasoning [7][14]

Learning Mechanism
- On-policy optimization of the planner within the agent interaction flow is crucial for adapting to environmental changes and feedback, yielding a robust, self-evolving reasoning process [13][14][22]
- The Flow-GRPO algorithm addresses multi-turn credit assignment in reinforcement learning, improving training efficiency and stability on complex reasoning tasks [15][19]

Research Findings
- Online learning in real interaction environments is essential for efficient reasoning; offline supervised learning leads to performance declines [22][25]
- Training lets the system autonomously discover new tool combinations and usage patterns, enhancing its problem-solving capabilities [25][29]

Future Implications
- AgentFlow represents a shift from seeking a single comprehensive model to enabling agents to adapt and learn continuously within a system, highlighting the potential of collaborative intelligence for complex tasks [29]
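The planner/executor/verifier/generator loop over shared memory described above can be sketched in a few lines. This is a minimal illustrative reconstruction, not the actual AgentFlow implementation: the real modules are LLM-backed, so the function bodies below (and the tool name "search") are hypothetical stand-ins that only show the control flow and the shared-memory contract.

```python
from dataclasses import dataclass, field

@dataclass
class SharedMemory:
    """Running record of the reasoning flow, visible to every module."""
    steps: list = field(default_factory=list)

    def log(self, role: str, content: str) -> None:
        self.steps.append((role, content))

def planner(query: str, memory: SharedMemory) -> dict:
    # Hypothetical stand-in: choose a tool and sub-goal from query + history.
    return {"tool": "search", "subgoal": f"look up: {query}"}

def executor(action: dict) -> str:
    # Hypothetical stand-in: invoke the chosen tool and return its output.
    return f"result of {action['tool']} for '{action['subgoal']}'"

def verifier(result: str, memory: SharedMemory) -> bool:
    # Hypothetical stand-in: accept any non-empty intermediate result.
    return bool(result)

def generator(memory: SharedMemory) -> str:
    # Hypothetical stand-in: synthesize a final answer from shared memory.
    return " | ".join(content for _, content in memory.steps)

def agent_flow(query: str, max_turns: int = 3) -> str:
    """One query through the four-module loop over shared memory."""
    memory = SharedMemory()
    for _ in range(max_turns):
        action = planner(query, memory)
        memory.log("planner", str(action))
        result = executor(action)
        memory.log("executor", result)
        if verifier(result, memory):
            break  # verified: stop expanding the flow
    return generator(memory)
```

The point of the sketch is the shared `SharedMemory` object: every module reads and writes the same trace, which is what lets the verifier catch bad intermediate results before they propagate.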
How Can Agent Systems "Learn by Doing"? Stanford Team Explores a New Paradigm for Online Optimization
机器之心 · 2025-10-24 09:12
Core Insights
- The article discusses the limitations of traditional methods for enabling intelligent agents to perform complex reasoning and tool usage, highlighting the need for a more scalable and adaptable approach [2][3][4]
- The proposed AgentFlow framework integrates collaborative reasoning among multiple independent agent modules and introduces the Flow-GRPO algorithm for training, achieving significant performance improvements across tasks [3][4][15]

Group 1: Traditional Methods and Challenges
- Traditional approaches to training language models for complex task reasoning either have a single model handle both reasoning and tool usage or rely on static prompt-driven systems [11][14]
- The first approach struggles with stability and scalability in long-chain reasoning and dynamic environments, while the second lacks learning and adaptation capabilities [3][14]
- The research team aimed to enable agent systems to learn and evolve through interaction, addressing the limitations of existing methods [14][15]

Group 2: AgentFlow Framework
- AgentFlow is a modular, tool-integrated agent system designed to overcome the scalability and generalization limits of current methods [15][27]
- It features a planner that adapts in real time during agent interactions, enabling adaptive reasoning and robust tool calling [15][19]
- The framework shows significant improvements in long-term planning, tool efficiency, and dynamic reasoning depth across domains [4][15]

Group 3: Flow-GRPO Algorithm
- Flow-GRPO addresses multi-turn credit assignment in reinforcement learning by broadcasting outcome rewards to every step, converting complex multi-turn problems into manageable single-turn updates [19][20]
- This alleviates sparse-reward issues and improves training efficiency, providing a foundation for stable learning on complex reasoning tasks [20][27]

Group 4: Experimental Results
- AgentFlow was evaluated across ten benchmark tests, outperforming existing leading methods, including large proprietary models such as GPT-4o [22][27]
- Notable gains include 14.9% in knowledge retrieval, 14.0% in agentic reasoning, 14.5% in mathematical reasoning, and 4.1% in scientific reasoning [24][27]
- The 7B-parameter AgentFlow model surpassed 200B-parameter models, demonstrating that effective system design can matter more than merely increasing model size [27][30]

Group 5: Learning and Adaptation
- Online learning in real interaction environments is crucial for efficient reasoning; offline supervised training led to significant performance drops [27][30]
- The system autonomously discovered new tool-usage patterns, improving information gathering through combined tool strategies [30][33]
- AgentFlow's performance improves with more reasoning steps without excessively extending average reasoning time, indicating effective task handling [33][35]

Group 6: Conclusion and Future Potential
- AgentFlow presents a novel approach to agent training, emphasizing continuous learning and adaptation over a single comprehensive model [36][37]
- The work highlights the potential and imaginative possibilities of agentic AI, despite the distance from research exploration to practical application [37]
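The reward-broadcasting idea behind Flow-GRPO can be illustrated with a short sketch: each rollout in a group receives a single outcome reward, that reward is normalized against the group (GRPO-style), and the resulting advantage is copied to every turn of the rollout, so each turn can be updated as a single-turn problem. This is an illustrative reconstruction of the broadcasting step only, not the paper's implementation.

```python
import statistics

def flow_grpo_advantages(group_rewards: list[float],
                         turns_per_rollout: list[int]) -> list[list[float]]:
    """Broadcast each rollout's outcome reward to all of its turns.

    The reward is first normalized across the group (subtract the group
    mean, divide by the group standard deviation), then every turn of a
    rollout inherits that same group-relative advantage.
    """
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # avoid div-by-zero on ties
    advantages = []
    for reward, n_turns in zip(group_rewards, turns_per_rollout):
        adv = (reward - mean) / std
        advantages.append([adv] * n_turns)  # one copy per turn
    return advantages
```

For example, a group of two rollouts with outcome rewards 1.0 and 0.0 and lengths 3 and 2 turns yields per-turn advantages `[[1.0, 1.0, 1.0], [-1.0, -1.0]]`: every turn of the successful rollout is reinforced equally, which is how the multi-turn credit-assignment problem collapses into single-turn updates.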
AI Online Reinforcement Learning "Learns by Doing": Stanford Team Makes a Small 7B Model's Performance Soar, Even Surpassing GPT-4o
量子位 · 2025-10-24 03:53
Core Insights
- The article discusses the introduction of AgentFlow, a new paradigm in online reinforcement learning that enhances the reasoning capabilities of agent systems, outperforming models such as GPT-4o and Llama3.1-405B [1][4][23]

Group 1: AgentFlow Overview
- AgentFlow consists of a team of specialized agents, including a planner, executor, verifier, and generator, which collaborate through shared memory to optimize decision-making in real time [1][14][18]
- The Flow-GRPO method enables on-policy optimization of the planner agent, allowing adaptive decision-making based on environmental changes and feedback from the other agents [19][16]

Group 2: Performance Metrics
- AgentFlow, based on the Qwen-2.5-7B-Instruct model, shows significant improvements across benchmark tests: 14.9% in search tasks, 14.0% in agentic reasoning, 14.5% in math reasoning, and 4.1% in scientific reasoning [3][25][27]
- Its performance surpasses that of larger models, demonstrating that effective system design and training methods can matter more than simply increasing model size [27]

Group 3: Learning Mechanisms
- The article emphasizes "learning in the flow": online learning in real interactive environments is crucial for efficient reasoning [28][29]
- AgentFlow's architecture enables rapid error correction and better task planning through real-time training, improving overall system performance [30][29]

Group 4: Innovations and Findings
- The system autonomously discovers new solution paths, such as combining different search tools to improve information retrieval, showcasing its ability to adapt and innovate [33]
- AgentFlow sustains performance improvements without significantly increasing the average number of reasoning steps, indicating efficient handling of complex tasks [35]

Group 5: Future Implications
- AgentFlow presents a novel approach to agent training, advocating systems that adapt and learn continuously rather than relying on a single comprehensive model [37][38]
- Despite the distance from research to practical application, the potential of agentic AI remains significant, suggesting a promising future for intelligent systems [39]
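The "learning in the flow" contrast with offline supervision can be made concrete with a toy on-policy loop: a planner keeps a running preference per tool, chooses a tool in the live environment each turn, and folds the outcome reward back into its preferences immediately. Everything here is an assumption for illustration, the tool names, the epsilon-greedy choice rule, and the incremental update; the real system trains an LLM planner, not a score table.

```python
import random

def train_in_the_flow(correct_tool: str = "web_search",
                      iterations: int = 200) -> dict[str, float]:
    """Toy on-policy loop: each decision uses the current preferences,
    and each outcome updates them before the next decision."""
    random.seed(0)  # reproducibility only; the outcome holds for any seed
    scores = {"web_search": 0.0, "wiki_search": 0.0}  # hypothetical tools
    for _ in range(iterations):
        # epsilon-greedy choice from the live, just-updated preferences
        if random.random() < 0.1:
            tool = random.choice(list(scores))
        else:
            tool = max(scores, key=scores.get)
        reward = 1.0 if tool == correct_tool else 0.0
        scores[tool] += 0.1 * (reward - scores[tool])  # immediate update
    return scores
```

Because every update uses data from the current policy, the planner's later choices already reflect its earlier mistakes; an offline-trained planner, by contrast, never sees the consequences of its own decisions, which is the failure mode the articles attribute to offline supervised training.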