AgentFlow

 AI "learns by doing" with online reinforcement learning: Stanford team sends a 7B model's performance soaring, even past GPT-4o
 36Kr · 2025-10-24 12:45
A new framework from Stanford and collaborators uses online reinforcement learning to let agent systems "punch above their weight," beating GPT-4o. AgentFlow is a new paradigm for optimizing agent systems online, continuously improving their ability to reason about complex problems. A team of four specialized agents, a planner, an executor, a verifier, and a generator, collaborates through shared memory, and a new method, Flow-GRPO, optimizes the planner agent in real time inside the system. Built on Qwen-2.5-7B-Instruct as the base model, AgentFlow stands out across 10 benchmarks: search tasks improved by 14.9%, agentic tasks by 14.0%, math tasks by 14.5%, and science tasks by 4.1%. On several tasks it even surpasses models 50 times its size, including GPT-4o and Llama3.1-405B.

[Results table truncated in source; columns covered search-intensive benchmarks (Bamboogle, 2Wiki, HotpotQA, Musique) and math reasoning.]
 How do agent systems "learn by doing"? Stanford team explores a new paradigm for online optimization
 机器之心· 2025-10-24 09:12
 Core Insights
- The article discusses the limitations of traditional methods for enabling intelligent agents to perform complex reasoning and tool use, highlighting the need for a more scalable and adaptable approach [2][3][4]
- The proposed AgentFlow framework integrates collaborative reasoning among multiple independent agent modules and introduces the Flow-GRPO algorithm for training, achieving significant performance improvements across a range of tasks [3][4][15]

Group 1: Traditional Methods and Challenges
- Traditional approaches to training language models for complex task reasoning either have a single model handle both reasoning and tool use or rely on static prompt-driven systems [11][14]
- The first approach struggles with stability and scalability in long-chain reasoning and dynamic environments, while the second lacks the ability to learn and adapt [3][14]
- The research team aimed to enable agent systems to learn and evolve through interaction, addressing the limitations of existing methods [14][15]

Group 2: AgentFlow Framework
- AgentFlow is a modular, tool-integrated agent system designed to overcome the scalability and generalization limitations of current methods [15][27]
- It features a planner that adapts in real time during agent interactions, enabling adaptive reasoning and robust tool calling [15][19]
- The framework demonstrates significant improvements in long-horizon planning, tool-use efficiency, and dynamic reasoning depth across various domains [4][15]

Group 3: Flow-GRPO Algorithm
- Flow-GRPO addresses the challenge of multi-turn credit assignment in reinforcement learning by broadcasting the outcome reward to each step, transforming a complex multi-turn problem into a series of manageable single-turn updates [19][20]
- This method alleviates sparse-reward issues and improves training efficiency, providing a foundation for stable learning on complex reasoning tasks [20][27]

Group 4: Experimental Results
- AgentFlow was evaluated across ten benchmark tests, outperforming existing leading methods, including large proprietary models such as GPT-4o [22][27]
- Notable gains include 14.9% in knowledge retrieval, 14.0% in agentic reasoning, 14.5% in mathematical reasoning, and 4.1% in scientific reasoning [24][27]
- The 7B-parameter AgentFlow model surpassed the performance of 200B-parameter models, demonstrating that effective system design can matter more than simply increasing model size [27][30]

Group 5: Learning and Adaptation
- The research indicates that online learning in real interaction environments is crucial for efficient reasoning: offline supervised training led to significant performance drops [27][30]
- The system autonomously discovered new tool-usage patterns, improving its ability to gather information through combined tool strategies [30][33]
- AgentFlow's performance improves with more reasoning steps without excessively extending average reasoning time, indicating efficient task handling [33][35]

Group 6: Conclusion and Future Potential
- AgentFlow presents a novel approach to agent training, emphasizing continuous learning and adaptation over a single all-purpose model [36][37]
- The work highlights the potential and imaginative possibilities of agentic AI, despite the distance between research exploration and practical application [37]
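The reward-broadcasting idea attributed to Flow-GRPO above can be sketched in a few lines. This is a minimal illustration, not the AgentFlow implementation: `Turn`, `broadcast_outcome_reward`, and `group_relative_advantages` are hypothetical names, and the advantage formula follows the standard GRPO group-normalization recipe rather than anything confirmed by the article.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    planner_action: str   # action proposed by the planner at this turn
    reward: float = 0.0   # per-turn reward, filled in by broadcasting

def broadcast_outcome_reward(turns, outcome_reward):
    """Copy the single trajectory-level outcome reward to every turn,
    turning multi-turn credit assignment into independent
    single-turn updates (the idea the article describes)."""
    for t in turns:
        t.reward = outcome_reward
    return turns

def group_relative_advantages(rewards):
    """GRPO-style advantage: each reward minus the group mean,
    normalized by the group standard deviation (epsilon for stability)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

With the outcome reward copied onto every turn, each planner decision in a trajectory receives the same learning signal, which is what lets the multi-turn problem be optimized with single-turn policy-gradient machinery.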
 AI "learns by doing" with online reinforcement learning: Stanford team sends a 7B model's performance soaring, even past GPT-4o
 量子位· 2025-10-24 03:53
 Core Insights
- The article discusses the introduction of AgentFlow, a new paradigm in online reinforcement learning that enhances the reasoning capabilities of agent systems, outperforming models like GPT-4o and Llama3.1-405B [1][4][23].

Group 1: AgentFlow Overview
- AgentFlow consists of a team of specialized agents, including a planner, executor, verifier, and generator, which collaborate through shared memory to optimize decision-making in real time [1][14][18].
- The Flow-GRPO method allows on-policy optimization of the planner agent, enabling adaptive decisions based on environmental changes and feedback from the other agents [19][16].

Group 2: Performance Metrics
- AgentFlow, built on the Qwen-2.5-7B-Instruct model, shows significant improvements across benchmark tests: 14.9% on search tasks, 14.0% on agentic reasoning, 14.5% on math reasoning, and 4.1% on scientific reasoning [3][25][27].
- The model's performance surpasses that of far larger models, demonstrating that effective system design and training methods can matter more than simply increasing model size [27].

Group 3: Learning Mechanisms
- The article emphasizes the importance of "learning in the flow": online learning in real interactive environments is crucial for efficient reasoning [28][29].
- AgentFlow's architecture enables rapid error correction and improved task planning through real-time training, enhancing overall system performance [30][29].

Group 4: Innovations and Findings
- The system autonomously discovers new solution paths, such as combining different search tools to improve information retrieval, showcasing its ability to adapt and innovate [33].
- AgentFlow maintains performance improvements without significantly increasing the average number of reasoning steps, indicating efficient handling of complex tasks [35].

Group 5: Future Implications
- The article concludes that AgentFlow offers a novel approach to agent training, advocating for systems that adapt and learn continuously rather than relying on a single all-purpose model [37][38].
- Despite the distance from research to practical application, the potential of agentic AI remains significant, suggesting a promising future for intelligent systems [39].
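The planner/executor/verifier/generator loop with shared memory that the summaries describe can be sketched as follows. This is a toy sketch under the assumption that each role is a plain callable; `SharedMemory`, `run_agentflow`, and all role signatures are illustrative names, not the AgentFlow codebase.

```python
class SharedMemory:
    """Append-only log that all four roles read from and write to."""
    def __init__(self):
        self.records = []
    def write(self, role, content):
        self.records.append((role, content))
    def read_all(self):
        return list(self.records)

def run_agentflow(task, plan, execute, verify, generate, max_turns=5):
    memory = SharedMemory()
    memory.write("task", task)
    for _ in range(max_turns):
        action = plan(memory.read_all())      # planner picks next sub-goal/tool
        memory.write("planner", action)
        result = execute(action)              # executor runs the tool call
        memory.write("executor", result)
        verdict = verify(memory.read_all())   # verifier checks progress
        memory.write("verifier", verdict)
        if verdict == "solved":
            break
    return generate(memory.read_all())        # generator writes the final answer
```

Because every role reads the whole shared log, only the planner needs to be trained (as in Flow-GRPO): better planner actions change what the executor and verifier see, and the loop's outcome provides the training signal.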



