StarPO

Search documents
AI 智能体老“崩”?DeepSeek 前员工联手李飞飞等大佬开源新框架,教会模型真正推理
AI前线· 2025-04-24 03:03
Core Viewpoint - The article discusses the current state of AI agents, indicating that most are still in the "pilot purgatory" phase and have not yet transitioned to real-world applications, despite expectations for 2025 to be the "year of AI agents" [1][2]. Group 1: Current State of AI Agents - A survey on social platform X reveals that 64.2% of AI agents are stuck in pilot purgatory, while only 6.4% are smarter than the hype [2]. - The article highlights the need for advancements in AI systems to enhance their stability and reliability in enterprise applications [2]. Group 2: Introduction of RAGEN - A new system called RAGEN, developed by a team including researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington, aims to improve AI agents' performance in real-world scenarios [2][5]. - RAGEN focuses on multi-turn interaction scenarios, requiring agents to reason under uncertainty and remember historical dialogues [5]. Group 3: StarPO Framework - RAGEN is built on a custom reinforcement learning framework named StarPO, which emphasizes learning through experience rather than rote memorization [5][7]. - The StarPO framework consists of two alternating phases: rollout, where the LLM generates complete interaction sequences, and update, where the model updates parameters based on normalized cumulative rewards [7]. Group 4: Training Challenges and Solutions - The article discusses the "Echo Trap" phenomenon, where agents generate repetitive responses due to early high rewards, leading to a decline in reasoning ability [12]. - To address training stability, the enhanced version StarPO-S introduces three key mechanisms: uncertainty-based rollout filtering, removal of KL penalty, and asymmetric PPO clipping [19]. Group 5: Evaluation Environments - RAGEN includes three symbolic testing environments to evaluate decision-making capabilities: Bandit, Sokoban, and Frozen Lake, each designed to assess different aspects of agent performance [15][17]. - These environments aim to minimize prior knowledge interference, allowing agents to rely solely on learned strategies for decision-making [15]. Group 6: Future Implications - RAGEN represents a significant step towards developing AI agents with autonomous reasoning capabilities, although challenges remain in applying these methods to real-world business processes [24]. - The article emphasizes the importance of optimizing reward mechanisms to focus on the quality of reasoning processes, not just the correctness of outcomes [24].
AI 智能体老“崩”?DeepSeek 前员工联手李飞飞等大佬开源新框架,教会模型真正推理
AI前线· 2025-04-24 03:03
很多人都觉得 2025 年会是"AI 智能体元年",也就是基于 OpenAI、Anthropic、Google 和 DeepSeek 等机构提供的大语言模型,打造专注特定任务的智能体系统。 但是,最近在社交平台 X 上有个调查显示,现在大部分 Agent 都在"玩票"阶段,还没真正走出实验 室,普遍滞留在"企业试点"的状态中。 编译 | Tina 推理智能体训练框架已开源 与解题或代码生成等静态任务不同,RAGEN 聚焦在多轮交互场景中训练智能体,要求它们能在不确 定性中进行推理、记忆历史对话并灵活应对变化。 | Al agents in the enterprise right now are ... | | | --- | --- | | Smarter than the hype | 6.4% | | Stuck in pilot purgatory | 64.2% | | Powerful, but high effort O | 24.8% | | Nearing real scale | 4.6% | 不过,李飞飞所在的一支团队或许即将带来改变:他们与西北大学、微软、斯坦福大学和华盛顿大学 的研究 ...