RAGEN

Open-source RL framework Verlog arrives: built for LLM agents, 400 turns is no problem
机器之心· 2025-10-08 04:13
In the AI era, an agent's ability to handle short conversations is no longer the hard part. The real challenge is keeping its reasoning clear and its decisions robust across hundreds of steps of exploration.

Traditional reinforcement learning frameworks cope within a few dozen steps, but once a task stretches to hundreds, sparse rewards, ballooning histories, and policy collapse arrive in quick succession.

To meet these challenges, researchers from Carnegie Mellon University, the University of Hong Kong, and other institutions proposed ...

Building on VeRL and BALROG, and following the proven design principles of pytorch-a2c-ppo-acktr-gail, Verlog introduces a set of targeted optimizations that keep training stable and efficient as task horizons stretch from brief interactions to hundreds of turns.

Earlier frameworks such as VeRL and RAGEN handle tasks of roughly 10 turns well, and verl-agent extends to about 50. Verlog, by contrast, is designed for environments exceeding 400 turns, giving it a distinct edge on complex long-horizon decision-making tasks.

This capability has been validated in demanding domains such as BabyAI, BabaIsAI, and Crafter. In Crafter, for instance, episode lengths range from 70 to 400 steps, averaging around 190. Across these challenging environments, Verlog delivers strong performance out of the box.
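Training over episodes of 400+ turns raises an immediate practical problem the article gestures at: the full interaction history quickly outgrows an LLM's context window. One common tactic is to bound the prompt to a sliding window of recent turns. The sketch below illustrates that general idea only; the class, its methods, and the window size are hypothetical, not Verlog's actual mechanism or API.

```python
from collections import deque

class TurnWindowMemory:
    """Bound the prompt to the last `max_turns` turns so that a 400-step
    episode still fits in context. Hypothetical helper for illustration."""

    def __init__(self, max_turns: int = 4):
        # deque with maxlen silently drops the oldest turn beyond the window
        self.turns = deque(maxlen=max_turns)

    def record(self, observation: str, action: str) -> None:
        """Append one completed (observation, action) turn."""
        self.turns.append((observation, action))

    def build_prompt(self, task: str, current_obs: str) -> str:
        """Assemble a bounded prompt from the task, recent turns, and the current observation."""
        history = "\n".join(f"Obs: {o}\nAction: {a}" for o, a in self.turns)
        return f"Task: {task}\n{history}\nObs: {current_obs}\nAction:"
```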
A look at the evolution of RL infrastructure through today's mainstream RL libraries
自动驾驶之心· 2025-09-25 23:33
Core Viewpoint
- Reinforcement Learning (RL) is transitioning from a supporting technology to a core driver of model capability, with the focus shifting to multi-step, interactive agent training on the path toward Artificial General Intelligence (AGI) [2][6].

Group 1: Modern RL Infrastructure Architecture
- The core components of modern RL infrastructure are a Generator, which interacts with the environment to produce trajectories and compute rewards, and a Trainer, which updates model parameters from that trajectory data [6][4].
- The generator-trainer architecture, combined with a distributed coordination layer such as Ray, has become the "gold standard" for RL systems [6][4].

Group 2: Primary Development
- Primary development frameworks are the foundational layer for building RL training pipelines, providing core algorithm implementations and integration with the underlying training/inference engines [8][7].
- TRL (Transformer Reinforcement Learning) is Hugging Face's user-friendly RL framework, supporting a range of algorithms [9][10].
- OpenRLHF, developed by a collaborative team including ByteDance and NetEase, aims to provide an efficient and scalable RLHF and agentic RL framework [11][14].
- veRL, from ByteDance's Seed team, is among the most feature-complete frameworks, with extensive algorithm support [16][19].
- AReaL (Asynchronous Reinforcement Learning) targets large-scale, high-throughput RL training with a fully asynchronous architecture [20][21].
- NeMo-RL, from NVIDIA, integrates into the broader NeMo ecosystem and focuses on production-grade RL [24][28].
- ROLL, an Alibaba open-source framework, emphasizes asynchronous and agentic capabilities for large-scale LLM RL [30][33].
- slime, developed by Tsinghua and Zhipu, is a lightweight framework focused on seamlessly integrating SGLang with Megatron [34][36].

Group 3: Secondary Development
- Secondary development frameworks build on the primary ones and target specific downstream scenarios such as multi-modal, multi-agent, and GUI automation [44][3].
- Agentic RL frameworks such as verl-agent optimize asynchronous rollout and training, addressing the core challenge of multi-turn interaction with external environments [46][47].
- Multimodal RL frameworks such as VLM-R1 and EasyR1 train visual-language reasoning models, tackling data processing and loss-function design [53][54].
- Multi-agent RL frameworks such as MARTI integrate multi-agent reasoning with reinforcement learning for complex collaborative tasks [59][60].

Group 4: Summary and Trends
- RL infrastructure is evolving from a "workshop" model toward a "standardized pipeline," with framework design growing steadily more modular [65].
- Asynchronous architectures are becoming essential to absorb the computational asymmetry between rollout and training [66].
- High-performance inference engines such as vLLM and SGLang significantly accelerate the rollout phase [66].
- The shift from RLHF to agentic RL reflects the growing complexity of tasks the newer frameworks support [66].
- The choice of distributed training framework, such as Megatron-LM or DeepSpeed, is critical for large-scale model training [66].
- Scenario-driven secondary frameworks are addressing the unique challenges of vertical domains [66].
- The importance of an orchestrator for managing distributed components in RL systems is now widely recognized [66].
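The generator-trainer split described above is concrete enough to sketch. Below is a minimal illustration assuming Ray as the coordination layer; the class names, the dummy rollout/update bodies, and the "babyai" environment tag are placeholders, not code from any framework listed here.

```python
import ray

ray.init()

@ray.remote
class Generator:
    """Rolls out the current policy against an environment and returns trajectories."""

    def __init__(self, env_name: str):
        # A real generator would wrap an inference engine (vLLM/SGLang)
        # plus the actual environment; env_name is just a tag here.
        self.env_name = env_name

    def rollout(self, weights: dict, num_episodes: int) -> list:
        # Real systems load `weights` into the inference engine, step the
        # environment, and collect (state, action, reward) trajectories.
        return [{"episode": i, "reward": 0.0} for i in range(num_episodes)]

@ray.remote
class Trainer:
    """Updates model parameters from batches of trajectories."""

    def __init__(self):
        self.weights = {"version": 0}

    def update(self, trajectories: list) -> dict:
        # Real systems compute a policy-gradient loss here and step the optimizer.
        self.weights["version"] += 1
        return self.weights

# Orchestration: weights flow trainer -> generator, trajectories flow back.
generator = Generator.remote("babyai")
trainer = Trainer.remote()
weights = ray.get(trainer.update.remote([]))
for _ in range(3):
    trajectories = ray.get(generator.rollout.remote(weights, num_episodes=4))
    weights = ray.get(trainer.update.remote(trajectories))
```

The point of the split is that the two actors can scale independently and even run asynchronously on separate hardware, which is exactly the rollout/training asymmetry the asynchronous frameworks above are built around.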
AI agents keep "crashing"? Former DeepSeek staff join Fei-Fei Li and other leading researchers to open-source a new framework that teaches models to truly reason
AI前线· 2025-04-24 03:03
Core Viewpoint
- The article assesses the current state of AI agents: despite expectations that 2025 would be the "year of AI agents," most remain in "pilot purgatory" and have not transitioned to real-world applications [1][2].

Group 1: Current State of AI Agents
- A survey on the social platform X finds 64.2% of AI agents stuck in pilot purgatory, with only 6.4% rated smarter than the hype [2].
- The article highlights the need for more stable and reliable AI systems before enterprise adoption can scale [2].

Group 2: Introduction of RAGEN
- RAGEN, a new system developed by a team including researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington, aims to improve AI agents' performance in real-world scenarios [2][5].
- RAGEN focuses on multi-turn interaction, requiring agents to reason under uncertainty and remember dialogue history [5].

Group 3: StarPO Framework
- RAGEN is built on a custom reinforcement learning framework, StarPO, which emphasizes learning through experience rather than rote memorization [5][7].
- StarPO alternates between two phases: rollout, where the LLM generates complete interaction sequences, and update, where model parameters are adjusted using normalized cumulative rewards [7].

Group 4: Training Challenges and Solutions
- The article describes the "Echo Trap," in which early high rewards push agents toward repetitive responses and degrade reasoning ability [12].
- To stabilize training, the enhanced StarPO-S introduces three mechanisms: uncertainty-based rollout filtering, removal of the KL penalty, and asymmetric PPO clipping [19].

Group 5: Evaluation Environments
- RAGEN includes three symbolic test environments for evaluating decision-making: Bandit, Sokoban, and Frozen Lake, each probing a different aspect of agent performance [15][17].
- The environments are designed to minimize interference from prior knowledge, forcing agents to rely solely on learned strategies [15].

Group 6: Future Implications
- RAGEN is a significant step toward AI agents with autonomous reasoning, though applying these methods to real business processes remains challenging [24].
- The article stresses optimizing reward mechanisms around the quality of the reasoning process, not just the correctness of outcomes [24].
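The StarPO-S mechanisms summarized in Group 4 map naturally onto code. The sketch below renders two of the three (uncertainty-based rollout filtering and asymmetric PPO clipping) in PyTorch-style Python; the variance-based reading of "uncertainty," the function names, and the hyperparameter values are assumptions for illustration, not RAGEN's actual implementation.

```python
import statistics
import torch

def filter_rollouts(groups: list[dict], keep_frac: float = 0.25) -> list[dict]:
    """Uncertainty-based rollout filtering (illustrative): keep only the
    prompt groups whose sampled trajectories show the highest reward
    variance, assuming uniform-reward groups carry little gradient signal."""
    scored = sorted(groups,
                    key=lambda g: statistics.pvariance(g["rewards"]),
                    reverse=True)
    return scored[: max(1, int(len(scored) * keep_frac))]

def asymmetric_ppo_loss(logp_new: torch.Tensor,
                        logp_old: torch.Tensor,
                        advantages: torch.Tensor,
                        clip_low: float = 0.2,
                        clip_high: float = 0.3) -> torch.Tensor:
    """PPO policy loss with asymmetric clipping (illustrative values): the
    probability ratio may rise further (1 + clip_high) than it may fall
    (1 - clip_low), easing the brake on high-advantage exploratory tokens.
    No KL penalty term is added, matching StarPO-S's removal of it."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    # Pessimistic (elementwise minimum) objective, as in standard PPO.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```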
Compiled by Tina

Many expect 2025 to be the "year of AI agents": agent systems built for specific tasks on top of the large language models offered by OpenAI, Anthropic, Google, DeepSeek, and other providers.

But a recent survey on the social platform X suggests that most agents are still dabbling: they have yet to leave the lab and remain stuck in "enterprise pilot" status.

| AI agents in the enterprise right now are ... | |
| --- | --- |
| Smarter than the hype | 6.4% |
| Stuck in pilot purgatory | 64.2% |
| Powerful, but high effort | 24.8% |
| Nearing real scale | 4.6% |

A team that includes Fei-Fei Li, however, may be about to change that: together with researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington, they ...

The reasoning-agent training framework is now open source

Unlike static tasks such as problem solving or code generation, RAGEN focuses on training agents in multi-turn interaction scenarios, requiring them to reason under uncertainty, remember dialogue history, and respond flexibly to change.