RAGEN

Open-source RL framework Verlog arrives: built for LLM agents, 400 turns is no problem
机器之心· 2025-10-08 04:13
In the AI era, an agent's ability to handle short conversations is no longer the hard part. The real challenge is keeping its reasoning clear and its decisions robust across hundreds of steps of exploration.

Traditional reinforcement learning frameworks cope within a few dozen steps, but once a task stretches to hundreds, sparse rewards, ballooning histories, and policy collapse arrive in quick succession.

To meet these challenges, researchers from Carnegie Mellon University, the University of Hong Kong, and other institutions proposed ...

Building on VeRL and BALROG, and following the proven design principles of pytorch-a2c-ppo-acktr-gail, Verlog introduces a set of targeted optimizations that keep training stable and efficient as task horizons stretch from brief interactions to hundreds of turns.

Earlier frameworks such as VeRL and RAGEN handle tasks of roughly 10 turns well, and verl-agent extends to about 50. Verlog, by contrast, is designed for environments exceeding 400 turns, giving it a distinct edge on complex long-horizon decision-making tasks.

This capability has been validated in demanding domains such as BabyAI, BabaIsAI, and Crafter. In Crafter, for instance, episode lengths range from 70 to 400 steps, averaging around 190. Across these challenging environments, Verlog delivers strong performance out of the box.
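Training over episodes of 400+ turns raises an immediate practical problem the article gestures at: the full interaction history quickly outgrows an LLM's context window. One common tactic is to bound the prompt to a sliding window of recent turns. The sketch below illustrates that general idea only; the class, its methods, and the window size are hypothetical, not Verlog's actual mechanism or API.

```python
from collections import deque

class TurnWindowMemory:
    """Bound the prompt to the last `max_turns` turns so that a 400-step
    episode still fits in context. Hypothetical helper for illustration."""

    def __init__(self, max_turns: int = 4):
        # deque with maxlen silently drops the oldest turn beyond the window
        self.turns = deque(maxlen=max_turns)

    def record(self, observation: str, action: str) -> None:
        """Append one completed (observation, action) turn."""
        self.turns.append((observation, action))

    def build_prompt(self, task: str, current_obs: str) -> str:
        """Assemble a bounded prompt from the task, recent turns, and the current observation."""
        history = "\n".join(f"Obs: {o}\nAction: {a}" for o, a in self.turns)
        return f"Task: {task}\n{history}\nObs: {current_obs}\nAction:"
```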
A look at the evolution of RL infrastructure through today's mainstream RL libraries
自动驾驶之心· 2025-09-25 23:33
Core Viewpoint
- Reinforcement Learning (RL) is transitioning from a supporting technology to a core driver of model capability, with the focus shifting to multi-step, interactive agent training on the path toward Artificial General Intelligence (AGI) [2][6].

Group 1: Modern RL Infrastructure Architecture
- The core components of modern RL infrastructure are a Generator, which interacts with the environment to produce trajectories and compute rewards, and a Trainer, which updates model parameters from that trajectory data [6][4].
- The generator-trainer architecture, combined with a distributed coordination layer such as Ray, has become the "gold standard" for RL systems [6][4].

Group 2: Primary Development
- Primary development frameworks are the foundational layer for building RL training pipelines, providing core algorithm implementations and integration with the underlying training/inference engines [8][7].
- TRL (Transformer Reinforcement Learning) is Hugging Face's user-friendly RL framework, supporting a range of algorithms [9][10].
- OpenRLHF, developed by a collaborative team including ByteDance and NetEase, aims to provide an efficient and scalable RLHF and agentic RL framework [11][14].
- veRL, from ByteDance's Seed team, is among the most feature-complete frameworks, with extensive algorithm support [16][19].
- AReaL (Asynchronous Reinforcement Learning) targets large-scale, high-throughput RL training with a fully asynchronous architecture [20][21].
- NeMo-RL, from NVIDIA, integrates into the broader NeMo ecosystem and focuses on production-grade RL [24][28].
- ROLL, an Alibaba open-source framework, emphasizes asynchronous and agentic capabilities for large-scale LLM RL [30][33].
- slime, developed by Tsinghua and Zhipu, is a lightweight framework focused on seamlessly integrating SGLang with Megatron [34][36].

Group 3: Secondary Development
- Secondary development frameworks build on the primary ones and target specific downstream scenarios such as multi-modal, multi-agent, and GUI automation [44][3].
- Agentic RL frameworks such as verl-agent optimize asynchronous rollout and training, addressing the core challenge of multi-turn interaction with external environments [46][47].
- Multimodal RL frameworks such as VLM-R1 and EasyR1 train visual-language reasoning models, tackling data processing and loss-function design [53][54].
- Multi-agent RL frameworks such as MARTI integrate multi-agent reasoning with reinforcement learning for complex collaborative tasks [59][60].

Group 4: Summary and Trends
- RL infrastructure is evolving from a "workshop" model toward a "standardized pipeline," with framework design growing steadily more modular [65].
- Asynchronous architectures are becoming essential to absorb the computational asymmetry between rollout and training [66].
- High-performance inference engines such as vLLM and SGLang significantly accelerate the rollout phase [66].
- The shift from RLHF to agentic RL reflects the growing complexity of tasks the newer frameworks support [66].
- The choice of distributed training framework, such as Megatron-LM or DeepSpeed, is critical for large-scale model training [66].
- Scenario-driven secondary frameworks are addressing the unique challenges of vertical domains [66].
- The importance of an orchestrator for managing distributed components in RL systems is now widely recognized [66].
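The generator-trainer split described above is concrete enough to sketch. Below is a minimal illustration assuming Ray as the coordination layer; the class names, the dummy rollout/update bodies, and the "babyai" environment tag are placeholders, not code from any framework listed here.

```python
import ray

ray.init()

@ray.remote
class Generator:
    """Rolls out the current policy against an environment and returns trajectories."""

    def __init__(self, env_name: str):
        # A real generator would wrap an inference engine (vLLM/SGLang)
        # plus the actual environment; env_name is just a tag here.
        self.env_name = env_name

    def rollout(self, weights: dict, num_episodes: int) -> list:
        # Real systems load `weights` into the inference engine, step the
        # environment, and collect (state, action, reward) trajectories.
        return [{"episode": i, "reward": 0.0} for i in range(num_episodes)]

@ray.remote
class Trainer:
    """Updates model parameters from batches of trajectories."""

    def __init__(self):
        self.weights = {"version": 0}

    def update(self, trajectories: list) -> dict:
        # Real systems compute a policy-gradient loss here and step the optimizer.
        self.weights["version"] += 1
        return self.weights

# Orchestration: weights flow trainer -> generator, trajectories flow back.
generator = Generator.remote("babyai")
trainer = Trainer.remote()
weights = ray.get(trainer.update.remote([]))
for _ in range(3):
    trajectories = ray.get(generator.rollout.remote(weights, num_episodes=4))
    weights = ray.get(trainer.update.remote(trajectories))
```

The point of the split is that the two actors can scale independently and even run asynchronously on separate hardware, which is exactly the rollout/training asymmetry the asynchronous frameworks above are built around.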
AI agents keep "crashing"? Former DeepSeek staff join Fei-Fei Li and other leading researchers to open-source a new framework that teaches models to truly reason
AI前线· 2025-04-24 03:03
Core Viewpoint
- The article assesses the current state of AI agents: despite expectations that 2025 would be the "year of AI agents," most remain in "pilot purgatory" and have not transitioned to real-world applications [1][2].

Group 1: Current State of AI Agents
- A survey on the social platform X finds 64.2% of AI agents stuck in pilot purgatory, with only 6.4% rated smarter than the hype [2].
- The article highlights the need for more stable and reliable AI systems before enterprise adoption can scale [2].

Group 2: Introduction of RAGEN
- RAGEN, a new system developed by a team including researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington, aims to improve AI agents' performance in real-world scenarios [2][5].
- RAGEN focuses on multi-turn interaction, requiring agents to reason under uncertainty and remember dialogue history [5].

Group 3: StarPO Framework
- RAGEN is built on a custom reinforcement learning framework, StarPO, which emphasizes learning through experience rather than rote memorization [5][7].
- StarPO alternates between two phases: rollout, where the LLM generates complete interaction sequences, and update, where model parameters are adjusted using normalized cumulative rewards [7].

Group 4: Training Challenges and Solutions
- The article describes the "Echo Trap," in which early high rewards push agents toward repetitive responses and degrade reasoning ability [12].
- To stabilize training, the enhanced StarPO-S introduces three mechanisms: uncertainty-based rollout filtering, removal of the KL penalty, and asymmetric PPO clipping [19].

Group 5: Evaluation Environments
- RAGEN includes three symbolic test environments for evaluating decision-making: Bandit, Sokoban, and Frozen Lake, each probing a different aspect of agent performance [15][17].
- The environments are designed to minimize interference from prior knowledge, forcing agents to rely solely on learned strategies [15].

Group 6: Future Implications
- RAGEN is a significant step toward AI agents with autonomous reasoning, though applying these methods to real business processes remains challenging [24].
- The article stresses optimizing reward mechanisms around the quality of the reasoning process, not just the correctness of outcomes [24].
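The StarPO-S mechanisms summarized in Group 4 map naturally onto code. The sketch below renders two of the three (uncertainty-based rollout filtering and asymmetric PPO clipping) in PyTorch-style Python; the variance-based reading of "uncertainty," the function names, and the hyperparameter values are assumptions for illustration, not RAGEN's actual implementation.

```python
import statistics
import torch

def filter_rollouts(groups: list[dict], keep_frac: float = 0.25) -> list[dict]:
    """Uncertainty-based rollout filtering (illustrative): keep only the
    prompt groups whose sampled trajectories show the highest reward
    variance, assuming uniform-reward groups carry little gradient signal."""
    scored = sorted(groups,
                    key=lambda g: statistics.pvariance(g["rewards"]),
                    reverse=True)
    return scored[: max(1, int(len(scored) * keep_frac))]

def asymmetric_ppo_loss(logp_new: torch.Tensor,
                        logp_old: torch.Tensor,
                        advantages: torch.Tensor,
                        clip_low: float = 0.2,
                        clip_high: float = 0.3) -> torch.Tensor:
    """PPO policy loss with asymmetric clipping (illustrative values): the
    probability ratio may rise further (1 + clip_high) than it may fall
    (1 - clip_low), easing the brake on high-advantage exploratory tokens.
    No KL penalty term is added, matching StarPO-S's removal of it."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    # Pessimistic (elementwise minimum) objective, as in standard PPO.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```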
Compiled by Tina

Many expect 2025 to be the "year of AI agents": agent systems built for specific tasks on top of the large language models offered by OpenAI, Anthropic, Google, DeepSeek, and other providers.

But a recent survey on the social platform X suggests that most agents are still dabbling: they have yet to leave the lab and remain stuck in "enterprise pilot" status.

| AI agents in the enterprise right now are ... | |
| --- | --- |
| Smarter than the hype | 6.4% |
| Stuck in pilot purgatory | 64.2% |
| Powerful, but high effort | 24.8% |
| Nearing real scale | 4.6% |

A team that includes Fei-Fei Li, however, may be about to change that: together with researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington, they ...

The reasoning-agent training framework is now open source

Unlike static tasks such as problem solving or code generation, RAGEN focuses on training agents in multi-turn interaction scenarios, requiring them to reason under uncertainty, remember dialogue history, and respond flexibly to change.