Agentic RL
X @Avi Chawla
Avi Chawla· 2026-03-10 19:32
RT Avi Chawla (@_avichawla): OpenClaw meets RL! OpenClaw Agents adapt through memory files and skills, but the base model weights never actually change. OpenClaw-RL solves this! It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in the background using RL. The architecture is fully async: serving, reward scoring, and training all run in parallel. Once done, weights get hot-swapped after every batch while the agent keeps responding ...
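The serving-plus-interception pattern the thread describes can be pictured with a short sketch. This is a hypothetical illustration, not the OpenClaw-RL code: it assumes a FastAPI front end, an in-process queue, and stub `generate`/`run_rl_step` functions standing in for the self-hosted model and the RL step.

```python
import queue
import threading

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
train_queue: queue.Queue = queue.Queue()  # conversations awaiting RL

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]

def generate(messages: list[dict]) -> str:
    """Placeholder for the self-hosted model call (e.g. a vLLM backend)."""
    return "(stub completion)"

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    reply = generate(req.messages)
    # Intercept: hand the transcript to the trainer without blocking serving.
    train_queue.put({"messages": req.messages, "reply": reply})
    return {
        "object": "chat.completion",
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    }

def run_rl_step(batch: list[dict]) -> None:
    """Placeholder: score rewards, run a policy-gradient step, then
    hot-swap the serving weights so the next batch uses the new policy."""

def trainer_loop(batch_size: int = 32) -> None:
    batch: list[dict] = []
    while True:
        batch.append(train_queue.get())  # blocks until traffic arrives
        if len(batch) >= batch_size:
            run_rl_step(batch)
            batch.clear()

# Training runs fully async from serving, as the thread describes.
threading.Thread(target=trainer_loop, daemon=True).start()
```

Pointing an agent's OpenAI-compatible base URL at a server like this would feed the trainer just by using the agent normally, which matches the "train from live conversations" framing above.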
Completing OpenClaw's evolution puzzle! AReaL v1.0 open-sourced, with "one-click" integration for agentic reinforcement learning
机器之心· 2026-03-04 03:58
Two months into 2026, agents remain one of the most closely watched AI tracks worldwide. The agent wave set off by OpenClaw (formerly Clawbot) is still gathering momentum, and has for the first time made the "one-person company" concept genuinely viable. Just recently, OpenClaw surpassed React and Linux to become the most-starred non-resource/tutorial open-source software project on GitHub. From browser agents to coding agents, and from personal to enterprise workflow agents, the most tangible impression is that agents can do more and more. At the same time, runtime frameworks such as LangChain, Claude Code, and OpenClaw itself keep expanding what agents are capable of, enabling them to take on more complex tasks. Yet while these frameworks give agents far broader application potential, there is still no mature foundation for letting them continuously improve in real environments and develop the ability to self-evolve. In particular, reinforcement learning (RL) training, which is expected to drive agent evolution on complex, multi-turn, long-horizon tasks, faces multiple engineering obstacles that cap the capabilities of today's agents. The release of AReaL v1.0 sends a positive signal to the industry: an out-of-the-box agentic RL training foundation has ...
Tackling AI Infra, the hardest part of large models, with Vibe Coding
机器之心· 2026-01-07 05:16
Core Insights
- The article discusses the challenges and potential of Vibe Coding in AI infrastructure development, highlighting its limitations in complex systems and proposing a document-driven approach to enhance its effectiveness [3][5][20].

Group 1: Challenges of Vibe Coding
- Vibe Coding faces three main issues: context loss, decision deviation, and quality instability, primarily due to the lack of a structured decision-management mechanism [4][5].
- The complexity of AI infrastructure, characterized by thousands of lines of code and numerous interrelated decision points, exacerbates these challenges [4][5].

Group 2: Document-Driven Vibe Coding Methodology
- The document-driven approach aims to systematize key decisions during the design phase, significantly reducing complexity and improving code quality [6][20].
- By focusing on high-level design decisions, developers can leverage AI for detailed code implementation, achieving complex functionality with minimal hand-written code [7][20].

Group 3: Implementation in Agentic RL
- The article presents a case study on optimizing GPU utilization in agentic reinforcement learning (RL) systems, which face significant resource-scheduling challenges [11][12].
- A proposed time-sharing reuse scheme dynamically allocates GPU resources, addressing the inefficiencies of existing solutions and improving overall system performance [14][15] (see the scheduler sketch after this list).

Group 4: Performance Validation
- Experiments on a large-scale GPU cluster demonstrated that the time-sharing reuse scheme increased rollout throughput 3.5x over traditional methods, significantly raising task-completion rates and reducing timeouts [46][50].
- The analysis indicates that the additional system overhead introduced by the new scheme is minimal, validating its practical value in large-scale agentic RL training [53][55].

Group 5: Team and Future Directions
- The article concludes with an introduction to the ROCK & ROLL team, which focuses on advancing RL technologies and enhancing the practical application of large language models [57].
- The team emphasizes collaboration and open-source contributions to foster innovation in the RL community [58].
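To make the Group 3 idea concrete, here is a toy sketch of time-sharing GPU reuse, assuming the scheme's core move is shifting devices between rollout and training phases; the class and its API are illustrative, not the article's implementation.

```python
import threading

class TimeSharingScheduler:
    """Hand GPUs back and forth between rollout and training.

    A toy sketch: a production scheme would also drain in-flight work
    and swap weights/KV caches before a device changes roles.
    """

    def __init__(self, num_gpus: int) -> None:
        self.lock = threading.Lock()
        self.free = set(range(num_gpus))   # idle device ids
        self.rollout: set[int] = set()     # devices generating trajectories
        self.training: set[int] = set()    # devices running optimizer steps

    def acquire(self, role: str, want: int) -> set[int]:
        """Grant up to `want` idle GPUs to `role` ('rollout' or 'training')."""
        with self.lock:
            grant = set(sorted(self.free)[:want])
            self.free -= grant
            getattr(self, role).update(grant)
            return grant

    def release(self, role: str, gpus: set[int]) -> None:
        """Return devices the moment a phase finishes, so the other
        phase reuses them instead of letting them idle."""
        with self.lock:
            getattr(self, role).difference_update(gpus)
            self.free.update(gpus)

# One loop iteration: rollout borrows most GPUs while trajectories are
# generated, then returns them so the training step can scale out.
sched = TimeSharingScheduler(num_gpus=8)
gen = sched.acquire("rollout", want=6)
# ... run rollout on `gen` ...
sched.release("rollout", gen)
opt = sched.acquire("training", want=8)
# ... run the training step on `opt`, then release again ...
sched.release("training", opt)
```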
Self-Search Reinforcement Learning (SSRL): Agentic RL's Sim2Real moment
机器之心· 2025-09-02 01:27
Core Insights
- The article discusses the development and effectiveness of SSRL (Self-Search Reinforcement Learning) in enhancing the training efficiency and stability of search agents built on large language models (LLMs) [6][28].
- SSRL demonstrates superior performance over traditional methods that rely on external search engines, achieving effective transfer from simulation to real-world applications (Sim2Real) [6][28].

Group 1
- SSRL utilizes structured prompts and format rewards to effectively extract world knowledge from models, leading to improved performance across various benchmarks and reduced hallucination [2][6] (a reward sketch follows this list).
- The research highlights the high costs and inefficiencies of current RL training methods for search agents, which rely on full-real and semi-real search approaches [7][13].
- SSRL increases training efficiency by an estimated 5.6x while training rewards keep rising without collapse [31][32].

Group 2
- Experiments show that models trained with SSRL outperform those relying on external engines, particularly in real-world search scenarios, underscoring the importance of integrating real-world knowledge [28][31].
- The findings suggest that combining self-generated knowledge with real-world knowledge can enhance model performance, particularly through entropy-guided search strategies [34].
- Integrating SSRL with TTRL (Test-Time Reinforcement Learning) improves generalization and effectiveness, achieving up to a 67% performance increase on certain tasks [38][39].
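As a rough illustration of the "structured prompts and format rewards" mechanism, here is a hedged sketch of a self-search reward function. The tag vocabulary, reward weights, and exact-match scoring are assumptions made for the example, not the paper's exact recipe.

```python
import re

# Illustrative tag vocabulary for a self-search trace; the real SSRL
# format may differ.
TAGS = ("think", "search", "information", "answer")

def format_reward(rollout: str) -> float:
    """1.0 if every tag is balanced and the trace ends in an <answer> block."""
    for tag in TAGS:
        if rollout.count(f"<{tag}>") != rollout.count(f"</{tag}>"):
            return 0.0
    return 1.0 if re.search(r"<answer>.*?</answer>\s*$", rollout, re.S) else 0.0

def outcome_reward(rollout: str, gold: str) -> float:
    """Exact match on the final answer span (a common, simple choice)."""
    spans = re.findall(r"<answer>(.*?)</answer>", rollout, re.S)
    return float(bool(spans) and spans[-1].strip() == gold)

def total_reward(rollout: str, gold: str, fmt_weight: float = 0.2) -> float:
    # The model "searches" its own parametric knowledge inside
    # <search>/<information> blocks, so no external engine is queried.
    return (fmt_weight * format_reward(rollout)
            + (1.0 - fmt_weight) * outcome_reward(rollout, gold))

# Example trace:
trace = ("<think>recall</think><search>capital of France</search>"
         "<information>Paris is the capital.</information><answer>Paris</answer>")
print(total_reward(trace, "Paris"))  # -> 1.0
```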