reinforcement learning
X @Avi Chawla
Avi Chawla· 2026-03-29 19:03
RT Avi Chawla (@_avichawla): Microsoft did it again! Building with AI agents almost never works on the first try. A dev has to spend days tweaking prompts, adding examples, hoping it gets better. This is exactly what Microsoft's Agent Lightning solves. It's an open-source framework that trains ANY AI agent with reinforcement learning. Works with LangChain, AutoGen, CrewAI, OpenAI SDK, or plain Python. Here's how it works:
> Your agent runs normally with whatever framework you're using. Just add a lightweight agl.em ...
X @Avi Chawla
Avi Chawla· 2026-03-29 06:37
Microsoft did it again! Building with AI agents almost never works on the first try. A dev has to spend days tweaking prompts, adding examples, hoping it gets better. This is exactly what Microsoft's Agent Lightning solves. It's an open-source framework that trains ANY AI agent with reinforcement learning. Works with LangChain, AutoGen, CrewAI, OpenAI SDK, or plain Python. Here's how it works:
> Your agent runs normally with whatever framework you're using. Just add a lightweight agl.emit() helper or let the trac ...
X @Herbert Ong
Herbert Ong· 2025-08-20 15:59
🚨 FigureAI just shared a new video of its humanoid bot using the Helix walking controller. Trained with reinforcement learning, it’s walking blind (no cameras) and hitting superhuman levels in some areas. @Figure_robot https://t.co/IiIRLYKmNf ...
RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai
AI Engineer· 2025-07-16 16:18
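Large Language Models Evolution
- Scaling laws show that increasing compute, data, and parameters improves Transformer model performance, and that this carries over to other domains [2][3]
- Performance keeps improving as models grow, which shows up in the solve rate on medium-difficulty math problems, especially when the model is prompted to show its chain of thought [5][7]
- Through reinforcement learning and human feedback, models follow instructions better, which enables applications such as chatbots [10][11]

Inference Time Optimization
- Generating multiple responses and taking a majority vote (self-consistency) improves performance at inference time [15]
- Sequentially revising earlier responses can improve performance significantly, especially in domains where answers can be verified, such as math and programming [16][17]
- In domains where answers can be verified, scaling inference-time compute can translate into intelligence [19]

Reinforcement Learning for Autonomous Coding
- Reinforcement learning is the next scaling frontier, especially in domains where outputs can be verified automatically [24]
- The era of experience will build superintelligent systems through reinforcement learning, especially in domains with automatic verification [25]
- Autonomous coding is an excellent domain for scaling reinforcement learning because its outputs can be verified [30][31]

Challenges in Scaling Reinforcement Learning
- Scaling reinforcement learning is harder than scaling LLMs because it requires multiple copies of the model as well as training and inference loops [29]
- In reinforcement learning, designing the reward function for the reward model is a challenge [29][30]

Reflection's Mission
- Reflection aims to build superintelligence, treating autonomous coding as the fundamental problem [33]
- The Reflection team consists of 35 pioneers with groundbreaking work in LLMs and reinforcement learning [33]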
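
The self-consistency idea from the Inference Time Optimization notes above (sample several responses, keep the majority answer) can be sketched roughly as follows. This is a minimal illustration and not code from the talk: the `self_consistency` function and the canned answer list are hypothetical stand-ins, and a real setup would sample final answers from a model instead.

```python
from collections import Counter
from typing import Callable, List


def self_consistency(sample_answer: Callable[[], str], n_samples: int = 5) -> str:
    """Draw n_samples answers and return the most frequent one (majority vote)."""
    answers: List[str] = [sample_answer() for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)]; on a tie, the answer seen first wins.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Hypothetical stand-in for a sampled model: replays a fixed list of final answers.
_canned = iter(["42", "41", "42", "42", "7"])
result = self_consistency(lambda: next(_canned), n_samples=5)
print(result)
```

Voting only makes sense over a normalized final answer (a number, a label), which is one reason the notes single out verifiable domains such as math and programming.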