What exactly is so strong about the "strongest embodied VLA model"?
36Kr · 2025-11-20 07:38
Core Insights
- The core contribution of the π*0.6 model is its introduction of a more intuitive learning method called RECAP, which allows robots to learn from their mistakes rather than merely imitating correct actions [3][8][24]
- The model demonstrates a success rate of over 90% on tasks such as making espresso, folding clothes, and assembling packaging boxes, showcasing its practical capabilities [1][20]

Group 1: RECAP Methodology
- RECAP consists of three main phases: offline reinforcement learning (RL) on diverse demonstration data, fine-tuning with human guidance, and online execution in which robots learn from sparse rewards and expert corrections [10][20]
- The methodology leverages a value function to evaluate actions and an advantage-conditioned policy to update behavior, allowing efficient learning from both successful and unsuccessful experience [13][16][42] (a minimal sketch of this mechanism follows below)

Group 2: Model Architecture and Performance
- The π*0.6 model builds on previous versions, expanding its backbone from Gemma (2.6 billion parameters) to Gemma 3 (4 billion parameters) and increasing the Action Expert to 860 million parameters [20]
- On challenging tasks, RECAP has doubled throughput (successful task completions per hour) and reduced failure rates by approximately 50% compared to models trained with supervised fine-tuning alone [20]

Group 3: Learning from Mistakes
- The RECAP approach emphasizes learning from errors, enabling robots to recover from mistakes through expert intervention and self-correction, which is crucial for real-world applications [24][28]
- By utilizing a value function to assess the quality of actions, the model can identify key steps and sources of error, enhancing its ability to adapt and improve in complex environments [39][41]
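The summary above attributes RECAP's efficiency to a value function that scores actions and an advantage-conditioned policy update. Below is a minimal PyTorch sketch of that general idea, under broad assumptions: the network shapes, the episode-return regression target, and the binary advantage token are illustrative choices, not details from the π*0.6 paper.

```python
# Sketch of advantage-conditioned policy training (illustrative only).
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HID = 64, 8, 128  # assumed toy dimensions

class ValueNet(nn.Module):
    """V(s): predicts expected return from an observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, HID), nn.ReLU(),
                                 nn.Linear(HID, 1))
    def forward(self, obs):
        return self.net(obs).squeeze(-1)

class PolicyNet(nn.Module):
    """pi(a | s, g): action head conditioned on a binary advantage token g."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + 1, HID), nn.ReLU(),
                                 nn.Linear(HID, ACT_DIM))
    def forward(self, obs, adv_token):
        return self.net(torch.cat([obs, adv_token.unsqueeze(-1)], dim=-1))

value_net, policy = ValueNet(), PolicyNet()
v_opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)
p_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_step(obs, actions, returns):
    # 1) Fit the value function to observed returns, pooled from demos,
    #    expert corrections, and autonomous rollouts alike.
    v_loss = ((value_net(obs) - returns) ** 2).mean()
    v_opt.zero_grad(); v_loss.backward(); v_opt.step()

    # 2) Label each transition with its advantage: did this action do
    #    better or worse than the value function expected?
    with torch.no_grad():
        adv = returns - value_net(obs)
        adv_token = (adv > 0).float()  # 1 = better than expected, 0 = worse

    # 3) Behavior-clone ALL data, good and bad, but condition the policy
    #    on the advantage token so it learns which behavior is which.
    pred = policy(obs, adv_token)
    p_loss = ((pred - actions) ** 2).mean()
    p_opt.zero_grad(); p_loss.backward(); p_opt.step()
    return v_loss.item(), p_loss.item()

# Synthetic usage:
train_step(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM), torch.randn(32))

# At deployment, always condition on the "better than expected" token:
action = policy(torch.randn(1, OBS_DIM), torch.ones(1))
```

The design point the sketch isolates is that nothing is discarded: failed trajectories still train the policy, they just train it under the "worse than expected" token, and fixing the token to 1 at deployment steers the policy toward the higher-value behavior.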
What exactly is so strong about the "strongest embodied VLA model"?
QbitAI (量子位) · 2025-11-20 00:30
By henry, from QbitAI | WeChat official account QbitAI

Physical Intelligence's robot foundation model π*0.6, which has been flooding feeds across the internet, showed off its strength the moment it appeared: robots making espresso continuously for a full day, folding all kinds of garments for hours on end, and precisely assembling the packaging boxes that factories need.

With π*0.6, the success rates on all of these tasks exceed 90%.

A careful read of the paper, however, reveals that π*0.6's real breakthrough is not making coffee for 13 hours straight but the introduction of a more intuitive learning method, RECAP (see the sketch after this excerpt):

- Teaching: human demonstrations teach it the basic movements
- Coaching: corrective guidance lets it fix its mistakes
- Practice: it keeps optimizing from its own autonomous experience and grows stronger

This completely overturns the old imitation-learning paradigm, in which robots could only approximate a "ground truth", and lets robots grow from their own mistakes.

It sounds understated, but it carries real weight. Even netizens exclaimed: learning from mistakes, isn't that better than most people manage?

The strongest VLA model: π*0.6

π*0.6 continues Physical Intelligence's long-standing VLA (vision-language-action model) line and is the company's newest VLA model since π0.5, released in April of this year.

In short, the core contribution of π*0.6 is a general training method: reinforcement learning from experience and corrections based on advantage-conditioned policies (RL w ...
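To make the three phases above concrete, here is a small illustrative Python sketch of how teaching, coaching, and practice data could share one buffer and one sparse-reward convention before the value/advantage labeling shown earlier. All names (RecapBuffer, Transition) and the reward convention are assumptions for illustration, not the paper's actual data pipeline.

```python
# Illustrative buffer unifying the three RECAP data sources (assumed design).
from dataclasses import dataclass
from typing import Literal

Source = Literal["demo", "correction", "autonomous"]

@dataclass
class Transition:
    obs: list[float]
    action: list[float]
    reward: float   # sparse: 0.0 until the task succeeds
    source: Source  # where this experience came from

class RecapBuffer:
    """One buffer for all three phases; every transition later gets a
    value estimate and an advantage label from the same networks."""
    def __init__(self):
        self.data: list[Transition] = []

    def add_demo(self, obs, action, success: bool = True):
        # Teaching: human demonstrations of the basic skill.
        self.data.append(Transition(obs, action, float(success), "demo"))

    def add_correction(self, obs, action, success: bool):
        # Coaching: an expert takes over after a mistake, showing how to
        # recover from the robot's own failure states.
        self.data.append(Transition(obs, action, float(success), "correction"))

    def add_rollout(self, obs, action, success: bool):
        # Practice: autonomous execution scored only by sparse task success.
        self.data.append(Transition(obs, action, float(success), "autonomous"))
```

The point of a single buffer is that corrections are recorded from the failure states the robot actually reaches, which is exactly the data plain imitation of successful demonstrations never contains.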