World-Model-Based Reinforcement Learning
Breaking the ceiling of embodied intelligence! 极佳视界 (GigaAI) debuts a new VLA large model, with near-100% success rates on complex long-horizon tasks
量子位· 2026-02-15 05:30
Yunzhong · 量子位 | QbitAI

Folding laundry, brewing coffee, folding paper boxes — these seemingly trivial chores were long the "long-horizon" abyss that embodied intelligence could not cross. Now the record has been rewritten: hours of operation with zero failures, running continuously and stably. Remember GigaBrain-0.1, which previously took first place worldwide on RoboChallenge?

| Rank | Model/User | Score | SR |
| --- | --- | --- | --- |
| 1 | GigaBrain-0.1/lyf | 68.34 | 51.67% |
| 2 | Spirit-v1.5/Spirit AI | 67.19 | 51.00% |
| 3 | pi0.5/rc_baseline | 61.84 | 42.67% |
| 4 | wall-oss-v0.1/Pushi .. / | 55.30 | 35.33% |
| 5 | pi0/rc_baseline | 46.41 | 28.33% |
| 6 | pi05_generalist/wyf | 31.27 | 17.67% |
| 7 | RDT-1B/zsz | 28.84 | 15.00% |
| 8 | ... | | |
Real-robot reinforcement learning for humanoid robots! At ICLR 2026, 通研院 proposes a new paradigm for humanoid pretraining and real-robot fine-tuning
机器之心· 2026-02-07 07:00
Core Insights
- The article discusses advances in humanoid robots, particularly their ability to perform complex tasks such as dancing and running, and emphasizes the importance of continuous reinforcement learning in real-world environments [2][3]
- The LIFT framework proposed by the researchers aims to bridge the gap between large-scale pretraining and efficient fine-tuning for humanoid control, addressing the limitations of existing methods [9][12]

Group 1: Background and Motivation
- Current humanoid robots rely mainly on on-policy algorithms such as PPO for pretraining, which are ill-suited to continuous learning on real hardware because of safety risks and cost [7]
- The central challenge is to retain large-scale pretraining speed without sacrificing sample efficiency and safety during the fine-tuning phase [9]

Group 2: LIFT Framework
- LIFT uses off-policy reinforcement learning algorithms such as SAC for large-scale pretraining, which offer better sample efficiency when data is limited [12][15]
- The framework incorporates a physics-informed world model to improve prediction performance and fine-tuning efficiency [12][18]

Group 3: Experimental Results
- LIFT showed significant advantages over baselines such as PPO and SAC in convergence time and sample efficiency during both pretraining and fine-tuning [20][24]
- The framework allows zero-shot deployment of pretrained policies to real-world robots, demonstrating its effectiveness in real-time applications [20][22]

Group 4: Challenges and Future Directions
- The article highlights several bottlenecks that must be addressed to scale reinforcement learning in real-world applications, including observation and state estimation, safety mechanisms, and system throughput [41]
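To make the off-policy idea in Group 2 concrete, here is a minimal, hypothetical sketch of why a replay buffer plus a learned world model improves sample efficiency: real transitions are stored once and reused across many updates (unlike PPO, which discards them after each policy update), and a dynamics model fitted to the buffer can generate "imagined" transitions. The toy 1-D environment, the linear model, and all names here are illustrative assumptions, not the actual LIFT architecture, which the summary does not detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D environment (assumed for illustration): the state drifts
# toward zero and is pushed by the action, with small process noise.
def step(state, action):
    next_state = 0.9 * state + action + 0.05 * rng.normal()
    reward = -abs(next_state)  # stay near zero
    return next_state, reward

# Off-policy replay buffer: each real transition is stored once and
# can be reused for many gradient updates -- the sample-efficiency
# advantage the article attributes to SAC-style pretraining over PPO.
buffer = []
state = 0.0
for _ in range(500):
    action = rng.uniform(-1.0, 1.0)
    next_state, reward = step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state

# Minimal learned "world model": least-squares fit of linear dynamics
# s' ~ a*s + b*u. This is a stand-in for the physics-informed world
# model; the real one would be far richer.
X = np.array([[s, u] for s, u, _, _ in buffer])
y = np.array([sn for _, _, _, sn in buffer])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # recovers ~[0.9, 1.0]

# Dyna-style use of the model: synthesize an imagined transition to
# augment the real buffer, reducing how many real-robot samples
# fine-tuning needs.
s, u = 0.5, -0.2
imagined_next = coef @ np.array([s, u])
```

The design point is the split of roles: the buffer amortizes expensive real-robot data across updates, while the world model turns that data into a cheap simulator for additional training signal.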