World-Model-Based Reinforcement Learning
Breaking the ceiling of embodied intelligence! 极佳视界 (GigaAI) debuts a new VLA large model, with near-100% success rates on complex long-horizon tasks
量子位· 2026-02-15 05:30
Yunzhong · 量子位 | QbitAI

Folding laundry, brewing coffee, folding paper boxes — these seemingly trivial chores were long the "long-horizon" abyss that embodied intelligence could not cross. Now the record has been rewritten: hours of operation with zero failures, running continuously and stably. Remember GigaBrain-0.1, which previously took first place worldwide on RoboChallenge?

| Rank | Model/User | Score | SR |
| --- | --- | --- | --- |
| 1 | GigaBrain-0.1/lyf | 68.34 | 51.67% |
| 2 | Spirit-v1.5/Spirit AI | 67.19 | 51.00% |
| 3 | pi0.5/rc_baseline | 61.84 | 42.67% |
| 4 | wall-oss-v0.1/Pushi .. / | 55.30 | 35.33% |
| 5 | pi0/rc_baseline | 46.41 | 28.33% |
| 6 | pi05_generalist/wyf | 31.27 | 17.67% |
| 7 | RDT-1B/zsz | 28.84 | 15.00% |
| 8 | ... | | |
Real-robot reinforcement learning for humanoid robots! At ICLR 2026, 通研院 proposes a new paradigm for humanoid pretraining and real-robot fine-tuning
机器之心· 2026-02-07 07:00
Core Insights
- The article discusses advances in humanoid robots, particularly their ability to perform complex tasks such as dancing and running, and emphasizes the importance of continuous reinforcement learning in real-world environments [2][3]
- The LIFT framework proposed by the researchers aims to bridge the gap between large-scale pretraining and efficient fine-tuning for humanoid control, addressing the limitations of existing methods [9][12]

Group 1: Background and Motivation
- Current humanoid robots rely mainly on on-policy algorithms such as PPO for pretraining, which are ill-suited to continuous learning on real hardware because of safety risks and cost [7]
- The central challenge is to retain large-scale pretraining speed without sacrificing sample efficiency and safety during the fine-tuning phase [9]

Group 2: LIFT Framework
- LIFT uses off-policy reinforcement learning algorithms such as SAC for large-scale pretraining, which offer better sample efficiency when data is limited [12][15]
- The framework incorporates a physics-informed world model to improve prediction performance and fine-tuning efficiency [12][18]

Group 3: Experimental Results
- LIFT showed significant advantages over baselines such as PPO and SAC in convergence time and sample efficiency during both pretraining and fine-tuning [20][24]
- The framework allows zero-shot deployment of pretrained policies to real-world robots, demonstrating its effectiveness in real-time applications [20][22]

Group 4: Challenges and Future Directions
- The article highlights several bottlenecks that must be addressed to scale reinforcement learning in real-world applications, including observation and state estimation, safety mechanisms, and system throughput [41]
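To make the off-policy idea in Group 2 concrete, here is a minimal, hypothetical sketch of why a replay buffer plus a learned world model improves sample efficiency: real transitions are stored once and reused across many updates (unlike PPO, which discards them after each policy update), and a dynamics model fitted to the buffer can generate "imagined" transitions. The toy 1-D environment, the linear model, and all names here are illustrative assumptions, not the actual LIFT architecture, which the summary does not detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D environment (assumed for illustration): the state drifts
# toward zero and is pushed by the action, with small process noise.
def step(state, action):
    next_state = 0.9 * state + action + 0.05 * rng.normal()
    reward = -abs(next_state)  # stay near zero
    return next_state, reward

# Off-policy replay buffer: each real transition is stored once and
# can be reused for many gradient updates -- the sample-efficiency
# advantage the article attributes to SAC-style pretraining over PPO.
buffer = []
state = 0.0
for _ in range(500):
    action = rng.uniform(-1.0, 1.0)
    next_state, reward = step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state

# Minimal learned "world model": least-squares fit of linear dynamics
# s' ~ a*s + b*u. This is a stand-in for the physics-informed world
# model; the real one would be far richer.
X = np.array([[s, u] for s, u, _, _ in buffer])
y = np.array([sn for _, _, _, sn in buffer])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # recovers ~[0.9, 1.0]

# Dyna-style use of the model: synthesize an imagined transition to
# augment the real buffer, reducing how many real-robot samples
# fine-tuning needs.
s, u = 0.5, -0.2
imagined_next = coef @ np.array([s, u])
```

The design point is the split of roles: the buffer amortizes expensive real-robot data across updates, while the world model turns that data into a cheap simulator for additional training signal.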