Embodied Foundation Models

When Robots Can Teach Themselves: DeepMind Releases a Self-Improving Embodied Foundation Model
锦秋集· 2025-09-19 08:41
Core Insights

- The article discusses the evolution of embodied intelligence in robotics, emphasizing the transition from passive execution to active learning, with a focus on self-improvement through autonomous interaction and practice [1][4][10].

Group 1: Methodology

- A two-stage training framework is proposed, consisting of Supervised Fine-Tuning (SFT) and Self-Improvement, which allows robots to autonomously practice tasks with minimal human supervision [5][10][15].
- The first stage, SFT, involves behavior cloning and predicting remaining steps to fine-tune the pre-trained model [16][17].
- The second stage, Self-Improvement, utilizes a data-driven reward function derived from the model's predictions, enabling robots to learn and improve their performance on downstream tasks [12][20][21].

Group 2: Performance and Results

- The proposed method shows significant improvements in sample efficiency, with a 10% increase in autonomous practice time leading to over a 30% success rate increase in specific tasks, outperforming traditional methods that rely solely on expanded imitation data [2][6][12].
- In experiments, robots demonstrated remarkable cross-task and cross-domain generalization capabilities, achieving an 85% success rate in previously unseen tasks after self-improvement [2][4][12].
- The combination of pre-trained models and online self-improvement has unlocked unique abilities for robots to autonomously learn new skills beyond the scope of their training data [8][13][64].

Group 3: Future Challenges and Directions

- Future challenges include skill chaining, reward inference in long-duration tasks, and ensuring training stability and early termination mechanisms [4][75].
- The research highlights the importance of multimodal pre-training for the success of the self-improvement phase, indicating that robust visual-language semantic foundations are crucial for effective self-reward mechanisms [3][56][78].
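
As a rough illustration of the two-stage framework described under Methodology, the sketch below shows one way the pieces could fit together. It is not the authors' implementation. The summary says only that SFT combines behavior cloning with predicting remaining steps, and that the self-improvement reward is derived from the model's predictions. Treating that reward as the drop in expected remaining steps between consecutive observations is an assumption made here for illustration, as are all function names, the discrete steps-to-go distribution, and the `steps_weight` parameter.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under a categorical distribution."""
    return -math.log(max(probs[target], 1e-12))


def sft_loss(
    action_probs: Sequence[float],
    expert_action: int,
    steps_probs: Sequence[float],
    true_steps_to_go: int,
    steps_weight: float = 1.0,  # hypothetical weighting, not from the source
) -> float:
    """Stage 1 (sketch): behavior cloning plus a remaining-steps prediction term."""
    bc_term = cross_entropy(action_probs, expert_action)
    steps_term = cross_entropy(steps_probs, true_steps_to_go)
    return bc_term + steps_weight * steps_term


def expected_steps_to_go(steps_probs: Sequence[float]) -> float:
    """Mean of the predicted distribution, where index i means i steps remain."""
    return sum(i * p for i, p in enumerate(steps_probs))


def shaped_reward(
    steps_probs_now: Sequence[float],
    steps_probs_next: Sequence[float],
) -> float:
    """Stage 2 (sketch, assumed form): reward is the decrease in expected
    remaining steps between the current and the next observation."""
    return expected_steps_to_go(steps_probs_now) - expected_steps_to_go(steps_probs_next)


if __name__ == "__main__":
    # The model believed 3 steps remained, then 2 after acting: positive reward.
    print(shaped_reward([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]))
```

Under this assumed form, an action that brings the robot closer to completing the task earns a positive reward and one that moves it away earns a negative reward, with no hand-written reward function. This is consistent with the summary's point that the quality of the pre-trained model's predictions matters for the self-improvement phase.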