Google's Latest! Gemini Robotics 1.5: A Breakthrough in General-Purpose Robotics

Core Insights
- The article discusses the breakthrough advancements in the field of general robotics presented in the "Gemini Robotics 1.5" report by Google DeepMind, highlighting the innovative models and their capabilities in perception, reasoning, and action [1][39].

Technical Architecture
- The core architecture of Gemini Robotics 1.5 consists of a "Coordinator + Action Model" framework, enabling a functional closed loop through multimodal data interaction [2].
- The Coordinator (Gemini Robotics-ER 1.5) processes user inputs and environmental feedback, controlling the overall task flow and breaking down complex tasks into executable sub-steps [2].
- The Action Model (Gemini Robotics 1.5) translates natural language sub-instructions into robot action trajectories, supporting direct control of various robot forms without additional adaptation [2][4].

Motion Transfer Mechanism
- The Motion Transfer (MT) mechanism addresses the "data silo" issue in traditional robotics by enabling skill generalization across different robot forms, validated through experimental comparisons [5][7].
- The Gemini Robotics 1.5 model, utilizing mixed data from multiple robot types, demonstrated superior performance in skill transfer compared to single-form training approaches [7][8].

Performance Validation
- The introduction of a "thinking VLA" mechanism allows for a two-step process in task execution, enhancing performance in multi-step tasks by breaking down complex instructions into manageable sub-steps [8][11].
- Quantitative results show a performance improvement of approximately 21.8% in task completion scores when the thinking mode is activated [11].
- The model's ability to generalize skills across different robot forms was evidenced by significant performance gains in scenarios with limited training data [13][28].

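
The "Coordinator + Action Model" closed loop described under Technical Architecture, together with the two-step "thinking" behavior, can be illustrated with a toy sketch. Everything here is hypothetical and written only for illustration: the names `World`, `Step`, `coordinator`, `action_model`, `execute`, and `run_task`, the string-based sub-instructions, and the `(verb, object)` action format are assumptions, not the Gemini Robotics API or the report's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Action = Tuple[str, str]  # (verb, object): toy stand-in for a motor trajectory


@dataclass
class World:
    """Toy environment: objects still lying on the table."""
    on_table: List[str] = field(default_factory=lambda: ["cup", "spoon"])


@dataclass
class Step:
    """Output of the toy action model for one sub-instruction."""
    thought: Optional[str]  # reasoning text, present only when thinking is on
    actions: List[Action]


def coordinator(task: str, world: World) -> Optional[str]:
    """Stand-in for the ER model: inspect the scene and return the next
    natural-language sub-instruction, or None once the task is complete."""
    if not world.on_table:
        return None
    return f"put the {world.on_table[0]} in the bin"


def action_model(sub_instruction: str, think: bool = True) -> Step:
    """Stand-in for the VLA: map a sub-instruction to a short trajectory.
    With think=True it first writes a reasoning step, then the actions."""
    obj = sub_instruction.split()[2]  # assumes "put the <obj> in the bin"
    thought = f"Grasp the {obj}, then move it over the bin." if think else None
    return Step(thought, [("reach", obj), ("grasp", obj), ("release", obj)])


def execute(world: World, actions: List[Action]) -> None:
    """Apply the trajectory to the toy world; the updated state is the
    environmental feedback the coordinator sees on the next iteration."""
    for verb, obj in actions:
        if verb == "release" and obj in world.on_table:
            world.on_table.remove(obj)


def run_task(task: str, world: World, max_steps: int = 10) -> List[str]:
    """Closed loop: plan one sub-step, act, observe the new state, repeat."""
    done: List[str] = []
    for _ in range(max_steps):
        sub = coordinator(task, world)
        if sub is None:
            break
        execute(world, action_model(sub).actions)
        done.append(sub)
    return done
```

Calling `run_task("clear the table", World())` walks the loop once per object on the table. In this sketch the coordinator re-plans from the current world state on every iteration rather than committing to a full plan up front, which is one simple way to model the feedback path the summary describes.
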
Safety Mechanisms
- The ER model incorporates safety mechanisms that assess risks and provide intervention strategies in various scenarios, ensuring safe task execution [36][38].
- Performance comparisons indicate that ER 1.5 excels in risk identification and mitigation, demonstrating a high accuracy rate in predicting potential hazards [36][38].

Conclusion and Future Directions
- The Gemini Robotics 1.5 model represents a significant advancement in universal control for multiple robots, reducing deployment costs and enhancing task execution capabilities [39].
- The integration of reasoning and action is identified as a critical factor for achieving complex task completion, emphasizing the importance of the ER and VLA collaboration [39].