Embodied Reasoning
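The first embodied model that reasons, built by Google DeepMind! It autonomously understands, plans, and executes complex tasks, breaks the "one robot, one training run" pattern, and lets robots transfer skills to each other zero-shot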
QbitAI (量子位) · 2025-09-27 04:46
Core Viewpoint: Google DeepMind has launched the Gemini Robotics 1.5 series, a significant milestone in bringing general AI to real-world applications. Its embodied reasoning capabilities let robots "think before acting" [1][9].
Group 1: Model Composition
- The Gemini Robotics 1.5 series consists of two main models: GR 1.5 (Gemini Robotics 1.5) for action execution and GR-ER 1.5 (Gemini Robotics-ER 1.5) for embodied reasoning [2][8].
- GR-ER 1.5 is described as the world's first embodied model with simulated reasoning capabilities [3].
Group 2: Functional Capabilities
- Working together, GR-ER 1.5 and GR 1.5 let robots perform complex multi-step tasks, such as sorting clothes by color or packing luggage based on weather conditions [5][6].
- GR 1.5 adapts to different robot hardware, so a single model can operate across platforms without separate per-robot training [16][18].
Group 3: Motion Transfer Mechanism
- The new "Motion Transfer" mechanism lets skills learned on one robot transfer to another, improving cross-platform capability [21][48].
- It abstracts the actions of different robots into a unified semantic space, so skills can be shared across diverse hardware [56].
Group 4: Safety and Explainability
- The GR 1.5 series improves safety by letting robots self-correct during tasks and recognize potential risks, supporting safe operation in human environments [34][36].
- The embodied reasoning model exposes its decision-making process, improving interpretability and trust [55][58].
Group 5: Performance Metrics
- In benchmark tests, GR 1.5 outperformed previous models on several dimensions, including instruction generalization and task completion rate, reaching nearly 80% on long-horizon tasks [61][62].
- In cross-robot transfer tests, the model showed what the article calls unprecedented zero-shot transfer capability [63].
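To make the planner/executor split (Groups 1 and 2) and the Motion Transfer idea (Group 3) more concrete, here is a minimal Python sketch under loudly stated assumptions. It is not DeepMind's implementation: the article does not describe how Motion Transfer works internally, and in GR 1.5 the shared representation is learned by the model, whereas here hand-written adapters stand in for it. Every name (`plan_sort_by_color`, `execute_step`, `SharedAction`, `to_robot_a`, `to_robot_b`, `run`) and every number (bin offsets, lift height, gripper width) is hypothetical. The planner plays the role of GR-ER 1.5, the executor the role of GR 1.5, and the skill is written once in a shared action space, then translated for two different robots.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    """One high-level step from the hypothetical planner (stand-in for GR-ER 1.5)."""
    item: str
    target_bin: str


@dataclass(frozen=True)
class SharedAction:
    """Hypothetical embodiment-agnostic action: end-effector motion in metres plus a gripper flag."""
    dx: float
    dy: float
    dz: float
    close_gripper: bool


def plan_sort_by_color(items: Dict[str, str]) -> List[Step]:
    """Planner stand-in: route white items to a 'light' bin, everything else to 'dark'.

    Items are processed in name order so the plan is deterministic.
    """
    return [Step(name, "light" if color == "white" else "dark")
            for name, color in sorted(items.items())]


# Hypothetical bin positions (metres) relative to the pick point.
BIN_OFFSETS = {"light": (0.3, 0.0), "dark": (-0.3, 0.0)}


def execute_step(step: Step, lift: float = 0.1) -> List[SharedAction]:
    """Executor stand-in (GR 1.5 role): descend, grasp, lift, move over the bin, release."""
    bx, by = BIN_OFFSETS[step.target_bin]
    return [
        SharedAction(0.0, 0.0, -lift, False),  # descend to the item
        SharedAction(0.0, 0.0, 0.0, True),     # close the gripper
        SharedAction(0.0, 0.0, lift, True),    # lift
        SharedAction(bx, by, 0.0, True),       # move over the target bin
        SharedAction(0.0, 0.0, 0.0, False),    # open the gripper to release
    ]


def to_robot_a(a: SharedAction) -> Dict[str, float]:
    """Hypothetical robot A: millimetre targets and a 0/1 gripper flag."""
    return {"x_mm": a.dx * 1000, "y_mm": a.dy * 1000, "z_mm": a.dz * 1000,
            "gripper": 1.0 if a.close_gripper else 0.0}


def to_robot_b(a: SharedAction) -> Dict[str, float]:
    """Hypothetical robot B: metre targets and a gripper opening width (0.0 m = closed)."""
    return {"x": a.dx, "y": a.dy, "z": a.dz, "width": 0.0 if a.close_gripper else 0.08}


ADAPTERS: Dict[str, Callable[[SharedAction], Dict[str, float]]] = {
    "robot_a": to_robot_a,
    "robot_b": to_robot_b,
}


def run(items: Dict[str, str], robot: str) -> List[Dict[str, float]]:
    """Plan once, express the skill in the shared space, then translate for one robot."""
    adapter = ADAPTERS[robot]
    return [adapter(a) for step in plan_sort_by_color(items) for a in execute_step(step)]
```

Calling `run({"shirt": "white", "sock": "black"}, "robot_a")` and the same call with `"robot_b"` produces the same ten-action skill expressed in each robot's own units. That separation (skill defined once, hardware-specific translation at the edge) is the property the article attributes to cross-robot transfer; the real system learns it rather than relying on hand-coded adapters.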
Group 6: Future Developments
- The GR 1.5 series represents a shift from executing single commands to genuinely understanding and solving physical tasks [69].
- Currently, developers can access GR-ER 1.5 through Google AI Studio, while GR 1.5 is available to select partners [71].
Google launches Gemini Robotics 1.5: how does it make robots smarter, safer, and more general?
锦秋集 · 2025-09-26 09:22
Core Insights: The article discusses the limitations of current intelligent robots in handling complex tasks and how Google DeepMind's Gemini Robotics 1.5 and ER 1.5 models address them [1][3][50].
Group 1: Model Capabilities
- Gemini Robotics 1.5 is a vision-language-action (VLA) model that turns visual input and instructions into motion commands, and it reasons before acting [5][20].
- Gemini Robotics-ER 1.5 specializes in embodied reasoning: it can produce detailed multi-step plans and call external digital tools such as Google Search while carrying out a task [5][18].
- According to the article, both models broaden the tasks robots can handle, including household chores, warehouse picking (accuracy improved to 92%), and medical suturing (89% success rate) [2][3].
Group 2: Technical Innovations
- Together, the models form a closed "perception-reasoning-planning-execution" loop, so tasks can be carried out end to end across different environments [2][8].
- A "thinking budget" setting lets developers trade latency against accuracy, matching the amount of reasoning to task complexity [23][47].
- Cross-embodiment learning lets skills learned on one robot transfer to another without additional training, substantially reducing adaptation costs [15][79].
Group 3: Safety and Security
- The models include advanced safety measures such as semantic safety filtering and awareness of physical constraints, intended to support responsible deployment in human-centric environments [16][48].
- Gemini Robotics-ER 1.5 was evaluated on the upgraded ASIMOV benchmark, where it showed superior performance in understanding semantic safety and adhering to physical constraints [16][48].
Group 4: Development and Ecosystem
- The ER 1.5 model is available to developers worldwide through the Gemini API, encouraging a collaborative ecosystem for rapid application of the technology [2][3].
- The models are intended to guide the evolution of physical agents, offering insights into technical pathways, safety standards, and developer empowerment [2][50].
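The "thinking budget" trade-off described in Group 2 can be illustrated with a budget-capped anytime loop. This is only an analogy under stated assumptions: Gemini's thinking budget limits reasoning inside the model, while the sketch below limits the iterations of a plain bisection search for a hypothetical target coordinate. `budgeted_refine`, `Result`, and the 5 ms per-step cost are illustrative inventions, not part of any Gemini API. Each extra step halves the remaining uncertainty and adds a fixed simulated latency, so a larger budget buys accuracy at the cost of response time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Result:
    estimate: float    # best estimate of the target once the budget is spent
    steps_used: int    # number of refinement ("thinking") steps taken
    latency_ms: float  # simulated cost: steps_used * ms_per_step


def budgeted_refine(at_or_below: Callable[[float], bool], lo: float, hi: float,
                    budget: int, ms_per_step: float = 5.0) -> Result:
    """Anytime bisection capped at `budget` steps (hypothetical illustration).

    `at_or_below(x)` returns True when the unknown target lies at or below x.
    Requires at_or_below(lo) to be False and at_or_below(hi) to be True.
    Each step halves [lo, hi], so the returned midpoint lies within
    (original width) / 2 ** (budget + 1) of the target.
    """
    steps = 0
    while steps < budget:
        mid = (lo + hi) / 2
        if at_or_below(mid):
            hi = mid  # target is in the lower half
        else:
            lo = mid  # target is in the upper half
        steps += 1
    return Result(estimate=(lo + hi) / 2, steps_used=steps,
                  latency_ms=steps * ms_per_step)


if __name__ == "__main__":
    target = 0.37  # hypothetical coordinate the planner is trying to pin down
    for budget in (0, 2, 10):
        r = budgeted_refine(lambda x: target <= x, 0.0, 1.0, budget)
        print(f"budget={budget:2d}  error={abs(r.estimate - target):.5f}  "
              f"latency={r.latency_ms:.0f} ms")
```

In this toy setting, a budget of 0 answers instantly with a coarse guess, while a budget of 10 costs 50 simulated milliseconds and pins the target down to within about 0.0005. Returning the best estimate so far whenever the budget runs out is what keeps such a loop usable under a hard latency limit.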