Google Launches Gemini Robotics 1.5: How Does It Make Robots Smarter, Safer, and More General?

Core Insights
- The article discusses the limitations of current intelligent robots in handling complex tasks, and how Google DeepMind's Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 models address them [1][3][50].

Group 1: Model Capabilities
- Gemini Robotics 1.5 is a vision-language-action (VLA) model that turns visual input and instructions into motor commands. It can reason through a task before acting [5][20].
- Gemini Robotics-ER 1.5 specializes in embodied reasoning. It can build detailed multi-step plans and call external digital tools such as Google Search while carrying out a task [5][18].
- Together, the models broaden the range of tasks robots can perform, including household chores, warehouse picking (accuracy improved to 92%) and medical suturing (89% success rate) [2][3].

Group 2: Technical Innovations
- The models form a closed "perception-reasoning-planning-execution" loop, so tasks can be carried out smoothly across varied environments [2][8].
- A "thinking budget" setting lets developers trade latency against accuracy, matching the amount of reasoning to the complexity of each task [23][47].
- Cross-embodiment learning lets skills learned on one robot transfer to another without additional training, significantly reducing adaptation costs [15][79].

Group 3: Safety and Security
- The models include safety mechanisms such as semantic safety filtering and awareness of physical constraints, supporting responsible deployment in human-centric environments [16][48].
- Gemini Robotics-ER 1.5 was evaluated on the upgraded ASIMOV benchmark, showing superior performance in understanding semantic safety and respecting physical constraints [16][48].

Group 4: Development and Ecosystem
- The ER 1.5 model is available to developers worldwide through the Gemini API, encouraging a collaborative ecosystem that speeds up real-world applications [2][3].
- The models are positioned to steer the evolution of physical agents, offering direction on technical pathways, safety standards and developer empowerment [2][50].
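The closed "perception-reasoning-planning-execution" loop described in Group 2 can be illustrated with a small, self-contained Python sketch. It does not call any Gemini model. `World`, `perceive`, `plan`, `execute` and `run_closed_loop` are hypothetical stand-ins. `plan` plays the role of an embodied-reasoning planner such as Gemini Robotics-ER 1.5, and `execute` plays the role of a VLA controller such as Gemini Robotics 1.5. The sketch highlights the feedback in the loop: the robot re-observes and re-plans after every action, so a failed step is retried instead of derailing the whole plan.

```python
"""Toy closed loop: perceive -> reason/plan -> execute -> perceive again.

All names are hypothetical stand-ins; no real robot or Gemini API is used.
"""
from dataclasses import dataclass, field


@dataclass
class World:
    """Minimal simulated environment: the set of facts currently true."""
    facts: set = field(default_factory=set)
    flaky_once: set = field(default_factory=set)  # steps that fail on the first try


def perceive(world: World) -> frozenset:
    # Perception: an immutable snapshot of what the robot can observe now.
    return frozenset(world.facts)


def plan(goal: list, observation: frozenset) -> list:
    # Reasoning + planning (stand-in for an ER-style planner):
    # keep only the goal steps that are not yet satisfied, in order.
    return [step for step in goal if step not in observation]


def execute(world: World, step: str) -> bool:
    # Execution (stand-in for a VLA controller): "flaky" steps fail once.
    if step in world.flaky_once:
        world.flaky_once.discard(step)
        return False
    world.facts.add(step)
    return True


def run_closed_loop(world: World, goal: list, max_iters: int = 10) -> list:
    """Re-perceive and re-plan after every action until the goal holds."""
    log = []
    for _ in range(max_iters):
        remaining = plan(goal, perceive(world))
        if not remaining:
            log.append("done")
            return log
        step = remaining[0]
        ok = execute(world, step)
        log.append(f"{step}:{'ok' if ok else 'retry'}")
    log.append("gave up")
    return log


if __name__ == "__main__":
    world = World(flaky_once={"pick cup"})
    print(run_closed_loop(world, ["open drawer", "pick cup", "place cup"]))
    # → ['open drawer:ok', 'pick cup:retry', 'pick cup:ok', 'place cup:ok', 'done']
```

Re-planning from a fresh observation, rather than replaying a fixed script, is what makes the loop "closed". In a real system, `perceive` would be camera input and `execute` would emit motor commands.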
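Two points from the summary, ER 1.5 availability through the Gemini API and the "thinking budget" knob, can be sketched together as a raw REST request that uses only the Python standard library. The sketch rests on several assumptions. The model ID `gemini-robotics-er-1.5-preview`, the `GEMINI_API_KEY` environment variable and the helper names `build_request` and `send` are illustrative choices, not details taken from the article. The `generateContent` endpoint and the `generationConfig.thinkingConfig.thinkingBudget` field follow the public Gemini REST API, but check the current API reference before relying on them.

```python
import json
import os
import urllib.request

# Assumption: preview model ID for Gemini Robotics-ER 1.5; verify in the API docs.
MODEL = "gemini-robotics-er-1.5-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent body with an explicit thinking budget.

    The budget caps the tokens the model may spend on internal reasoning:
    a small value favors latency, a larger one favors careful planning.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": thinking_budget}},
    }


def send(body: dict) -> dict:
    """POST the body to the Gemini API (hypothetical: needs GEMINI_API_KEY set)."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Low-latency end of the trade-off: a simple spatial query.
    fast = build_request("Point to the red cup in the image.", thinking_budget=0)
    # Higher-accuracy end: a multi-step plan that benefits from more reasoning.
    deep = build_request("Plan the steps to pack this lunch box.", thinking_budget=1024)
    print(json.dumps(fast, indent=2))
    print(json.dumps(deep, indent=2))
    # send(fast) would perform the real network call; it is not invoked here.
```

The budget is set per request, not per model, so a single application can use a small budget for quick perception queries and a larger one for planning. That per-request choice is the latency/accuracy trade-off the summary describes.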