Embodied Reasoning
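The First Embodied Reasoning Model, Built by Google DeepMind! It Autonomously Understands, Plans, and Executes Complex Tasks, Ends One-Robot-One-Training, and Lets Robots Transfer Skills to Each Other Zero-Shot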
量子位 (QbitAI) · 2025-09-27 04:46
Core Viewpoint: Google DeepMind has launched the Gemini Robotics 1.5 series, a significant milestone in bringing general-purpose AI into the physical world. The series features embodied reasoning, which lets robots "think before acting" [1][9].

Group 1: Model Composition
- The Gemini Robotics 1.5 series consists of two main models: GR 1.5 for action execution and GR-ER 1.5 for embodied reasoning [2][8].
- GR-ER 1.5 is the world's first embodied model with simulated reasoning capabilities [3].

Group 2: Functional Capabilities
- Used together, GR-ER 1.5 and GR 1.5 enable robots to perform complex multi-step tasks, such as sorting clothes by color or packing luggage according to the weather [5][6].
- GR 1.5 adapts to a variety of robot hardware, so a single model can operate across different platforms without separate training for each [16][18].

Group 3: Motion Transfer Mechanism
- The "Motion Transfer" mechanism allows skills learned on one robot to be transferred to another, enhancing cross-platform functionality [21][48].
- The mechanism abstracts the actions of different robots into a unified semantic space, enabling skill sharing across diverse hardware [56].

Group 4: Safety and Explainability
- The GR 1.5 series improves safety by letting robots self-correct during tasks and recognize potential risks, supporting safe operation in human environments [34][36].
- The embodied reasoning model exposes the robot's decision-making process, improving interpretability and trust [55][58].

Group 5: Performance Metrics
- In benchmark tests, GR 1.5 outperformed previous models on several dimensions, including instruction generalization and task completion rate, reaching a completion rate of nearly 80% on long-sequence tasks [61][62].
- The model demonstrated unprecedented zero-shot transfer capabilities in cross-robot transfer tests [63].

Group 6: Future Developments
- The GR 1.5 series represents a shift from executing single commands to genuinely understanding and solving physical tasks [69].
- Developers can currently access GR-ER 1.5 through Google AI Studio, while GR 1.5 is available to select partners [71].
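
The division of labor summarized above (GR-ER 1.5 plans, GR 1.5 acts, and the robot can self-correct mid-task) can be pictured as a simple orchestration loop. The sketch below is illustrative only: `PlannerExecutorLoop`, `plan`, `execute`, and `max_retries` are hypothetical names, the two callables are local stand-ins rather than calls to any Gemini API, and the retry logic is an assumed simplification of self-correction, not DeepMind's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One natural-language sub-instruction produced by the planner."""
    instruction: str
    done: bool = False


@dataclass
class PlannerExecutorLoop:
    """Hypothetical loop: a reasoning model plans, an action model executes."""
    plan: Callable[[str], List[str]]   # stand-in for the embodied reasoning model
    execute: Callable[[str], bool]     # stand-in for the action model; True on success
    max_retries: int = 1               # assumed: extra attempts allowed per step
    log: List[str] = field(default_factory=list)

    def run(self, task: str) -> bool:
        """Plan the task, then execute each step in order, retrying on failure."""
        steps = [Step(s) for s in self.plan(task)]
        for step in steps:
            for attempt in range(self.max_retries + 1):
                self.log.append(f"attempt {attempt + 1}: {step.instruction}")
                if self.execute(step.instruction):
                    step.done = True
                    break
            if not step.done:
                return False  # give up once a step exhausts its retries
        return True


if __name__ == "__main__":
    # Toy stand-ins: a fixed two-step plan and an executor that always succeeds.
    loop = PlannerExecutorLoop(
        plan=lambda task: ["pick up the white shirt", "place it in the white bin"],
        execute=lambda instruction: True,
    )
    print(loop.run("sort the laundry by color"))
    print(loop.log)
```

Keeping the planner and executor behind plain callables mirrors the summary's claim that the two models are separate components; the log of attempted instructions is a loose analogue of the transparency described under Group 4.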
Google Launches Gemini Robotics 1.5: How Does It Make Robots Smarter, Safer, and More General?
锦秋集 (Jinqiu Ji) · 2025-09-26 09:22
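Why can't intelligent robots work in complex environments, and why can't today's robots complete multi-step tasks yet?

We are advancing the era of physical agents: enabling robots to perceive, plan, think, use tools, and act, so that they can better solve complex multi-step tasks. Earlier this year, starting with the Gemini Robotics family of models, we made significant progress in bringing Gemini's multimodal understanding into the physical world. Today we are taking another step toward intelligent, truly general-purpose robots by introducing two models with advanced thinking capabilities that unlock agentic experiences.

Google DeepMind's Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 fill exactly this gap through innovation in the underlying technology. As the core engines for building the next generation of Physical Agents, the two models form a "reasoning brain + execution hub" pairing. ER 1.5, the best-performing VLM, ranked first overall across 15 academic embodied reasoning benchmarks. It understands complex requests expressed in natural language, natively calls Google Search to fetch external information (such as local waste-sorting rules), draws up multi-step plans and estimates the probability of task success, and supports a configurable "thinking budget" to balance latency against accuracy. Robotics 1.5, for its part, is a top-tier VLA model that, by virtue of " ...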