Veo Does More Than Generate Video: DeepMind Is Using It to Simulate the Entire Robot World
机器之心· 2025-12-15 08:10
Core Insights
- The article discusses the development of generalist robots that carry out a wide range of tasks from natural language instructions, and highlights how difficult real-world evaluation and safety assessment remain [1][3].

Group 1: Challenges in Robot Evaluation
- Real-world evaluation is costly and time-consuming: it requires extensive hardware experiments across many scenarios, including extreme and out-of-distribution environments [1].
- Safety assessment is especially hard because unsafe behaviors cannot be triggered repeatedly in a real environment, which makes traditional evaluation methods difficult to apply [1].

Group 2: Limitations of Traditional Simulation
- Traditional physics simulators are limited in realism, diversity, setup cost, and visual consistency, which reduces their usefulness for robot evaluation [2].

Group 3: Advancements in Video Modeling
- Cutting-edge video models offer an alternative path to world simulation and address many of the challenges in robot policy evaluation. They still struggle in two areas: they generate artifacts under closed-loop conditions, and they have difficulty simulating contact dynamics [3].

Group 4: Introduction of Veo Robotics System
- The article introduces a video-model-based robot policy evaluation system developed by Google DeepMind's Gemini Robotics team. It covers the full range of evaluation needs, including in-distribution and out-of-distribution assessment [4][5].
- The system builds on the video generation model Veo and achieves high visual realism and fine-grained responses to control inputs without requiring a real physical setup [5].

Group 5: Experimental Validation
- More than 1,600 real-world experiments, spanning eight generalist policy checkpoints and five tasks, validated the video model's predictions and showed a strong correlation between predicted and actual success rates [5][26].
- The system's ability to predict performance across different robot policies was also tested, confirming that it ranks policies reliably by performance [24][26].

Group 6: Safety Testing Capabilities
- The Veo Robotics world model can be used for safety red-team testing, so potentially unsafe policy behaviors can be identified without real-world risk [31].
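
The validation described under Group 5 rests on two checks: whether predicted success rates track real ones, and whether the predicted ordering of policies matches the real ordering. The sketch below shows one common way to quantify each, using Pearson correlation for the first and Spearman rank correlation for the second. The summary does not say which statistics the authors used, so this choice is an assumption, and the numbers are placeholders for illustration only, not results from the article.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-policy success rates (placeholders, NOT data from the article).
predicted = [0.20, 0.35, 0.50, 0.65, 0.80]  # estimated from video-model rollouts
real = [0.25, 0.30, 0.55, 0.60, 0.85]       # measured on hardware

print(f"Pearson (success rates): {pearson(predicted, real):.3f}")
print(f"Spearman (policy ranks): {spearman(predicted, real):.3f}")
```

A high Pearson value suggests the simulator's success rates track the real ones, while a high Spearman value suggests it orders policies correctly even when the absolute rates are off. The second property is what matters when the simulator is used to choose between checkpoints.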