Core Viewpoint
- The article discusses the development of Genie Envisioner, a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video generation framework [3][27].

Group 1: Platform Overview
- Genie Envisioner is built on a core component called GE-Base, which captures the spatial, temporal, and semantic dynamics of robot interactions [5][27].
- The platform includes GE-Act, a world action model that enables instruction-conditioned policy inference, and GE-Sim, a video world simulator that supports closed-loop execution [6][21].

Group 2: Key Components
- GE-Base is a large-scale video diffusion model that captures the features of real-world robot interactions in a structured latent space [3][27].
- GE-Act uses a lightweight decoder with 160 million parameters to provide real-time control, achieving less than 10 ms latency across diverse robotic tasks [15][27].
- GE-Sim constructs a high-fidelity environment for closed-loop policy development [21][27].

Group 3: Evaluation Framework
- EWMBench is introduced as a standardized evaluation suite for assessing the fidelity and utility of video-based world models in real-world robotic manipulation [23][27].
- The evaluation covers visual scene consistency, motion correctness, and semantic alignment in task-oriented scenarios [23][27].

Group 4: Training and Adaptation
- GE-Base is trained on a large dataset of 1 million instruction-aligned video sequences [11][27].
- GE-Act uses a three-phase training strategy to derive action policies from the GE-Base model, adapting them to specific tasks and environments [17][19][27].

Group 5: Performance and Contributions
- The combination of GE-Base, GE-Act, and GE-Sim is reported to deliver superior performance on complex tasks such as fabric folding and packing, with strong generalization [27].
- The platform provides a foundation for building general-purpose, instruction-driven embodied intelligence systems [27].
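
To make the closed-loop idea concrete, the sketch below shows how an action model and a simulator can be alternated in a rollout loop, which is the general pattern the summary attributes to the pairing of GE-Act and GE-Sim. It is a toy illustration only: `Observation`, `toy_policy`, `toy_simulator`, and `closed_loop_rollout` are hypothetical names, the state is a single number rather than video, and nothing here reflects the actual Genie Envisioner code or API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Toy observation: a 1-D gripper position (an assumption, not video frames)."""
    position: float


def toy_policy(obs: Observation, target: float) -> float:
    """Hypothetical stand-in for an action model: a bounded step toward the target."""
    step = target - obs.position
    return max(-0.1, min(0.1, step))


def toy_simulator(obs: Observation, action: float) -> Observation:
    """Hypothetical stand-in for a world simulator: predicts the next observation."""
    return Observation(position=obs.position + action)


def closed_loop_rollout(
    policy: Callable[[Observation, float], float],
    simulator: Callable[[Observation, float], Observation],
    start: Observation,
    target: float,
    max_steps: int = 50,
    tol: float = 1e-3,
) -> List[Observation]:
    """Alternate policy and simulator until the target is reached or steps run out."""
    obs = start
    trajectory = [obs]
    for _ in range(max_steps):
        if abs(target - obs.position) <= tol:
            break  # goal reached in simulation
        action = policy(obs, target)   # the policy acts on the latest observation
        obs = simulator(obs, action)   # the simulator feeds the result back
        trajectory.append(obs)
    return trajectory


trajectory = closed_loop_rollout(toy_policy, toy_simulator, Observation(0.0), target=0.5)
```

The point of the pattern is that the policy never sees ground-truth future states: each action is chosen from the simulator's latest output, so policy errors and simulator errors compound over the rollout in the same way they would during evaluation.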
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
具身智能之心·2025-08-11 00:14