AgiBot (智元机器人) Releases the Industry's First Open-Source Robot World Model Platform: Genie Envisioner
Robot Circle (机器人圈) · 2025-08-15 10:19

Core Insights
- The article introduces Genie Envisioner (GE), a unified world model platform for real-world robot control. GE integrates future frame prediction, policy learning, and simulation-based evaluation into a closed-loop architecture built around video generation [1][3][19].
- GE-Act, built on approximately 3,000 hours of real robot operation data, significantly outperforms existing state-of-the-art (SOTA) methods in cross-platform generalization and long-horizon task execution, opening a new technical path for embodied intelligence [1][3][19].

Group 1: Key Innovations
- GE's core breakthrough is a vision-centered modeling paradigm: it models robot-environment interactions directly in visual space, which preserves both spatial structure and temporal evolution [3][5].
- This approach offers two key advantages: efficient cross-embodiment generalization (transfer across different robot bodies) and strong prediction of how a scene will evolve over time. Together these enable high-quality task execution with minimal data [3][5].

Group 2: Technical Architecture
- The GE platform consists of three tightly integrated components: GE-Base, GE-Act, and GE-Sim [6][11][19].
- GE-Base is the foundation model. It uses a multi-view video generation framework and a sparse memory mechanism to strengthen long-horizon reasoning [6][7].
- GE-Act is a lightweight action model that translates visual latent representations into executable robot control commands; it is designed for efficiency and real-time performance [11][13].

Group 3: Evaluation and Future Outlook
- The team developed the EWMBench evaluation suite to assess the quality of world models on embodied tasks; GE-Base achieved the best scores on multiple key metrics [20][22].
- The team plans to open-source all code, pre-trained models, and evaluation tools, and aims to add more sensor modalities and support whole-body motion and human-robot collaboration in future applications [23].
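The article describes GE-Base's sparse memory mechanism only at a high level. As a minimal sketch, assuming that "sparse memory" means conditioning generation on a few recent frames plus a small number of frames sampled from the longer history, frame selection could look like the following. The function name `select_memory_frames`, its parameters, and the uniform-sampling rule are hypothetical illustrations, not GE-Base's actual implementation.

```python
from typing import List


def select_memory_frames(history_len: int, num_recent: int = 4, num_sparse: int = 4) -> List[int]:
    """Hypothetical sparse-memory frame selection (illustrative only).

    Returns the indices of the frames used as context: the most recent
    `num_recent` frames (dense short-term memory) plus up to `num_sparse`
    evenly spaced frames from the older history (sparse long-term memory).
    This is an assumed rule, not GE-Base's documented mechanism.
    """
    if history_len <= 0:
        return []

    # Dense window: the last `num_recent` frame indices (fewer if history is short).
    recent_start = max(0, history_len - num_recent)
    recent = list(range(recent_start, history_len))

    # Frames 0 .. recent_start - 1 form the long-term history.
    older_len = recent_start
    if older_len == 0 or num_sparse <= 0:
        return recent

    # Sparse window: evenly spaced indices spanning the older history.
    count = min(num_sparse, older_len)
    if count == 1:
        sparse = [0]
    else:
        step = (older_len - 1) / (count - 1)
        sparse = sorted({round(i * step) for i in range(count)})

    return sparse + recent


if __name__ == "__main__":
    print(select_memory_frames(100))
```

For a 100-frame history with the default settings, this returns `[0, 32, 63, 95, 96, 97, 98, 99]`. The context stays at 8 frames however long the episode grows, which illustrates the bounded per-step cost a sparse memory is meant to provide for long-horizon rollouts.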