SIASUN - Beijing Humanoid Robot Innovation Center Proposes Embodied World Model WoW

Core Insights
- Beijing Humanoid Robot Innovation Center has released WoW (World-Omniscient World Model), a new embodied world model intended to let robots "see, understand, and act in the world" [1]
- WoW outperforms Sora2 in spatiotemporal consistency and physical reasoning, and it integrates vision, action, physical perception, and reasoning in a unified framework [1][2]
- The model lets AI learn physical laws through interaction, a significant step from merely generating images to understanding the physical world [1][2]

Innovative Technical Architecture
- WoW is a multimodal large-model framework that combines world generation, action prediction, visual understanding, and self-reflection in a single system, addressing limitations of traditional architectures [2]
- The model learns from real robot interaction data and generates high-quality, physically consistent robot videos in both known and unknown scenarios [2]
- WoW follows the SOPHIA paradigm, under which the model improves its accuracy and realism through self-teaching [2]

New Benchmark Development
- Beijing Humanoid has introduced WoWBench, a comprehensive benchmark for embodied world models that evaluates four core dimensions: perception and understanding, prediction and reasoning, decision-making, and generalization in execution [3]
- The benchmark uses a mixed evaluation mechanism so that measured model performance aligns with human cognition [3]
- Open-sourcing parts of the WoW model significantly lowers the entry barrier for world model research and accelerates the integration of embodied intelligent robots into various aspects of life [3]

Broad Application Prospects
- WoW's architecture and performance allow it to be applied across multiple scenarios, and it provides a unified benchmark platform for world model research [4]
- The model supports data migration and augmentation: AI can generate synthetic samples from limited real data, forming a self-cycling "imagine-generate-annotate-migrate" process [4]
- WoW can translate visual "imagination" into executable action commands, so that robots can autonomously understand and execute natural task instructions in complex environments [5]

Demonstrated Technological Leadership
- Beijing Humanoid's "Embodied Tiangong Ultra" won the first humanoid robot half-marathon championship and achieved significant victories at the inaugural World Humanoid Robot Sports Competition, demonstrating its leading technological capabilities [5]
- The open-sourcing of the WoW model further highlights Beijing Humanoid's strengths in AI as it moves from understanding the world to reconstructing it, and reinforces its commitment to making robots "the best to use" [5]
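
To make the "imagine-generate-annotate-migrate" cycle mentioned under Broad Application Prospects more concrete, the following is a minimal Python sketch of such a loop. It is an illustration only and not WoW's implementation: the names Sample, augment, toy_imagine, and toy_annotate are hypothetical, strings stand in for video clips, and the two toy functions stand in for the world model and the labeling step, whose real interfaces the source does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    # Hypothetical record: a string stands in for a video clip.
    observation: str
    action: str = ""
    synthetic: bool = False


def augment(real: List[Sample],
            imagine: Callable[[Sample], List[str]],
            annotate: Callable[[str], str],
            rounds: int = 1) -> List[Sample]:
    """Grow a dataset by repeating imagine/generate -> annotate -> migrate."""
    dataset = list(real)
    for _ in range(rounds):
        new_samples = []
        for seed in dataset:
            # Imagine + generate: propose new observations from a seed sample.
            for obs in imagine(seed):
                # Annotate: attach an action label to each generated observation.
                new_samples.append(Sample(obs, annotate(obs), synthetic=True))
        # Migrate: fold the labeled synthetic samples back into the dataset,
        # so the next round can build on them.
        dataset.extend(new_samples)
    return dataset


def toy_imagine(seed: Sample) -> List[str]:
    # Assumption: a real system would call a generative world model here.
    return [f"{seed.observation}|variant{i}" for i in range(2)]


def toy_annotate(observation: str) -> str:
    # Assumption: a real system would infer the action from the generated video.
    return "pick" if "cup" in observation else "push"


real = [Sample("cup_on_table", "pick"), Sample("box_on_floor", "push")]
out = augment(real, toy_imagine, toy_annotate, rounds=1)
print(len(out), sum(s.synthetic for s in out))
```

Because each round seeds from the whole current dataset, the sample count grows quickly with more rounds. A practical pipeline would presumably filter generated samples for physical plausibility before migrating them, a step this sketch leaves out.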