Domestic Vendors Accelerate World-Model Deployment; the Gaming Industry Stands to Benefit First
China Post Securities· 2026-03-30 07:52
Industry Investment Rating
- The industry investment rating is "Outperform the Market" and is maintained [1]

Core Insights
- Domestic companies are accelerating their entry into the world-model sector, which is expected to speed up the industrialization process. Research in this area was previously dominated by overseas institutions such as Google and World Labs; domestic firms including ByteDance, Ant Group, Tencent, Huawei, and Alibaba are now making significant strides [5][6]
- As of 2025, AI technology has reached an overall application rate of 86.36% in game development, concentrated in art design, automated testing, and sound generation, while the penetration rate for core game asset generation is approximately 36.8%. The world model is expected to extend AI's role to complex asset generation and scene construction, moving it from a "point efficiency tool" to a "system-level productivity platform" [6]
- Game interaction is evolving from "character-level intelligence" to "scene-level generation + system-level interaction," which will shift content production from "pre-fabricated content supply" to "real-time generated supply" [6]

Summary by Sections

Industry Overview
- The closing index is 784.68, with a 52-week high of 1021.75 and a low of 591.71 [1]

Investment Highlights
- Companies to watch include those with dual capabilities in world-model development and scene application, such as Kunlun Wanwei, as well as large 3D game producers such as Perfect World and Giant Network [7]
CVPR 2026 | AI's Cambrian Moment? ByteDance's New World Model Learns Real-World Knowledge from Vision Alone
机器之心· 2026-03-07 11:20
Core Viewpoint
- The article discusses the introduction of "VideoWorld 2," a visual world model developed by the Doubao model team in collaboration with Beijing Jiaotong University, which enables AI to learn complex real-world tasks directly from video data without relying on language models [2][4]

Group 1: Model Overview
- VideoWorld 2 is designed to learn complex, long-sequence real-world knowledge solely through video observation, distinguishing itself from existing models that depend on language or labeled data [4][5]
- The model can successfully perform intricate tasks such as origami and LEGO assembly, which require fine-grained manipulation and long-horizon planning, achieving a success rate more than 70% higher than current leading systems such as Sora 2, Veo 3, and Wan 2.2 [4][21]

Group 2: Learning Mechanism
- The key to VideoWorld 2's learning capability lies in decoupling critical actions from irrelevant visual details, using a dynamics-enhanced latent dynamics model (dLDM) to improve learning efficiency and effectiveness [4][16]
- The model employs a MAGVITv2-style encoder-decoder and a pre-trained video diffusion model (VDM) to compress and render video changes, focusing on core dynamic actions while avoiding overfitting to irrelevant visual detail [16][18]

Group 3: Experimental Setup
- The team constructed two experimental environments, video handcrafting and video robot manipulation, to evaluate the model's ability to understand control rules and plan tasks [8][9]
- The handcrafting videos cover varied scenes with intricate actions and environmental changes, serving as an ideal testbed for assessing the model's capacity to learn complex knowledge [8]

Group 4: Results and Visualization
- The dLDM was shown to extract similar motion patterns from a large number of real-world videos, enhancing the model's ability to learn generalizable strategies [22][25]
- UMAP visualization showed that VideoWorld 2 clusters similar actions across different environments more tightly than its predecessor, indicating improved extraction of commonalities and more generalized knowledge [25]

Group 5: Future Directions
- The team believes that visual learning is crucial for advancing AI toward higher intelligence, and aims to develop models that can autonomously perceive, reason, and act on complex real-world knowledge structures [26]
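The "decoupling actions from irrelevant visual details" idea in the learning-mechanism summary can be illustrated with a toy numerical sketch. Everything below is an illustrative assumption, not the paper's actual code: frame latents are modeled as a fixed static component plus accumulated small action offsets, and predicting latent *deltas* between consecutive frames cancels the static content so only the action signal remains.

```python
import numpy as np

# Toy sketch (illustrative assumptions only, not the dLDM implementation):
# each frame's latent = static scene appearance + accumulated action offsets.
rng = np.random.default_rng(0)

static_content = rng.normal(size=(1, 16))     # constant scene appearance
actions = rng.normal(size=(8, 16)) * 0.1      # small per-step motion signals
frame_latents = static_content + np.cumsum(actions, axis=0)

# Modeling consecutive latent deltas instead of full latents: the static
# component cancels exactly, leaving only the action dynamics to learn.
deltas = np.diff(frame_latents, axis=0)

assert np.allclose(deltas, actions[1:])       # static appearance cancelled out
```

The point of the sketch is only that differencing (or any dynamics-focused target) makes the learning signal invariant to static appearance, which is one plausible reading of why such a model would avoid overfitting to irrelevant visual detail.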