Workflow
CVPR 2025 Tutorial: From Video Generation to World Models | Jointly presented by MMLab@NTU, Kuaishou 可灵 (Kling), and others
量子位·2025-06-05 08:32

Core Insights
- Video generation technology has evolved from simple animations to high-quality dynamic content capable of storytelling and long-term reasoning [1]
- Advances in models such as 可灵 (Kling), Sora, Genie, Cosmos, and Movie Gen are pushing the boundaries of video generation, prompting researchers to ask deeper questions about its potential as a bridge to world models and its role in embodied intelligence [2][6]

Group 1: Video Generation and Its Implications
- Video generation is increasingly recognized as a powerful visual prior that can strengthen AI's perception of the world, its understanding of interactions, and its reasoning about physics, pointing toward more general, embodied world models [3]
- The CVPR 2025 tutorial will feature leading researchers from academia and industry discussing how generative capability can be turned into a foundation for perception, prediction, and decision-making [4]

Group 2: Tutorial Details
- The tutorial is scheduled for June 11, 2025, at the Music City Center in Nashville, TN, and focuses on the transition from generating video to understanding and modeling the real world [9]
- The agenda includes invited talks from experts in the field covering topics such as scaling world models, physics-grounded models, and advances in video generation [5]

Group 3: Future Directions
- The development of video generation models points to the potential to understand interactions between objects and to capture the physical and semantic causality behind human behavior, signaling a shift from mere generation to interactive world modeling [6]
- The tutorial aims to provide insights, tools, and future research directions for those interested in video generation, multimodal understanding, embodied AI, and physical reasoning [7]