General World Models
The promised "Autonomous Driving World Models" course is finally live!
自动驾驶之心· 2026-01-06 06:52
Core Viewpoint
- The article announces the launch of a new course titled "World Models and Autonomous Driving Small Class," focusing on general world models, video generation, and OCC generation algorithms in the context of autonomous driving [1][3].

Course Overview
- The course is developed in collaboration with industry leaders and follows the success of a previous course on end-to-end and VLA autonomous driving [1].
- It aims to deepen understanding of world models and their applications in autonomous driving, targeting individuals looking to enter the industry [11].

Course Structure
- **Chapter 1: Introduction to World Models** - Provides an overview of world models and their connection to end-to-end autonomous driving, including historical development and current applications [6]. Discusses the major types of world models (pure simulation, simulation + planning, and generation of sensor inputs and perception results) along with their industry applications [6].
- **Chapter 2: Background Knowledge of World Models** - Covers foundational topics including scene representation, Transformer technology, and BEV perception [6][12], and highlights key technical terms frequently encountered in job interviews related to world models [7].
- **Chapter 3: Discussion of General World Models** - Focuses on popular general world models, including Marble from Li Fei-Fei's team, DeepMind's Genie 3, and Meta's JEPA, as well as VLA + world model algorithms, explaining the core technologies and design philosophies behind these models [7].
- **Chapter 4: Video Generation-Based World Models** - Delves into video generation algorithms, starting with Wayve's GAIA-1 & GAIA-2 and extending to recent works such as UniScene and OpenDWM, balancing classic works with the latest advancements in the field [8].
- **Chapter 5: OCC-Based World Models** - Focuses on OCC generation algorithms, discussing three major papers and a practical project that extends OCC methods to vehicle trajectory planning [9].
- **Chapter 6: World Model Job Topics** - Shares practical insights from the instructor's years of experience, addressing industry applications, pain points, and interview preparation for related positions [10].

Learning Outcomes
- The course is positioned as the first advanced practical tutorial for end-to-end autonomous driving, aiming to facilitate the implementation of these technologies in industry [11].
- Participants are expected to reach a level equivalent to one year of experience as a world-model autonomous driving algorithm engineer upon completion [14].
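To make the "OCC generation" idea concrete: at its core, an OCC-based world model predicts the next occupancy grid from the current one. The toy sketch below is purely illustrative (it is not from the course): it assumes a static world and a known ego motion, so prediction reduces to shifting the grid, whereas real methods learn this mapping from data.

```python
# Toy occupancy-grid "world model": predict the next ego-centric grid by
# shifting occupied cells opposite to the ego's motion (static-world
# assumption). Real OCC world models learn this transition instead.
import numpy as np

def predict_next_occ(occ: np.ndarray, ego_dx: int, ego_dy: int) -> np.ndarray:
    """Predict the next ego-centric occupancy grid.

    occ: HxW grid, 1 = occupied, 0 = free.
    ego_dx, ego_dy: ego translation in grid cells; in ego-centric
    coordinates the scene shifts the opposite way.
    """
    pred = np.zeros_like(occ)
    h, w = occ.shape
    ys, xs = np.nonzero(occ)
    ys2, xs2 = ys - ego_dy, xs - ego_dx  # scene moves opposite to ego
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    pred[ys2[keep], xs2[keep]] = 1
    return pred

occ = np.zeros((5, 5), dtype=int)
occ[2, 3] = 1                      # one obstacle ahead-right of the ego
pred = predict_next_occ(occ, ego_dx=1, ego_dy=0)
print(int(pred[2, 2]))  # → 1  (obstacle has shifted one cell left)
```

Extending such a predictor from "where will space be occupied" to "which trajectories stay in free space" is exactly the step the course's practical project on trajectory planning takes.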
Runway's late-night blockbuster: five major updates at once, and the first general world model arrives
机器之心· 2025-12-12 04:31
Core Insights
- Runway has made significant announcements, introducing five major updates that showcase its ambition in AI video and multimedia generation technology [1][3].
- The updates signal a shift from merely generating videos to simulating the physical world, marking a critical transition for the industry [4][34].

Group 1: Gen-4.5 Video Generation Model
- Gen-4.5 is Runway's latest flagship video generation model, featuring impressive image quality and introducing native audio generation and editing capabilities [6][9].
- The model achieves high physical accuracy and visual precision, with realistic object motion and fluid dynamics [9][10].
- Gen-4.5 supports multi-shot editing, allowing users to modify initial scenes and propagate the changes throughout the entire video [14][15].
- Despite these advances, Runway acknowledges that Gen-4.5 still has the limitations common to video models, which directly motivate its world model research [15].

Group 2: General World Model (GWM-1)
- GWM-1 is Runway's first general world model. Built on Gen-4.5, it uses autoregressive methods for frame-by-frame prediction [18][19].
- The model allows user intervention depending on the application scenario, simulating future events in real time [19].
- GWM-1 ships in three variants: GWM Worlds for environment simulation, GWM Avatars for interactive video generation, and GWM Robotics for training robots with synthetic data [21][22].

Group 3: GWM Worlds
- GWM Worlds enables real-time environment simulation, creating immersive, explorable spaces from static scenes [23][24].
- The model maintains spatial consistency during exploration and responds accurately to user-defined physical rules [24][25].

Group 4: GWM Robotics
- GWM Robotics supports counterfactual generation, exploring different robot trajectories and their outcomes [26][27].
- It includes a Python SDK for generating videos conditioned on robot actions, augmenting training data without expensive real-world data collection [28].

Group 5: GWM Avatars
- GWM Avatars is an audio-driven interactive video generation model that simulates natural human movements and expressions [29][30].
- It has broad application potential, including personalized tutoring, customer support, training simulations, and interactive entertainment [31][32].

Conclusion
- Runway's updates mark a pivotal moment for the industry: the transition from video generation to true world simulation, reflecting a deeper grasp of the physical world's underlying logic [34][35].
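The "autoregressive, frame-by-frame prediction with user intervention" loop described for GWM-1 can be sketched in a few lines. Everything below is hypothetical scaffolding to show the control flow only: the class and function names are invented for illustration and are not Runway's SDK, and the "model" is a stub rather than a neural network.

```python
# Sketch of an autoregressive world-model rollout: each step's prediction is
# conditioned on the history so far, and a user action can be injected at any
# step. All names here are illustrative, not Runway's actual API.
from dataclasses import dataclass, field

@dataclass
class ToyWorldModel:
    """Stand-in for a learned frame-prediction model."""
    history: list = field(default_factory=list)

    def predict_next(self, action=None):
        # A real model would run a network on (history, action); here we
        # just record the step to make the autoregressive loop visible.
        frame = {"t": len(self.history), "action": action}
        self.history.append(frame)
        return frame

def rollout(model, actions, horizon):
    """Autoregressive rollout: each predicted frame becomes context for the next."""
    frames = []
    for t in range(horizon):
        action = actions.get(t)   # user intervention at step t, if any
        frames.append(model.predict_next(action))
    return frames

frames = rollout(ToyWorldModel(), actions={2: "turn_left"}, horizon=5)
print([f["action"] for f in frames])  # → [None, None, 'turn_left', None, None]
```

The key property this loop illustrates is that, unlike a fixed-length video generator, the simulation can run indefinitely and react to inputs mid-rollout, which is what makes the Worlds, Avatars, and Robotics variants possible on one backbone.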
Led by industry veterans! Thoroughly understand autonomous driving world models...
自动驾驶之心· 2025-12-11 03:35
Core Viewpoint
- The article introduces a new course titled "World Models and Autonomous Driving Small Class," focusing on advanced algorithms in autonomous driving, including general world models, video generation, and OCC generation [1][3].

Course Overview
- The course is developed in collaboration with industry leaders and follows the success of a previous course on end-to-end and VLA autonomous driving [1].
- It aims to build both understanding and practical skills in world models, targeting individuals interested in the autonomous driving industry [11].

Course Structure
- **Chapter 1: Introduction to World Models** - Discusses the relationship between world models and end-to-end autonomous driving, including historical development and current applications, and covers the main types of world models: pure simulation, simulation + planning, and generation of sensor inputs and perception results [6].
- **Chapter 2: Background Knowledge of World Models** - Focuses on foundational knowledge, including scene representation, Transformer, and BEV perception [6][12], and highlights key technical terms frequently encountered in job interviews related to world models [7].
- **Chapter 3: General World Model Exploration** - Examines popular models such as Marble from Li Fei-Fei's team, DeepMind's Genie 3, and Meta's JEPA, along with recent discussions of VLA + world model algorithms [7].
- **Chapter 4: Video Generation-Based World Models** - Concentrates on video generation algorithms, starting with Wayve's GAIA-1 & GAIA-2 and extending to recent works such as UniScene and OpenDWM [8].
- **Chapter 5: OCC-Based World Models** - Focuses on OCC generation methods, discussing three major papers and a practical project that extends to vehicle trajectory planning [9].
- **Chapter 6: World Model Job Specialization** - Provides insights into industrial applications of world models, addressing pain points and interview preparation for relevant positions [10].

Learning Outcomes
- The course aims to bring participants to a level equivalent to one year of experience as a world-model autonomous driving algorithm engineer [14].
- Participants will gain a comprehensive understanding of world model technologies, including video generation and OCC generation methods, and will be able to apply them in practical projects [14].
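Of the Chapter 2 background topics, BEV (bird's-eye-view) perception is the most mechanical to demonstrate: 3D points in the ego frame are rasterized onto a top-down 2D grid. The sketch below is a toy illustration under assumed conventions (x forward, y left, arbitrary grid size and 0.5 m cells), not code from the course.

```python
# Toy BEV rasterization: project ego-frame 3D points onto a top-down
# occupancy image centered on the ego vehicle. Axis conventions, grid size,
# and resolution are arbitrary choices for this example.
import numpy as np

def points_to_bev(points: np.ndarray, grid_size: int = 100,
                  cell_m: float = 0.5) -> np.ndarray:
    """Rasterize Nx3 ego-frame points (x forward, y left, z up) into a
    grid_size x grid_size occupancy image centered on the ego vehicle."""
    bev = np.zeros((grid_size, grid_size), dtype=np.uint8)
    half = grid_size // 2
    # Convert metric coordinates to integer grid indices.
    cols = (points[:, 0] / cell_m + half).astype(int)   # forward -> right in image
    rows = (half - points[:, 1] / cell_m).astype(int)   # left -> up in image
    keep = (rows >= 0) & (rows < grid_size) & (cols >= 0) & (cols < grid_size)
    bev[rows[keep], cols[keep]] = 1
    return bev

pts = np.array([[5.0, 0.0, 0.2], [10.0, -2.0, 0.5]])   # two lidar returns
bev = points_to_bev(pts)
```

Learned BEV perception (the topic Chapter 2 actually covers) replaces this fixed projection with networks that lift camera or lidar features into the BEV plane, but the target representation is the same grid.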
Can AI build worlds now? Google DeepMind's Genie 3 generates a "Death Stranding"-style world in seconds
36Kr· 2025-08-06 11:29
Core Insights
- DeepMind has launched Genie 3, a new model described as a "general world model," which allows users to create and interact with 3D environments from text prompts, marking a significant advance in generative AI technology [2][5][20].

Group 1: Technological Advancements
- Genie 3 improves on its predecessor Genie 2, raising resolution from 360p to 720p and sustaining continuous simulations for several minutes rather than just 10 to 20 seconds [3][18].
- A new visual memory mechanism lets the model maintain scene consistency, so objects and environments remain stable and logical over time [4][9].
- Genie 3 can dynamically adjust scenes in response to user inputs, enabling real-time interaction and exploration, a significant leap beyond traditional video generation models [8][10].

Group 2: Applications Across Industries
- The gaming industry stands to benefit greatly: Genie 3 can drastically cut the time and cost of building 3D environments, letting independent developers create complex scenes from simple text prompts [10][12].
- In film, directors and artists can use Genie 3 to preview and adjust scenes in real time, streamlining the creative process [12][21].
- Education can leverage Genie 3 to create interactive, explorable representations of historical and geographical concepts, transforming traditional learning methods [12][21].

Group 3: Future Implications
- Genie 3 serves as a cognitive training ground for AI agents, allowing them to learn cause-and-effect relationships and spatial awareness in a controlled virtual environment, which could strengthen their real-world capabilities [17][20].
- The model marks a shift from 2D to 3D and toward interactive, causally consistent environments, indicating a clear trajectory for AI spatial intelligence [20][21].
- While Genie 3 is not yet publicly available, its development reflects a broader trend toward creating operable virtual spaces from text descriptions, with the potential to reshape many fields [20][21].