Generative World Models

The world's first AI-native game engine evolves again: if GTA6 won't come, we'll just AI one ourselves
36Kr · 2025-08-22 09:17
Core Insights
- The article discusses the advancements in the AI-driven game engine Mirage 2, which has evolved significantly from its predecessor, Mirage 1, in just over a month [2][4][17].

Group 1: Mirage 2 Features
- Mirage 2 is described as a generative world engine that allows users to create, experience, and modify any interactive world, not limited to gaming [2][4].
- It supports uploading images that are converted into interactive game worlds, and it allows the game environment to be modified in real time through text commands [5][11].
- The engine's performance has improved, including faster prompt control, game latency reduced to 200 ms, and the ability to run on a single consumer GPU [5][14][13] (a toy interaction-loop sketch follows this summary).

Group 2: Comparison with Competitors
- Mirage 2 is positioned to compete with DeepMind's Genie 3, offering more interactive capabilities such as running, jumping, and attacking, with a longer interaction horizon of over 10 minutes [11][13].
- The article highlights that Mirage 2 has made significant improvements in object proportions and scene understanding compared to Mirage 1, achieving a more realistic representation of characters and vehicles [14][17].

Group 3: Technical Challenges
- Despite the advancements, technical issues remain, such as the precision of action control and visual consistency during rapid scene changes [16][17].
- The article notes that while Mirage 2 has made strides, it still falls short of the consistency demonstrated by Genie 3, indicating areas for further development [16][17].
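To make the described interaction model concrete, here is a purely illustrative Python sketch of a client loop for a generative world engine of this kind: an uploaded image seeds the world, text commands edit it mid-play, and each frame arrives after roughly the 200 ms latency cited in the article. Mirage 2's actual API is not described in the source, so every name here (WorldEngineClient, upload_image, send_command, next_frame) is a hypothetical stand-in.

```python
import time

# Purely illustrative sketch: Mirage 2's API is not public in the source
# article, so the class and method names below are hypothetical stand-ins.

class WorldEngineClient:
    """Toy stand-in for a session with a generative world engine."""

    def __init__(self, latency_ms: float = 200.0):
        # The article cites roughly 200 ms game latency; modeled as a fixed delay.
        self.latency_s = latency_ms / 1000.0
        self.world_state: dict = {"edits": []}

    def upload_image(self, path: str) -> None:
        # An uploaded image seeds the interactive world.
        self.world_state["seed_image"] = path

    def send_command(self, text: str) -> None:
        # Real-time text commands modify the environment mid-play.
        self.world_state["edits"].append(text)

    def next_frame(self) -> dict:
        # Each generated frame arrives after the modeled latency budget.
        time.sleep(self.latency_s)
        return dict(self.world_state)

if __name__ == "__main__":
    client = WorldEngineClient()
    client.upload_image("street_photo.png")
    client.send_command("make it rain and add neon signs")
    print(client.next_frame())
```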
SceneDiffuser++: City-Scale Traffic Simulation Based on a Generative World Model (CVPR'25)
自动驾驶之心 · 2025-07-21 11:18
Core Viewpoint
- The article discusses the development of SceneDiffuser++, a generative world model that enables city-scale traffic simulation, addressing the unique challenges of trip-level simulation compared to event-level simulation [1][2].

Group 1: Introduction and Background
- The primary goal of traffic simulation is to supplement limited real-world driving data with extensive synthetic simulation mileage to support the testing and validation of autonomous driving systems [1].
- An ideal generative simulation city (CitySim) should seamlessly simulate a complete journey from point A to point B, managing dynamic elements such as vehicles, pedestrians, and traffic lights [1].

Group 2: Technical Integration
- Achieving CitySim requires the integration of multiple technologies, including scene generation, agent behavior modeling, occlusion reasoning, dynamic scene generation, and environmental simulation [2].
- SceneDiffuser++ is the first end-to-end generative world model that consolidates these requirements under a single loss function, enabling complete simulation from A to B [2].

Group 3: Core Challenges and Innovations
- Trip-level simulation faces three unique challenges compared to event-level simulation: dynamic agent management, occlusion reasoning, and environmental dynamics [3].
- SceneDiffuser++ introduces innovations such as multi-tensor diffusion, soft clipping strategies, and unified generative modeling to address these challenges [4][5].

Group 4: Methodology and Model Details
- SceneDiffuser++ represents scenes as scene tensors, allowing the model to handle dynamic changes in heterogeneous elements such as agents and traffic lights simultaneously [7].
- The model employs a diffusion process for training and inference, using loss masking and soft clipping to focus learning on valid entries and stabilize sparse tensor generation [8][9] (a minimal sketch of this masked training step follows this summary).

Group 5: Performance Evaluation
- Experiments on the WOMD-XLMap dataset show that SceneDiffuser++ outperforms previous models on all metrics, achieving lower Jensen-Shannon divergence values for agent generation and removal [12] (an illustrative metric computation also follows below).
- The model maintains realistic agent dynamics and traffic-light behavior over a 60-second simulation, in contrast with previous models that exhibited stagnation [15].

Group 6: Conclusion and Significance
- The core contributions of SceneDiffuser++ include the introduction of the CitySim concept, the design of a unified generative framework, and the resolution of stability issues in dynamic scene generation through sparse tensor learning and soft clipping [19].
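The following is a minimal sketch of the masked diffusion training idea described in Group 4, assuming a scene-tensor layout of [batch, agents, timesteps, features] plus a validity mask. It illustrates loss masking (padded or absent agent slots contribute nothing) and soft clipping (smooth squashing instead of a hard clamp); it is not the authors' implementation, and the noise schedule and denoiser are placeholders.

```python
import torch
import torch.nn.functional as F

def soft_clip(x: torch.Tensor, limit: float = 3.0) -> torch.Tensor:
    # Squash predictions smoothly into (-limit, limit) instead of hard
    # clamping, so gradients are preserved for out-of-range values.
    return limit * torch.tanh(x / limit)

def masked_diffusion_step(denoiser, scene, valid, t):
    """One epsilon-prediction training step on a sparse scene tensor (sketch)."""
    noise = torch.randn_like(scene)
    alpha = (1.0 - t).clamp(1e-4, 1.0 - 1e-4).view(-1, 1, 1, 1)  # toy schedule
    noisy = alpha.sqrt() * scene + (1.0 - alpha).sqrt() * noise
    pred = soft_clip(denoiser(noisy, t))
    per_elem = F.mse_loss(pred, noise, reduction="none")
    # Loss masking: invalid (padded / absent) agent slots are excluded, so the
    # sparse scene tensor does not pull the model toward padding values.
    mask = valid.unsqueeze(-1).float()
    return (per_elem * mask).sum() / (mask.sum() * scene.shape[-1]).clamp(min=1.0)

# Toy usage: a per-element MLP stands in for the real denoiser network.
B, A, T, Feat = 2, 8, 16, 6
net = torch.nn.Sequential(torch.nn.Linear(Feat, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, Feat))
denoiser = lambda x, t: net(x)            # ignores t; real models condition on it
scene = torch.randn(B, A, T, Feat)        # agent states over time
valid = torch.rand(B, A, T) > 0.5         # agents appear/disappear -> sparse mask
loss = masked_diffusion_step(denoiser, scene, valid, torch.rand(B))
loss.backward()
print(f"masked diffusion loss: {loss.item():.4f}")
```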
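The evaluation in Group 5 compares distributions of agent generation and removal via Jensen-Shannon divergence. As a rough illustration of what such a distribution-matching metric computes, here is a small NumPy sketch comparing histograms of per-rollout agent-insertion counts from simulation versus logged data; the exact quantities and binning used in the paper are not given in the summary, so the histogram choice and the Poisson toy data are assumptions.

```python
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """JS divergence (base 2, in [0, 1]) between two discrete distributions."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0  # skip empty bins; m > 0 wherever a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy example: number of newly inserted agents per simulated 60 s rollout,
# compared against counts observed in logged driving data.
rng = np.random.default_rng(0)
sim_counts = rng.poisson(lam=4.0, size=500)
log_counts = rng.poisson(lam=5.0, size=500)
bins = np.arange(0, max(sim_counts.max(), log_counts.max()) + 2)
p_hist, _ = np.histogram(sim_counts, bins=bins)
q_hist, _ = np.histogram(log_counts, bins=bins)
print(f"JS divergence (agent insertions): "
      f"{js_divergence(p_hist.astype(float), q_hist.astype(float)):.4f}")
```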