A deep dive into "spatial intelligence" across academia and industry: most of it still stays at the surface level...
自动驾驶之心 · 2025-12-28 03:30
Core Viewpoint
- The article emphasizes the transition of autonomous driving from "perception-driven" to "spatial intelligence" by 2025, highlighting the importance of understanding and interacting with the three-dimensional physical world [3].

Group 1: Spatial Intelligence Definition
- Spatial intelligence is defined as the ability to perceive, represent, reason about, decide on, and interact with spatial information, which is crucial for the interaction between intelligent agents and the physical world [3].
- Current spatial intelligence work focuses primarily on perception and representation, leaving significant room for improvement in reasoning, decision-making, and interaction capabilities [3].

Group 2: World Models and Simulation
- GAIA-2 is a multi-view generative world model for autonomous driving that generates driving videos conditioned on physical laws and scene conditions, addressing edge cases in driving scenarios [5].
- GAIA-3 scales GAIA-2 up fivefold and captures fine-grained spatiotemporal context, representing the physical causal structure of the real world [9].
- ReSim combines expert trajectories from the real world with simulated dangerous behaviors to achieve high-fidelity simulation of extreme driving scenarios [11].

Group 3: Multimodal Reasoning
- The SIG framework introduces a structured graph scheme that encodes scene layouts and object relationships, aiming to strengthen geometric reasoning in autonomous driving [16].
- OmniDrive generates a large-scale 3D question-answer dataset to align vision-language models with 3D spatial understanding and planning [19].
- SimLingo aligns driving behavior with semantic instructions through an "action dreaming" task, demonstrating the potential of general models in real-time decision-making [21].
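To make the scene-graph idea behind frameworks like SIG concrete, here is a minimal, hypothetical sketch (not SIG's actual scheme): detected objects become nodes, and coarse spatial relations derived from relative positions become labelled edges. All class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    """A node in the scene graph: one detected traffic participant."""
    obj_id: int
    category: str           # e.g. "car", "pedestrian"
    position: tuple         # (x, y) in ego-centric metres, x = forward

@dataclass
class SceneGraph:
    """Objects as nodes, pairwise spatial relations as labelled edges."""
    objects: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (src_id, relation, dst_id)

    def add_relation(self, src: SceneObject, dst: SceneObject) -> None:
        """Derive a coarse spatial relation from relative position."""
        dx = dst.position[0] - src.position[0]
        dy = dst.position[1] - src.position[1]
        relation = ("front" if dx > 0 else "behind") + \
                   ("-left" if dy > 0 else "-right")
        self.edges.append((src.obj_id, relation, dst.obj_id))

# Usage: ego vehicle at the origin, another car ahead and to the left.
ego = SceneObject(0, "ego", (0.0, 0.0))
car = SceneObject(1, "car", (12.0, 3.5))
g = SceneGraph(objects=[ego, car])
g.add_relation(ego, car)
print(g.edges)  # [(0, 'front-left', 1)]
```

Encoding relations symbolically like this is what lets a language model reason over "the car front-left of ego" instead of raw coordinates.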
Group 4: Real-time Digital Twins
- DrivingRecon is a 4D Gaussian reconstruction model that predicts Gaussian parameters from surround-view videos, enabling efficient dynamic scene reconstruction for autonomous driving [26].
- VR-Drive improves the robustness of driving systems by allowing real-time synthesis of novel viewpoints without per-scene optimization [29].

Group 5: Embodied Fusion
- MiMo-Embodied is the first open-source cross-embodiment model that integrates autonomous driving with embodied intelligence, showing significant transfer effects in spatial reasoning capabilities [31].
- DriveGPT4-V2 is a closed-loop end-to-end autonomous driving framework that outputs low-level control signals, evolving from visual understanding to closed-loop control [36].

Group 6: Industry Trends
- By 2025, the industry is moving toward end-to-end VLA architectures that leverage large language models for driving decision-making [40].
- Waymo's EMMA model unifies multimodal inputs and outputs in a shared language space, enhancing complex reasoning in driving tasks [41].
- DeepRoute.ai's DeepRoute IO 2.0 architecture introduces chain-of-thought reasoning to address the "black box" problem in end-to-end models, improving user trust in autonomous systems [44].
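The chain-of-thought approach mentioned for DeepRoute IO 2.0 can be sketched as a structured output in which intermediate observations are emitted alongside the final control command, making the decision inspectable rather than a black box. This is a hypothetical illustration of the general pattern, not DeepRoute's actual output format; all field names and values are invented for the example.

```python
# Hypothetical structured output of a reasoning-augmented driving model:
# the chain of intermediate observations accompanies the control command.
decision = {
    "reasoning": [
        "Pedestrian detected at crosswalk 18 m ahead.",
        "Ego speed 12 m/s; stopping distance is within margin.",
        "Traffic light is green, but the pedestrian has priority.",
    ],
    "action": {"type": "brake", "target_speed_mps": 0.0},
}

def explain(d: dict) -> str:
    """Render the numbered reasoning chain followed by the chosen action."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(d["reasoning"]))
    return f"{steps}\n=> action: {d['action']['type']}"

print(explain(decision))
```

Surfacing the reasoning trace to the user is precisely what is meant to address the "black box" trust issue in end-to-end models.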