New joint work from Mercedes-Benz and Tübingen! SpaceDrive: Injecting Spatial Intelligence into Autonomous Driving VLAs
自动驾驶之心 · 2025-12-19 05:46
Core Insights
- The article introduces SpaceDrive, a framework for autonomous driving that strengthens the spatial awareness of Vision-Language Models (VLMs) by integrating 3D positional encoding, addressing existing limitations in spatial reasoning and trajectory planning [3][4][31].

Group 1: Framework Overview
- Traditional VLM methods treat coordinate values as text tokens; SpaceDrive replaces this with a unified 3D positional encoding, improving the system's spatial reasoning and trajectory planning [4][5].
- The framework achieves state-of-the-art (SOTA) performance in open-loop evaluation on the nuScenes dataset and ranks second among VLM-based methods in closed-loop evaluation on the Bench2Drive benchmark, with a driving score of 78.02 [3][21].

Group 2: Methodology
- SpaceDrive employs a unified spatial interface that combines visual tokens with 3D positional encoding, giving the model an explicit spatial representation and more accurate trajectory planning [5][6].
- It predicts trajectory coordinates with a regression decoder rather than a classification head, which addresses the inherent limitations of language models in numerical processing [4][13].

Group 3: Experimental Results
- In open-loop planning, SpaceDrive+ outperformed existing VLM-based methods, with an average L2 error of 0.32 m and a collision rate of 0.23% [17][18].
- In closed-loop planning, SpaceDrive+ achieved a driving score of 78.02 and a success rate of 55.11%, ranking second among VLM-based methods [20][21].

Group 4: Contributions to the Field
- SpaceDrive represents a paradigm shift from "language modeling geometry" to "explicit geometric encoding," linking visual spatial perception with physical planning [31][33].
- Its unified 3D positional encoding across the perception, reasoning, and planning modules is a major architectural innovation that improves the generalizability of spatial intelligence [33].
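
To make two of the ideas above more concrete, here is a minimal Python sketch. It is not SpaceDrive's implementation: the article does not say which form of 3D positional encoding the framework uses, so the sinusoidal scheme, the function names `sinusoidal_encode_3d` and `average_l2_error`, and the parameters `dim_per_axis` and `max_wavelength` are all assumptions made for illustration. The first function turns a 3D coordinate into a continuous vector instead of a string of digit tokens; the second computes a mean Euclidean (L2) distance between predicted and reference waypoints, which is the general idea behind the open-loop L2 metric (the exact nuScenes averaging protocol is not described in the article).

```python
import math


def sinusoidal_encode_3d(point, dim_per_axis=8, max_wavelength=100.0):
    """Map an (x, y, z) position in metres to a fixed-length vector.

    Hypothetical illustration, not SpaceDrive's actual encoder: every axis
    contributes dim_per_axis values, as sin/cos pairs at geometrically
    spaced frequencies.
    """
    if dim_per_axis <= 0 or dim_per_axis % 2 != 0:
        raise ValueError("dim_per_axis must be a positive even number")
    n_freqs = dim_per_axis // 2
    encoding = []
    for coord in point:
        for k in range(n_freqs):
            # Frequency falls from 1 toward 1/max_wavelength as k grows.
            freq = 1.0 / (max_wavelength ** (k / n_freqs))
            encoding.append(math.sin(coord * freq))
            encoding.append(math.cos(coord * freq))
    return encoding


def average_l2_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints (same units as input)."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # One 3D point becomes a vector of 3 * dim_per_axis continuous values.
    vector = sinusoidal_encode_3d((12.5, -3.0, 0.8))
    print(len(vector))

    # Two waypoints, the second off by a 3-4-5 triangle: (0 + 5) / 2 = 2.5
    print(average_l2_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))
```

The design point this illustrates is that a continuous encoding keeps nearby positions close in vector space, whereas the digit tokens of "12.5" and "12.6" carry no built-in notion of distance. A regression decoder can then output coordinates as real numbers rather than choosing among discrete tokens.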