VLM (Vision-Language Model)
New joint work from Mercedes-Benz and Tübingen! SpaceDrive: Injecting spatial intelligence into autonomous-driving VLA
自动驾驶之心· 2025-12-19 05:46
Core Insights
- The article introduces SpaceDrive, a framework for autonomous driving that improves the spatial awareness of Vision-Language Models (VLMs) by integrating 3D positional encoding, addressing existing limitations in spatial reasoning and trajectory planning [3][4][31].

Group 1: Framework Overview
- Traditional VLM methods treat coordinate values as text tokens. SpaceDrive replaces this with a unified 3D positional encoding, which improves the system's spatial reasoning and trajectory planning [4][5].
- The framework reports state-of-the-art (SOTA) performance in open-loop evaluation on the nuScenes dataset and ranks second in closed-loop evaluation on the Bench2Drive benchmark, with a driving score of 78.02 [3][21].

Group 2: Methodology
- SpaceDrive uses a unified spatial interface that combines visual tokens with 3D positional encoding, giving the model an explicit spatial representation and more accurate trajectory planning [5][6].
- Trajectory coordinates are predicted by a regression decoder rather than a classification head, which sidesteps the inherent weakness of language models in numerical processing [4][13].

Group 3: Experimental Results
- In open-loop planning, SpaceDrive+ outperformed existing VLM-based methods, with an average L2 error of 0.32 m and a collision rate of 0.23% [17][18].
- In closed-loop planning, SpaceDrive+ achieved a driving score of 78.02 and a success rate of 55.11%, ranking second among VLM-based methods [20][21].

Group 4: Contributions to the Field
- SpaceDrive represents a paradigm shift from "language modeling geometry" to "explicit geometric encoding," linking visual spatial perception with physical planning [31][33].
- Using one 3D positional encoding across the perception, reasoning, and planning modules is the framework's main architectural innovation and is intended to make its spatial intelligence more generalizable [33].
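To make the methodology above more concrete, here is a minimal sketch of the two ideas the summary names: adding a 3D positional encoding to visual tokens, and predicting waypoints with a regression head instead of emitting coordinates as text. It is not the SpaceDrive implementation. The summary does not say what form the encoding takes, so the sinusoidal scheme, the function names (`sinusoidal_3d_encoding`, `add_spatial_encoding`, `regress_waypoints`, `average_l2_error`), and all shapes are assumptions chosen for illustration.

```python
import numpy as np


def sinusoidal_3d_encoding(xyz: np.ndarray, dim: int) -> np.ndarray:
    """Encode (N, 3) metric coordinates as (N, dim) sinusoidal features.

    Assumption: a Transformer-style sinusoidal encoding applied per axis.
    The source only says "3D positional encoding", not which kind.
    """
    if dim % 6 != 0:
        raise ValueError("dim must be divisible by 6 (3 axes x sin/cos)")
    n_freq = dim // 6
    # Geometrically spaced frequencies, shared by the x, y and z axes.
    freqs = 1.0 / (10000.0 ** (np.arange(n_freq) / n_freq))
    angles = xyz[:, :, None] * freqs[None, None, :]            # (N, 3, n_freq)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(xyz.shape[0], dim)                      # (N, dim)


def add_spatial_encoding(visual_tokens: np.ndarray, xyz: np.ndarray) -> np.ndarray:
    """Add each token's 3D position encoding to its (N, dim) feature vector."""
    return visual_tokens + sinusoidal_3d_encoding(xyz, visual_tokens.shape[1])


def regress_waypoints(hidden: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical linear regression head: (T, dim) states -> (T, 2) waypoints.

    The outputs are continuous values in metres, not digit tokens.
    """
    return hidden @ weight + bias


def average_l2_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and reference waypoints."""
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(4, 12))          # 4 visual tokens, 12-dim features
    positions = rng.uniform(-10, 10, (4, 3))   # assumed per-token 3D positions (m)
    spatial_tokens = add_spatial_encoding(tokens, positions)

    weight, bias = rng.normal(size=(12, 2)), np.zeros(2)
    waypoints = regress_waypoints(spatial_tokens, weight, bias)
    print(waypoints.shape)
```

The point of the regression head in this sketch is that the planner's outputs stay continuous, so an error metric such as the average L2 distance can be computed directly on them without parsing generated text.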
Will assisted driving see a reshuffle of camps in 2026? The all-new XPeng P7, carrying a VLA R&D blueprint, aims to seize the lead
Zheng Quan Ri Bao Wang· 2025-08-29 10:49
Core Insights
- With the launch of the new P7, XPeng aims to place the car among the top three pure electric sedans by 2026, with a focus on advanced technology and performance [1][2].
- The company describes the P7 as a "totem model" that showcases the highest level of technology and features in its lineup [1][2].
- XPeng is investing heavily in intelligent driving, with nearly 5 billion yuan allocated to VLA (Vision-Language-Action) development this year and significant advances targeted for 2026 [2][3].

Product Features
- The new P7 offers a spacious interior, with rear knee room of 120 mm, a seat cushion length of 513 mm, and a trunk capacity expandable to 1929 L, balancing aesthetics and practicality [2].
- The vehicle is equipped with the Ultra system and high-end hardware such as dual-chamber air suspension, reinforcing the brand's technology credentials [1][2].

Pricing Strategy
- Pricing for the new P7 went through multiple rounds of internal discussion, which ended with a lower price for the all-wheel-drive version to strengthen its value proposition [2].
- The company aims to exceed the previous model's sales of 230,000 units and to reach the 100,000-unit production milestone more quickly [1][2].

Intelligent Driving and Safety
- XPeng says its VLA is intended to outperform current leading systems tenfold, combining fast response with strong reasoning capabilities [2].
- The new P7 includes an OMS (Occupant Monitoring System) designed around user privacy, with local data processing and physical privacy covers [3].

Market Positioning
- The new P7 is set to debut at the Munich Auto Show on September 3. It represents XPeng's long-term strategy in smart technology and its product matrix, which the company considers crucial to reaching profitability and solidifying its market position in Q4 [3].