FastDriveVLA
AAAI 2026 | XPeng and Peking University Tailor a Visual Token Pruning Method for VLA Models
具身智能之心· 2026-01-05 01:03
Experimental results show that, across different pruning ratios, FastDriveVLA achieves SOTA performance on the nuScenes open-loop planning benchmark. FastDriveVLA is also highly efficient: when the number of visual tokens is cut from 3,249 to 812, its FLOPs drop by roughly 7.5x, and in terms of CUDA inference latency it reduces prefill time by 3.7x and decode time by 1.3x, markedly improving inference efficiency. The paper has been accepted to AAAI 2026. (Editor: 机器之心) VLA models are being applied to more and more end-to-end autonomous driving systems. However, their lengthy visual token sequences greatly increase computational cost, and none of the existing visual token pruning methods were designed for autonomous driving, so all of them have limitations in driving scenarios. XPeng Motors and the State Key Laboratory of Multimedia Information Processing at Peking University's School of Computer Science published the paper 《FastDriveVLA ...
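For intuition on the reported numbers: transformer prefill mixes costs that scale linearly with sequence length (MLPs, projections) with costs that scale quadratically (self-attention), so a roughly 4x token reduction can plausibly yield anywhere from 4x to 16x fewer FLOPs. A back-of-the-envelope sketch in Python; the 40/60 linear-to-quadratic cost split is an illustrative assumption, not a figure from the paper:

```python
# Rough sanity check of the reported ~7.5x FLOPs reduction.
# Assumption (illustrative, not from the paper): at 3,249 tokens,
# linear-cost terms account for ~40% of prefill FLOPs and quadratic
# self-attention for ~60%.
tokens_before, tokens_after = 3249, 812
r = tokens_before / tokens_after                  # ~4.0x fewer tokens

linear_share, quad_share = 0.40, 0.60             # assumed cost split
relative_cost = linear_share / r + quad_share / r**2
print(f"token reduction: {r:.1f}x")                          # 4.0x
print(f"implied FLOPs reduction: {1 / relative_cost:.1f}x")  # ~7.3x
# Purely linear costs would bound the gain at 4x, purely quadratic
# at 16x; the reported ~7.5x sits between the two, as expected.
```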
AAAI 2026 | XPeng and Peking University Tailor a Visual Token Pruning Method for VLA Models, Making End-to-End Autonomous Driving More Efficient
机器之心· 2026-01-04 05:43
VLA models are increasingly being applied in end-to-end autonomous driving systems. However, the lengthy visual token sequences in VLA models greatly increase computational cost, and none of the existing visual token pruning methods were designed for autonomous driving, so all of them have limitations in driving scenarios. XPeng Motors and the State Key Laboratory of Multimedia Information Processing at Peking University's School of Computer Science published the paper 《FastDriveVLA》, which not only establishes a new paradigm for efficient visual token pruning in autonomous driving VLA models but also offers valuable insights for task-specific pruning strategies. Inspired by the observation that human drivers mainly attend to foreground regions rather than the background, the research team hypothesized that, for autonomous driving, visual tokens tied to foreground information are more valuable than those tied to background content. To validate this hypothesis, the team built nuScenes-FG, a large-scale annotated autonomous driving dataset of 241,000 image-mask pairs with foreground-region annotations from 6 camera views, and trained ReconPruner, a plug-and-play visual token pruner applicable to different VLA models, using an MAE-style pixel reconstruction strategy together with a novel adversarial foreground-background reconstruction strategy. Experimental results show that, across different pruning ratios, FastDriveVLA ...
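To make the training idea concrete, here is a minimal PyTorch sketch of a reconstruction-supervised token scorer in the spirit of ReconPruner. All module shapes, names, and the loss weighting are illustrative assumptions, not the paper's implementation; in particular, the paper's adversarial foreground-background strategy is reduced here to a simple signed weighting of the two reconstruction terms:

```python
# Minimal sketch of a reconstruction-supervised token scorer in the
# spirit of ReconPruner. Module sizes, names, and loss weights are
# illustrative assumptions -- not the paper's code.
import torch
import torch.nn as nn

class TokenScorer(nn.Module):
    """Scores each visual token; high scores should mark foreground."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.GELU(), nn.Linear(dim // 2, 1)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) -> per-token scores: (B, N)
        return self.mlp(tokens).squeeze(-1)

def fg_bg_recon_loss(recon, target, fg_mask, bg_weight=0.1):
    """MAE-style pixel loss with a signed foreground/background split:
    reward faithful foreground reconstruction and (as a crude stand-in
    for the paper's adversarial strategy) discourage spending capacity
    on the background. recon, target: (B, C, H, W) images;
    fg_mask: (B, 1, H, W) in {0, 1}, e.g. nuScenes-FG-style masks."""
    per_pixel = (recon - target) ** 2
    fg = (per_pixel * fg_mask).sum() / fg_mask.sum().clamp(min=1)
    bg = (per_pixel * (1 - fg_mask)).sum() / (1 - fg_mask).sum().clamp(min=1)
    return fg - bg_weight * bg
```

A decoder (omitted here) would reconstruct pixels from the retained tokens, so tokens whose retention lowers foreground reconstruction error earn high significance scores, matching the hypothesis that foreground tokens matter most for driving.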
[Weekly View] XPeng and Peking University Release FastDriveVLA; We Remain Bullish on the Automotive Sector
东吴汽车黄细里团队· 2025-12-29 12:54
Investment Highlights
- The automotive sector performed well this week, with the SW passenger vehicle and SW auto parts sectors leading with gains of +3.3% each, followed by SW automotive (+2.7%) and SW commercial trucks (+1.1%), while SW commercial buses declined 2.2% [4][12]
- The top-performing covered stocks this week were Yatai Co. (亚太股份), Hengshuai Co., Xusheng Group, Yinlun Co., and Shuanghuan Transmission [4][12]

Research Outcomes
- The team released its 2026 automotive parts strategy report and a monthly report on buses [5][12]

Industry Core Changes
1. XPeng Motors and Peking University published a paper accepted at the international AI conference AAAI 2026, addressing the core tension between the heavy computational load of autonomous driving VLA models and the need for precise decision-making, a result that combines technical breakthrough with commercial feasibility [6][12]
2. Horizon Robotics' Digua robot S100P made its mass-production debut, and the Vbot super robot dog was officially released [6][12]
3. The Zhiji brand achieved full-cost profitability for the first time in December 2025 [6][12]

Current Automotive Sector Configuration
- The automotive industry is entering a new crossroads: the electric vehicle (EV) dividend is ending, automotive intelligence is dawning, and robotics innovation is in its 0-to-1 stage; three main investment lines are emerging during this transition [8][13]
- **AI Smart Vehicle Main Line**: focus on Robotaxi/Robovan, with core downstream application targets including:
  - **Robotaxi Perspective**: integrated players such as Tesla, XPeng Motors, and Qianli Technology; technology providers with operation-sharing models such as Horizon Robotics and Baidu; ride-hailing/taxi operators in transition such as Cao Cao Mobility, Didi, and others [8][13]
  - **Robovan Perspective**: key focus on Desay SV and others [8][13]
  - **C-end Vehicle Sales Perspective**: complete vehicles from XPeng Motors, Li Auto, Huawei, Xiaomi, etc. [8][13]
  - **Upstream Supply Chain Core Targets**: B-end vehicle manufacturing by companies such as BAIC Blue Valley, GAC Group, and SAIC Group; core suppliers in testing, chips, domain controllers, sensors, and more [8][13]
- **AI Robotics Main Line**: preferred component makers including Top Group, Junsheng Electronics, Xinquan Technology, and others [8][13]
- **Dividend & Good-Pattern Main Line**: focus on buses (Yutong Bus), heavy trucks (China National Heavy Duty Truck Group, Weichai Power), and two-wheelers (Chunfeng Power, Longxin General) [9][13]
Auto Weekly View: XPeng and Peking University Release FastDriveVLA; We Remain Bullish on the Automotive Sector - 20251229
Soochow Securities· 2025-12-29 11:09
Securities Research Report | Auto Weekly View: XPeng and Peking University Release FastDriveVLA; We Remain Bullish on the Automotive Sector
Securities analyst: Huang Xili (license no. S0600520010001; contact: huangxl@dwzq.com.cn). December 29, 2025.
Core conclusions (note: unless otherwise stated, "this week" refers to 2025.12.22-2025.12.28):
◼ Weekly sub-sector performance ranking: SW passenger vehicles (+3.3%) = SW auto parts (+3.3%) > SW automotive (+2.7%) > SW commercial trucks (+1.1%) > SW commercial buses (-2.2%). Among covered names, Yatai Co. (亚太股份), Hengshuai Co., Xusheng Group, Yinlun Co., and Shuanghuan Transmission posted the five largest gains this week.
◼ Team research output this week: released the 2026 auto parts strategy report and the monthly bus report.
◼ Key industry changes this week: 1) XPeng and Peking University published a paper accepted at the top international AI conference AAAI 2026, addressing the core tension between the heavy computational load of autonomous driving VLA models and the need for precise decision-making; the result combines technical breakthrough with commercial feasibility. 2) Horizon Robotics' Digua robot S100P made its mass-production debut, and the Vbot super robot dog was officially released. 3) The Zhiji brand achieved full-cost profitability for the first time in December 2025.
Latest sector view:
◼ In Q4, emphasize AI smart vehicle inv ...
XPENG-Peking University Collaborative Research Accepted by AAAI 2026: Introducing a Novel Visual Token Pruning Framework for Autonomous Driving
Prnewswire· 2025-12-29 05:35
Core Insights
- XPENG, in collaboration with Peking University, has developed FastDriveVLA, a novel visual token pruning framework for autonomous driving AI, which has been accepted at AAAI 2026, a prestigious AI conference with an acceptance rate of 17.6% [1][10]

Technology Development
- FastDriveVLA focuses on efficient visual token pruning, allowing the AI to prioritize essential visual information while filtering out irrelevant data, thereby enabling autonomous driving systems to "drive like a human" [2][4]
- The framework employs an adversarial foreground-background reconstruction strategy to enhance the model's ability to retain valuable tokens, achieving a significant reduction in computational load [5]

Performance Metrics
- On the nuScenes autonomous driving benchmark, FastDriveVLA demonstrated state-of-the-art performance, achieving a nearly 7.5x reduction in computational load when visual tokens were reduced from 3,249 to 812, while maintaining high planning accuracy [5]

Industry Recognition
- This marks XPENG's second recognition at a top-tier global AI conference in 2025, following its participation in CVPR WAD, where it presented advances in autonomous driving foundation models [6]
- XPENG's commitment to achieving L4-level autonomous driving is underscored by its full-stack in-house capabilities, which span model architecture design, training, and vehicle deployment [7]

Company Overview
- XPENG is positioned as a leader in the transformation of future mobility, with R&D centers across China and a global strategy for research, development, and sales, including a presence in the United States and Europe [8][9]
XPeng and Peking University Propose a Brand-New Visual Token Pruning Framework; He Xiaopeng: Another Breakthrough on the Road to L4
Xin Lang Cai Jing· 2025-12-28 07:56
Sina Tech News, December 28 — The international AI conference AAAI 2026 recently announced its paper acceptance results: the paper 《FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning》, jointly completed by XPeng Motors and the State Key Laboratory of Multimedia Information Processing at Peking University's School of Computer Science, was accepted. The paper's biggest contribution is FastDriveVLA, an efficient visual token pruning framework tailored for end-to-end autonomous driving VLA models. According to the authors, FastDriveVLA contains a plug-and-play visual token pruner, ReconPruner. During inference of the vehicle-side model, ReconPruner can be embedded directly into an autonomous driving VLA model to prune visual tokens, plug-and-play, with no need to retrain the whole model. To support the pruner's training, the team also built the nuScenes-FG dataset, containing 241,000 image-mask pairs from 6 camera views; this large-scale foreground segmentation annotation dataset for autonomous driving can be widely used in future autonomous driving research. Finally, tests on the nuScenes autonomous driving dataset show that, with this pruning framework, across different pruning rates it achieves the current best ...
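As a concrete picture of "plug-and-play" here, the sketch below shows how a trained scorer could sit between a frozen vision encoder and the VLA language model at inference time, keeping only the top-scoring tokens. Function and variable names (prune_visual_tokens, recon_pruner, vision_encoder, vla_llm) are hypothetical, and the keep ratio is just an example:

```python
# Illustrative sketch of plug-and-play token pruning at inference:
# score visual tokens, keep the top-k, and hand the pruned sequence
# to the (frozen) VLA language model. Names are assumptions.
import torch

@torch.no_grad()
def prune_visual_tokens(visual_tokens, scorer, keep_ratio=0.25):
    """visual_tokens: (B, N, D); scorer: any module mapping (B, N, D)
    to per-token scores (B, N). Returns (B, k, D), k = N * keep_ratio."""
    B, N, D = visual_tokens.shape
    k = max(1, int(N * keep_ratio))
    scores = scorer(visual_tokens)                 # (B, N)
    topk = scores.topk(k, dim=1).indices           # (B, k)
    topk, _ = topk.sort(dim=1)                     # keep original order
    idx = topk.unsqueeze(-1).expand(-1, -1, D)     # (B, k, D)
    return visual_tokens.gather(1, idx)

# Usage: the VLA model itself is untouched -- the pruner simply sits
# between the vision encoder and the LLM.
# pruned = prune_visual_tokens(vision_encoder(images), recon_pruner)
# output = vla_llm(torch.cat([pruned, text_tokens], dim=1))
```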
Toward Production VLA! FastDriveVLA: A Plug-and-Play Pruning Module with Nearly 4x Inference Speedup
自动驾驶之心· 2025-08-23 16:03
Core Viewpoint
- The article presents FastDriveVLA, a novel visual token pruning framework designed for autonomous driving, achieving a 50% compression rate while maintaining 97.3% of performance [3][13][43]

Group 1: End-to-End Autonomous Driving
- Recent advances in end-to-end autonomous driving research have led to the adoption of vision-language-action (VLA) models, which outperform traditional modular approaches in complex scene understanding and decision-making [3][10]
- The VLA model integrates perception, action generation, and planning into a single framework, reducing information loss between modules [3][4]

Group 2: Visual Token Pruning Techniques
- Existing VLM/VLA models face high computational costs because images are encoded into large numbers of visual tokens, prompting research into visual token pruning methods [4][11]
- The two primary approaches are attention-based and similarity-based pruning, both of which have limitations in driving tasks [4][14]
- FastDriveVLA introduces a reconstruction-based visual token pruning framework that focuses on retaining tokens tied to the foreground regions critical for driving decisions [5][13]

Group 3: FastDriveVLA Framework
- FastDriveVLA employs a plug-and-play pruner called ReconPruner, trained with a pixel reconstruction task that emphasizes foreground information [6][17]
- The framework includes an adversarial foreground-background reconstruction strategy to sharpen the model's ability to distinguish foreground tokens from background tokens [20][21]
- A large-scale dataset, nuScenes-FG, containing 241,000 image-mask pairs, was constructed to train ReconPruner for effective foreground segmentation [6][12][13]

Group 4: Experimental Results
- FastDriveVLA achieved state-of-the-art results on the nuScenes open-loop planning benchmark, demonstrating its effectiveness and practicality [13][28]
- Evaluated at pruning ratios of 25%, 50%, and 75%, it consistently outperformed existing methods on key metrics such as L2 error and collision rate [30][34]
- Efficiency analysis showed that FastDriveVLA significantly reduces FLOPs and CUDA latency compared with other methods, enhancing real-time deployment capability [36][40]

Group 5: Contributions and Implications
- FastDriveVLA provides a new paradigm for efficient inference in VLA models and offers insights into task-specific token pruning strategies [43]
- The research highlights the value of focusing on foreground information in autonomous driving tasks, which can improve performance while reducing computational cost [5][43]
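For reference on the evaluation side, the L2 metric cited in Group 4 is typically the distance between predicted and ground-truth ego waypoints at fixed horizons on the nuScenes open-loop planning benchmark. A minimal sketch, assuming 2 Hz waypoints over a 3-second horizon (the array layout is an assumption, not a detail from the article):

```python
# Hedged sketch of an open-loop planning L2 metric: distance between
# predicted and ground-truth ego waypoints at 1s/2s/3s horizons.
import numpy as np

def l2_errors(pred, gt, steps_per_sec=2, horizons=(1, 2, 3)):
    """pred, gt: (T, 2) ego-trajectory waypoints in meters (BEV x, y).
    Returns the L2 error at each horizon (in seconds)."""
    dists = np.linalg.norm(pred - gt, axis=-1)     # (T,)
    return {h: float(dists[h * steps_per_sec - 1]) for h in horizons}

# Example with a hypothetical 3-second, 2 Hz trajectory:
pred = np.cumsum(np.full((6, 2), 0.50), axis=0)   # predicted waypoints
gt   = np.cumsum(np.full((6, 2), 0.52), axis=0)   # ground truth
print(l2_errors(pred, gt))  # error grows with horizon, as expected
```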
Autonomous Driving Paper Express | Diffusion Models, Trajectory Prediction, TopoLiDM, VLA, and More
自动驾驶之心· 2025-08-05 03:09
Core Insights
- The roundup covers advances in trajectory prediction via GALTraj, a generative active learning framework that applies controllable diffusion models to address long-tail issues in data [1][2]

Group 1: GALTraj Framework
- GALTraj is the first framework to apply generative active learning to trajectory prediction tasks, enhancing long-tail learning without modifying the model structure [2]
- The framework employs a tail-aware generation method that differentiates the diffusion guidance for tail, head, and related agents, producing realistic and diverse scenarios while preserving tail characteristics [2][3]

Group 2: Experimental Results
- In experiments on the WOMD and Argoverse 2 datasets, GALTraj significantly improved long-tail sample prediction, reducing the long-tail metric FPR₅ by 47.6% (from 0.42 to 0.22) and the overall prediction error minFDE₆ by 14.7% (from 0.654 to 0.558) [1][6]
- The results indicate that GALTraj outperforms traditional methods across various metrics, showcasing its effectiveness on rare scenarios [7][8]

Group 3: TopoLiDM Framework
- TopoLiDM, developed by Shanghai Jiao Tong University and the University of Twente, integrates topology-aware diffusion models for high-fidelity LiDAR point cloud generation [13][15]
- On the KITTI-360 dataset, TopoLiDM achieved a 22.6% reduction in the Fréchet Range Image Distance (FRID) and a 9.2% reduction in Minimum Matching Distance (MMD) while maintaining a real-time generation speed of 1.68 samples per second [13][15]

Group 4: FastDriveVLA Framework
- FastDriveVLA, developed by Peking University and XPeng Motors, introduces a reconstruction-based visual token pruning framework that maintains 99.1% trajectory accuracy at a 50% pruning rate and reduces collision rates by 2.7% [21][22]
- The framework employs a novel adversarial foreground-background reconstruction strategy to better identify valuable tokens, achieving state-of-the-art performance on the nuScenes open-loop planning benchmark [27][28]

Group 5: PLA Framework
- TUM proposed a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion with GPT-4.1-enhanced vision-language-action reasoning for adaptive autonomous driving [34][35]
- In urban intersection scenarios, the framework achieved a mean absolute error (MAE) of 0.39 m/s in speed prediction and an average displacement error (ADE) of 1.013 meters in trajectory tracking [42]
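As a side note on the metrics quoted for GALTraj: minFDE_K is the endpoint (final-displacement) error of the best of K predicted trajectories. A minimal sketch of the computation, with the array layout as an assumption:

```python
# Hedged sketch of the minFDE_K metric: the final-displacement error
# of the best of K candidate trajectories against the ground truth.
import numpy as np

def min_fde(pred_trajs, gt_traj):
    """pred_trajs: (K, T, 2) candidate trajectories; gt_traj: (T, 2).
    Returns the smallest endpoint error over the K candidates."""
    endpoint_err = np.linalg.norm(pred_trajs[:, -1] - gt_traj[-1], axis=-1)
    return float(endpoint_err.min())

# With K=6 candidates, this corresponds to the minFDE₆ figure
# reported in the summary above.
```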
Toward a Production-Ready VLA Solution! FastDriveVLA: A Plug-and-Play Pruning Module with Nearly 4x Inference Speedup (Peking University & XPeng)
自动驾驶之心· 2025-08-04 23:33
Core Viewpoint
- The article presents FastDriveVLA, a novel framework for visual token pruning in autonomous driving, achieving a 50% compression rate while maintaining 97.3% of performance [2][3][43]

Group 1: End-to-End Autonomous Driving
- Recent advances in end-to-end autonomous driving research have favored methods that complete perception-to-planning in a single model, reducing information loss between modules [3]
- The introduction of vision-language-action (VLA) models enhances decision-making in complex scenarios, making them increasingly popular in autonomous driving systems [3][10]

Group 2: Visual Token Pruning
- Existing VLM/VLA models encode images into large numbers of visual tokens, resulting in high computational costs; current research explores two main directions, attention-based and similarity-based pruning [4][14]
- FastDriveVLA proposes a reconstruction-based visual token pruning framework that retains tokens tied to foreground information, significantly reducing computational cost while maintaining performance [5][13]

Group 3: FastDriveVLA Framework
- FastDriveVLA includes a plug-and-play pruner called ReconPruner, trained with a pixel reconstruction task to focus on foreground regions and assign higher significance scores to key tokens [6][17]
- Training uses the large-scale nuScenes-FG dataset of 241,000 image-mask pairs, strengthening the model's ability to separate foreground from background [6][12]

Group 4: Experimental Results
- FastDriveVLA achieved state-of-the-art results on the nuScenes open-loop planning benchmark, demonstrating its effectiveness and practicality [13][34]
- The framework outperforms existing methods, improving L2 error and collision rates across pruning ratios [30][34]

Group 5: Efficiency Analysis
- FastDriveVLA reduces FLOPs by approximately 7.5 times and cuts prefill and decode latencies, enhancing inference efficiency for real-time deployment [36][40]
- ReconPruner's lightweight design yields lower CUDA latency than several comparable methods, making it suitable for practical applications [36][40]
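To see where the prefill gain comes from, here is a micro-benchmark sketch that times a single self-attention layer at the full (3,249) and pruned (812) sequence lengths. Layer sizes are illustrative and this is not the paper's benchmark setup:

```python
# Hedged micro-benchmark: time one self-attention layer at the full
# vs. pruned sequence length. Sizes are illustrative assumptions.
import time
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=1024, num_heads=16, batch_first=True)
attn.eval()

def prefill_time(seq_len: int, iters: int = 5) -> float:
    """Average forward time (seconds) for one attention pass."""
    x = torch.randn(1, seq_len, 1024)
    with torch.no_grad():
        attn(x, x, x)                      # warm-up
        t0 = time.perf_counter()
        for _ in range(iters):
            attn(x, x, x)
    return (time.perf_counter() - t0) / iters

full, pruned = prefill_time(3249), prefill_time(812)
print(f"attention-only speedup: {full / pruned:.1f}x")
# On GPU, call torch.cuda.synchronize() before reading the clock.
# End-to-end prefill gains (the reported 3.7x) will be smaller than
# the attention-only ratio, since MLP cost scales only linearly.
```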