The Prune2Drive Framework

Goodbye, High Latency! SJTU's Prune2Drive: A Token-Pruning Tool for Autonomous-Driving VLMs, 6x Acceleration with Performance Preserved
自动驾驶之心 · 2025-08-28 23:32
Core Viewpoint
- Prune2Drive, a framework developed by Shanghai Jiao Tong University and Shanghai AI Lab, prunes 90% of visual tokens to achieve a 6.4x acceleration in visual-token processing with only a 3% performance drop [2][3][25].

Group 1: Research Background and Challenges
- Vision-Language Models (VLMs) provide a unified framework for perception, reasoning, and decision-making in autonomous driving, improving scene understanding and reducing error propagation [2].
- Deploying VLMs in real driving scenarios is computationally demanding: high-resolution images from multiple cameras inflate the visual-token count, driving up inference latency and memory consumption [3].
- Existing token-pruning methods adapt poorly to multi-view scenarios, often neglecting spatial-semantic diversity and the unequal contributions of different camera views [4].

Group 2: Prune2Drive Framework
- Prune2Drive introduces a Token-wise Farthest Point Sampling (T-FPS) mechanism that maximizes the semantic and spatial coverage of multi-view tokens rather than relying solely on per-token importance scores [6].
- T-FPS measures semantic similarity between tokens with cosine distance, so the selected tokens are non-redundant and semantically rich [10][11]; a minimal sketch of this sampling step appears at the end of this summary.
- A view-adaptive pruning controller tunes the pruning ratio per view, allocating the token budget according to each view's contribution to driving decisions [11][12]; see the budget-allocation sketch at the end.

Group 3: Experimental Design and Results
- Experiments on two multi-view VLM benchmarks (DriveLM, DriveLMM-o1) validated Prune2Drive's performance retention and efficiency gains against baseline methods [16].
- Even with 90% of tokens removed, the framework maintained a risk-assessment score of 68.34, outperforming several baseline models [22].
- Prune2Drive delivered a 6.4x speedup on the DriveMM model and a 2.64x speedup on DriveLMM-o1 [25].

Group 4: Key Findings and Advantages
- Prune2Drive captures the critical information in driving scenes, outperforming other methods at identifying key objects across views [26].
- The framework is plug-and-play: it requires no retraining of the VLM and is compatible with efficient implementations such as Flash Attention [31].
- It balances performance and efficiency, sharply reducing computational load while preserving essential semantic information [31].
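The summary does not include the paper's code, so the following is a minimal Python sketch of the T-FPS selection step described in Group 2: greedy farthest point sampling over token embeddings under cosine distance. The function name `t_fps`, the seeding choice, and the tensor shapes are illustrative assumptions, not details from the paper.

```python
import torch

def t_fps(tokens: torch.Tensor, k: int) -> torch.Tensor:
    """Greedily pick k of N token embeddings (shape N x D) so that each
    new pick is the token farthest, in cosine distance, from everything
    chosen so far -- favoring coverage over near-duplicate tokens."""
    n = tokens.shape[0]
    feats = torch.nn.functional.normalize(tokens, dim=-1)  # unit-norm rows
    selected = torch.zeros(k, dtype=torch.long)
    min_dist = torch.full((n,), float("inf"))  # distance to selected set
    selected[0] = 0  # arbitrary seed; the paper may seed differently
    for i in range(1, k):
        last = feats[selected[i - 1]]
        dist = 1.0 - feats @ last                # cosine distance to last pick
        min_dist = torch.minimum(min_dist, dist)
        selected[i] = torch.argmax(min_dist)     # farthest from current set
    return selected

# Usage: keep 10% of one view's tokens (i.e., prune 90%)
tokens = torch.randn(1000, 768)  # dummy ViT-style token embeddings
kept_idx = t_fps(tokens, k=100)
pruned_view = tokens[kept_idx]
```

Because each pick maximizes the minimum distance to the already-selected set, the kept tokens spread across the embedding space instead of clustering on a few high-attention regions, which matches the coverage argument made in Group 2.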
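The article says only that the view-adaptive controller tunes each view's pruning ratio by its contribution to driving decisions, without giving the mechanism. The sketch below is therefore a hypothetical allocation rule, not the paper's: it assumes per-view importance weights are already available and splits a global keep budget proportionally before running T-FPS on each view.

```python
# Hypothetical per-view budget allocation; the proportional split and the
# importance weights are illustrative assumptions, not the paper's rule.
from typing import Dict

def allocate_view_budgets(view_tokens: Dict[str, int],
                          view_importance: Dict[str, float],
                          global_keep_ratio: float = 0.10) -> Dict[str, int]:
    """Split a global token budget across camera views in proportion to
    their importance, so critical views keep more tokens. Rounding means
    the per-view totals may differ slightly from the exact budget."""
    budget = int(sum(view_tokens.values()) * global_keep_ratio)
    total_imp = sum(view_importance.values())
    keep = {}
    for view, n_tokens in view_tokens.items():
        share = view_importance[view] / total_imp
        keep[view] = min(n_tokens, max(1, round(budget * share)))
    return keep

# Usage: six nuScenes-style views, 576 tokens each; weights are assumed
views = {v: 576 for v in ["front", "front_left", "front_right",
                          "back", "back_left", "back_right"]}
weights = {"front": 0.30, "front_left": 0.15, "front_right": 0.15,
           "back": 0.20, "back_left": 0.10, "back_right": 0.10}
print(allocate_view_budgets(views, weights))  # e.g. {'front': 104, ...}
```

Each view's tokens would then be pruned with its own k (the value returned here), so a high-importance front view retains more tokens than, say, a rear-facing view.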