全面超越DiffusionDrive！中科大GMF-Drive：全球首个Mamba端到端SOTA方案

Core Viewpoint - The article discusses the GMF-Drive framework developed by the University of Science and Technology of China, which addresses the limitations of existing multi-modal fusion architectures in end-to-end autonomous driving by integrating gated Mamba fusion with spatial-aware BEV representation [2][7]. Summary by Sections End-to-End Autonomous Driving - End-to-end autonomous driving has gained recognition as a viable solution, directly mapping raw sensor inputs to driving actions, thus minimizing reliance on intermediate representations and information loss [2]. - Recent models like DiffusionDrive and GoalFlow have demonstrated strong capabilities in generating diverse and high-quality driving trajectories [2][8]. Multi-Modal Fusion Challenges - A key bottleneck in current systems is the multi-modal fusion architecture, which struggles to effectively integrate heterogeneous inputs from different sensors [3]. - Existing methods, primarily based on the TransFuser style, often result in limited performance improvements, indicating a simplistic feature concatenation rather than structured information integration [5]. GMF-Drive Framework - GMF-Drive consists of three modules: a data preprocessing module that enhances geometric information, a perception module utilizing a spatial-aware state space model (SSM), and a trajectory planning module employing a truncated diffusion strategy [7][13]. - The framework aims to retain critical 3D geometric features while improving computational efficiency compared to traditional transformer-based methods [11][16]. Experimental Results - GMF-Drive achieved a PDMS score of 88.9 on the NAVSIM dataset, outperforming the previous best model, DiffusionDrive, by 0.8 points [32]. - The framework demonstrated significant improvements in key metrics, including a 1.1 point increase in the driving area compliance score (DAC) and a maximum score of 83.3 in the ego vehicle progression (EP) [32][34]. Component Analysis - The study conducted ablation experiments to assess the contributions of various components, confirming that the integration of geometric representations and the GM-Fusion architecture is crucial for optimal performance [39][40]. - The GM-Fusion module, which includes gated channel attention, BEV-SSM, and hierarchical deformable cross-attention, significantly enhances the model's ability to process multi-modal data effectively [22][44]. Conclusion - GMF-Drive represents a novel end-to-end autonomous driving framework that effectively combines geometric-enhanced pillar representation with a spatial-aware fusion model, achieving superior performance compared to existing transformer-based architectures [51].