Core Viewpoint
- UniLION is a unified autonomous driving framework developed by the University of Hong Kong, Huazhong University of Science and Technology, and Baidu. It uses linear group RNN technology to address the computational cost of processing large-scale point cloud data and multi-view images [2][3].

Group 1: Project Overview
- UniLION efficiently handles large-scale LiDAR point clouds, high-resolution multi-view images, and temporal data without explicit temporal or multi-modal fusion modules, and supports various configurations seamlessly [4][5].
- The framework simplifies the design of multi-modal, multi-task autonomous driving systems while maintaining strong performance on core tasks such as 3D perception, prediction, and planning [3][44].

Group 2: Research Background and Challenges
- Current autonomous driving systems face challenges in computational efficiency, multi-modal fusion complexity, temporal information processing, and multi-task learning [5].
- Transformer-based models incur significant computational overhead on long sequences because the attention mechanism scales quadratically with sequence length [5].

Group 3: Innovations of UniLION
- UniLION features a unified 3D backbone network based on linear group RNN, which processes different modalities and temporal information seamlessly without explicit fusion modules [8].
- Exploiting linear computational complexity, the framework converts multi-view images, LiDAR point clouds, and temporal information into tokens and integrates them in a unified 3D space [8].
- UniLION compresses heterogeneous multi-modal and temporal information into a compact unified bird's-eye-view (BEV) representation that serves as the shared feature for downstream tasks [8].
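The grouped linear-recurrence idea behind the backbone can be sketched in a few lines. This is a toy illustration only, with hypothetical names and scalar tokens, not the paper's actual architecture: tokens are split into fixed-size groups and each group is scanned with a gated linear recurrence, so cost grows linearly in token count rather than quadratically as with full attention.

```python
# Toy sketch (hypothetical, not UniLION's implementation) of a "linear group
# RNN": split the token sequence into groups and scan each group with a
# gated linear recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.
# Each token is visited exactly once, so the scan is O(n) in sequence length.

def linear_group_rnn(tokens, group_size, decay=0.9):
    """Return one hidden state per input token, resetting the state per group."""
    outputs = []
    for start in range(0, len(tokens), group_size):
        group = tokens[start:start + group_size]
        h = 0.0  # hidden state is reset at each group boundary
        for x in group:
            h = decay * h + (1.0 - decay) * x
            outputs.append(h)
    return outputs

# In the unified-backbone setting, tokens from LiDAR, multi-view images, and
# past frames would all be flattened into one sequence before the scan;
# here we use toy scalar tokens.
out = linear_group_rnn([1.0, 1.0, 1.0, 1.0], group_size=2, decay=0.5)
# → [0.5, 0.75, 0.5, 0.75]: the state builds up within each group of two,
#   then resets at the group boundary.
```

Resetting the state at group boundaries is what makes the groups independent of one another, which in turn allows groups to be scanned in parallel.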
Group 4: Performance Results
- UniLION demonstrated competitive, state-of-the-art performance on the nuScenes dataset, achieving 74.9% NDS and 72.2% mAP in 3D object detection, 76.2% AMOTA in multi-object tracking, and 72.3% mIoU in BEV map segmentation [20].
- The strongest temporal multi-modal variant of UniLION reached 75.4% NDS and 73.2% mAP on the detection task, demonstrating its capability across multiple evaluation tasks [20].

Group 5: Efficiency and Robustness
- Thanks to its linear computational complexity, UniLION significantly reduces compute requirements and inference time, making it suitable for deployment in real-world autonomous driving systems [35].
- The framework is robust to sensor misalignment, maintaining performance even under severe misalignment [32].

Group 6: Future Prospects
- Future work includes extending UniLION to additional sensor modalities, applying it in real-world autonomous driving systems, and exploring large-scale pre-training to improve generalization [45].
HUST & HKU Propose UniLION: A Unified Autonomous Driving Model Based on Linear Group RNN
自动驾驶之心·2025-12-25 09:33