Autonomous Driving Paper Roundup | GS-Occ3D, BEV-LLM, Cooperative Perception, Reinforcement Learning, and More
自动驾驶之心· 2025-07-30 03:01
Group 1
- The article discusses recent advancements in autonomous driving technologies, highlighting several innovative frameworks and models [3][9][21][33][45]
- GS-Occ3D achieves state-of-the-art (SOTA) geometric accuracy with a Chamfer Distance (CD) of 0.56 on the Waymo dataset, demonstrating superior performance over LiDAR-based methods [3][5]
- BEV-LLM introduces a lightweight multimodal scene-description model that outperforms existing models by 5% in BLEU-4 score, showcasing the integration of LiDAR and multi-view images [9][10]
- CoopTrack presents an end-to-end cooperative perception framework that sets new SOTA performance on the V2X-Seq dataset with 39.0% mAP and 32.8% AMOTA [21][22]
- The Diffusion-FS model achieves 0.7767 IoU in free-space prediction, marking a significant improvement in multimodal driving-corridor prediction [45][48]

Group 2
- GS-Occ3D's contributions include a scalable visual occupancy label generation pipeline that eliminates reliance on LiDAR annotations, improving training efficiency for downstream models [5][6]
- BEV-LLM utilizes BEVFusion to combine 360-degree panoramic images with LiDAR point clouds, improving the accuracy of scene descriptions [10][12]
- CoopTrack's instance-level end-to-end framework integrates cooperative tracking and perception, enhancing learning across agents [22][26]
- The ContourDiff model introduces a novel self-supervised method for generating free-space samples, reducing dependency on densely annotated data [48][49]
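For context, the free-space IoU figure cited for Diffusion-FS is the standard intersection-over-union between a predicted mask and the ground-truth mask. A minimal sketch of the metric (the function name and edge-case handling are illustrative, not taken from the paper):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks.

    Both inputs are arrays of the same shape; nonzero entries
    count as occupied/free cells.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention assumed here: two empty masks are a perfect match.
    return float(inter) / float(union) if union > 0 else 1.0
```

An IoU of 0.7767 therefore means the predicted free-space region overlaps the ground truth on roughly 78% of their combined area.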
Tsinghua Proposes CoopTrack: A New End-to-End Cooperative Tracking Approach (ICCV'25 Highlight)
自动驾驶之心· 2025-07-28 10:41
Core Viewpoint
- The article introduces CoopTrack, a novel end-to-end collaborative tracking framework aimed at enhancing 3D multi-object tracking through cooperative perception among multiple agents, addressing the limitations of traditional single-agent systems [2][4]

Innovations
- A new end-to-end framework: CoopTrack is the first framework designed for collaborative 3D multi-object tracking (3D MOT), integrating collaborative perception with sequential tracking tasks and thus overcoming the information fragmentation of traditional tracking-by-cooperative-detection paradigms [6]
- Learnable instance association module: this module replaces prior Euclidean-distance-based methods with a graph-based attention mechanism, enabling a more robust and adaptive association by learning the similarity between instance features across agents [6]
- Novel "Fusion-After-Decoding" pipeline: unlike mainstream methods, CoopTrack decodes first, associates next, and then fuses, which avoids ambiguities and conflicts during feature fusion [9]
- Multi-Dimensional Feature Extraction (MDFE): the MDFE module decouples instance representation into semantic and motion features, enriching the information available for precise association [9]

Algorithm Overview
The core process of CoopTrack includes:
1. Multi-Dimensional Feature Extraction (MDFE): each agent generates rough 3D boxes and updated queries using image features and a transformer decoder, extracting semantic features through an MLP and motion features via PointNet [10][13]
2. Cross-Agent Alignment (CAA): this module addresses feature-domain gaps caused by differences in sensors and perspectives by learning a hidden rotation matrix and translation vector [13]
3. Graph-Based Association (GBA): a fully connected association graph is constructed, where nodes represent aligned multi-dimensional features and edges represent distances between vehicle and roadside instances, computed with a graph attention mechanism [17]

Experimental Results
- CoopTrack demonstrated superior performance on the V2X-Seq and Griffin datasets, achieving state-of-the-art (SOTA) results with a mean Average Precision (mAP) of 39.0% and Average Multi-Object Tracking Accuracy (AMOTA) of 32.8% [2][16]
- Comparative metrics show that CoopTrack outperforms other methods, with an mAP of 0.479 and AMOTA of 0.488, while maintaining a lower communication cost than early-fusion methods [15]
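The alignment and association steps above can be sketched in a simplified form. This is not the authors' implementation: the scaled dot-product scoring, the greedy argmax matching, and all function names are illustrative stand-ins for CoopTrack's learned graph attention and its hidden rotation/translation alignment.

```python
import numpy as np

def align_features(feats: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    # Apply a (learned, here given) rotation matrix R and translation
    # vector t to map one agent's instance features into the other's
    # feature domain, mimicking the Cross-Agent Alignment (CAA) step.
    return feats @ R.T + t

def graph_association(ego_feats: np.ndarray, coop_feats: np.ndarray):
    """Bipartite association between ego and cooperative instances.

    Builds a fully connected graph whose edge scores come from
    scaled dot-product similarity (a simplification of the paper's
    graph attention), then greedily matches each ego instance.
    """
    d = ego_feats.shape[1]
    scores = ego_feats @ coop_feats.T / np.sqrt(d)
    # Softmax over cooperative instances gives soft edge weights.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Greedy hard assignment: each ego instance takes its best match.
    matches = weights.argmax(axis=1)
    return weights, matches
```

In the full pipeline the matched pairs would then be fused into single track instances, following the decode-associate-fuse order described above.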