Tsinghua Proposes CoopTrack: A New End-to-End Collaborative Tracking Framework (ICCV'25 Highlight)
自动驾驶之心·2025-07-28 10:41

Core Viewpoint
- The article introduces CoopTrack, a novel end-to-end collaborative tracking framework that improves 3D multi-object tracking through cooperative perception among multiple agents, addressing the limitations of traditional single-agent systems [2][4].

Innovations
- A new end-to-end framework: CoopTrack is the first framework designed for collaborative 3D multi-object tracking (3D MOT). It integrates collaborative perception with the sequential tracking task, overcoming the information fragmentation of the traditional tracking-by-cooperative-detection paradigm [6].
- Learnable instance association module: this module replaces prior Euclidean-distance-based matching with a graph-based attention mechanism, learning the similarity between instance features across agents for a more robust and adaptive association [6].
- Novel "fusion-after-decoding" pipeline: unlike mainstream methods, CoopTrack decodes first, associates next, and fuses last, which avoids ambiguities and conflicts during feature fusion (a sketch of this ordering appears at the end of this section) [9].
- Multi-Dimensional Feature Extraction (MDFE): the MDFE module decouples the instance representation into semantic and motion features, enriching the information available for precise association [9].

Algorithm Overview
- The core process of CoopTrack includes three steps (minimal code sketches of each step follow the experimental results):
1. Multi-Dimensional Feature Extraction (MDFE): each agent generates rough 3D boxes and updated queries from image features with a transformer decoder, then extracts semantic features through an MLP and motion features through PointNet [10][13].
2. Cross-Agent Alignment (CAA): this module closes the feature domain gap caused by differences in sensors and viewpoints by learning a hidden rotation matrix and translation vector [13].
3. Graph-Based Association (GBA): a fully connected association graph is constructed, whose nodes are the aligned multi-dimensional features and whose edges encode the distances between vehicle-side and roadside instances, computed with a graph attention mechanism [17].

Experimental Results
- CoopTrack demonstrated superior performance on the V2X-Seq and Griffin datasets, achieving state-of-the-art (SOTA) results with a mean Average Precision (mAP) of 39.0% and an Average Multi-Object Tracking Accuracy (AMOTA) of 32.8% [2][16].
- In the comparative evaluation, CoopTrack outperforms other methods with an mAP of 0.479 and an AMOTA of 0.488, while keeping a lower communication cost than early-fusion methods [15].
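The following sketch illustrates the MDFE idea described in step 1: each decoded instance is split into a semantic branch (an MLP over the instance query) and a motion branch (a PointNet-style encoder over the corners of the rough 3D box). The module name, feature dimensions, and the corner parameterization are illustrative assumptions, not the authors' exact design.

```python
# Minimal PyTorch sketch of Multi-Dimensional Feature Extraction (MDFE); assumptions only.
import torch
import torch.nn as nn


class MDFESketch(nn.Module):
    def __init__(self, query_dim=256, feat_dim=128):
        super().__init__()
        # Semantic branch: plain MLP over the decoded instance query.
        self.semantic_mlp = nn.Sequential(
            nn.Linear(query_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Motion branch: PointNet-style shared MLP over box corner points,
        # followed by max-pooling across the 8 corners.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, queries, box_corners):
        # queries:      (N, query_dim)  decoded instance queries of one agent
        # box_corners:  (N, 8, 3)       corners of the rough 3D boxes
        semantic_feat = self.semantic_mlp(queries)                # (N, feat_dim)
        motion_feat = self.point_mlp(box_corners).max(dim=1)[0]   # (N, feat_dim)
        return semantic_feat, motion_feat


# Usage on random data standing in for one agent's decoded instances.
mdfe = MDFESketch()
sem, mot = mdfe(torch.randn(10, 256), torch.randn(10, 8, 3))
print(sem.shape, mot.shape)  # torch.Size([10, 128]) torch.Size([10, 128])
```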
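For step 2, Cross-Agent Alignment, the article only states that a hidden rotation matrix and translation vector are learned to close the feature domain gap. The sketch below interprets that as a learned affine map applied to the roadside instance features; predicting the transform from a pooled context feature is my assumption.

```python
# Minimal sketch of the Cross-Agent Alignment (CAA) idea; a hedged interpretation.
import torch
import torch.nn as nn


class CAASketch(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.feat_dim = feat_dim
        # Predict a feat_dim x feat_dim "rotation" matrix and a translation vector
        # conditioned on the mean-pooled roadside feature (assumed conditioning).
        self.rot_head = nn.Linear(feat_dim, feat_dim * feat_dim)
        self.trans_head = nn.Linear(feat_dim, feat_dim)

    def forward(self, roadside_feat):
        # roadside_feat: (N, feat_dim) instance features from the infrastructure agent
        context = roadside_feat.mean(dim=0, keepdim=True)            # (1, feat_dim)
        rot = self.rot_head(context).view(self.feat_dim, self.feat_dim)
        trans = self.trans_head(context)                             # (1, feat_dim)
        # Map every roadside instance feature into the ego-vehicle feature domain.
        return roadside_feat @ rot.T + trans


caa = CAASketch()
aligned = caa(torch.randn(12, 128))
print(aligned.shape)  # torch.Size([12, 128])
```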
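For step 3, Graph-Based Association, the sketch below builds the fully connected bipartite graph between ego-vehicle and aligned roadside instances and scores its edges with an attention-style similarity instead of a hand-crafted Euclidean distance. The single-head dot-product scoring and the greedy matching are simplifying assumptions.

```python
# Minimal sketch of the Graph-Based Association (GBA) idea; assumptions only.
import torch
import torch.nn as nn


class GBASketch(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.q_proj = nn.Linear(feat_dim, feat_dim)
        self.k_proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, ego_feat, road_feat):
        # ego_feat:  (N, feat_dim) concatenated semantic+motion features of ego instances
        # road_feat: (M, feat_dim) aligned features of roadside instances
        q, k = self.q_proj(ego_feat), self.k_proj(road_feat)
        scores = q @ k.T / q.shape[-1] ** 0.5    # (N, M) edge weights of the graph
        return scores.softmax(dim=-1)            # row-wise association probabilities


def greedy_match(assoc, threshold=0.5):
    """Greedy one-to-one matching over the learned association matrix."""
    pairs, used = [], set()
    for i in range(assoc.shape[0]):
        j = int(assoc[i].argmax())
        if assoc[i, j] > threshold and j not in used:
            pairs.append((i, j))
            used.add(j)
    return pairs


gba = GBASketch()
ego = torch.cat([torch.randn(10, 128), torch.randn(10, 128)], dim=-1)   # semantic + motion
road = torch.cat([torch.randn(12, 128), torch.randn(12, 128)], dim=-1)
print(greedy_match(gba(ego, road)))
```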
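Finally, the "fusion-after-decoding" ordering itself can be shown end to end: each agent decodes its own instances, the instances are then associated across agents, and only the matched pairs are fused afterwards. The decode, association, and fusion bodies below are placeholders for the real transformer decoder, the learnable graph-based association, and the feature-fusion module.

```python
# Self-contained sketch of the "fusion-after-decoding" ordering; stubs are assumptions.
import torch
import torch.nn.functional as F


def decode(agent_feat):
    # Placeholder per-agent decoder: in CoopTrack this is a transformer decoder
    # producing instance queries and rough 3D boxes from image features.
    return agent_feat


def associate(ego_inst, road_inst):
    # Placeholder association: a cosine-similarity argmax stands in for the
    # learnable graph-based association.
    sim = F.cosine_similarity(ego_inst.unsqueeze(1), road_inst.unsqueeze(0), dim=-1)
    return sim.argmax(dim=1)                 # index of the matched roadside instance

def fuse(ego_inst, road_inst, match):
    # Fusion happens last, only between already-associated instance pairs,
    # here as a simple average of matched features.
    return 0.5 * (ego_inst + road_inst[match])


ego, road = decode(torch.randn(10, 128)), decode(torch.randn(12, 128))
fused = fuse(ego, road, associate(ego, road))
print(fused.shape)  # torch.Size([10, 128])
```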