4% Performance Surge! CBDES MoE: MoE Gives BEV a Second Spring and Reaches SOTA (Tsinghua & Imperial College)
自动驾驶之心· 2025-08-18 23:32
Core Viewpoint
- The article discusses CBDES MoE, a novel modular mixture-of-experts (MoE) architecture for bird's-eye-view (BEV) perception in autonomous driving. It targets the limited adaptability, modeling capacity, and generalization of existing methods [2][5][48].

Group 1: Introduction and Background
- The rapid development of autonomous driving technology has made 3D perception essential for building safe and reliable driving systems [5].
- Existing solutions often rely on a single fixed backbone as the feature extractor, which limits adaptability to diverse driving environments [5][6].
- The MoE paradigm addresses this by selecting experts dynamically through a learned routing mechanism, balancing computational efficiency against representational richness [6][9].

Group 2: CBDES MoE Framework
- CBDES MoE integrates multiple structurally heterogeneous expert networks and uses a lightweight self-attention router (SAR) to select the expert path dynamically [3][12].
- The framework draws its experts from a pool of multi-stage heterogeneous backbone designs, improving scene adaptability and feature representation [14][17].
- The architecture supports efficient, adaptive, and scalable 3D perception and outperforms strong single-backbone baselines in complex driving scenarios [12][14].

Group 3: Experimental Results
- On the nuScenes dataset, CBDES MoE achieved a mean Average Precision (mAP) of 65.6 and a nuScenes Detection Score (NDS) of 69.8, surpassing all single-expert baselines [37][39].
- The model converged faster and kept a lower loss throughout training, indicating higher optimization stability and learning efficiency [39][40].
- Adding load-balancing regularization raised mAP from 63.4 to 65.6 [42][46].
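The summary does not give the exact form of the load-balancing regularizer, so the following is only a minimal sketch under an assumption: it uses the auxiliary loss popularized by Switch Transformer (number of experts times the sum over experts of dispatch fraction times mean routing probability) as a stand-in. The function names and the toy logits are hypothetical and are not taken from the CBDES MoE paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balance_loss(batch_logits):
    """Switch-style auxiliary loss (an assumed stand-in, not the paper's formula).

    batch_logits: one list of router logits per image in the batch.
    Returns num_experts * sum_i f_i * p_i, where f_i is the fraction of
    images whose top-1 expert is i and p_i is the mean routing
    probability given to expert i.
    """
    num_experts = len(batch_logits[0])
    probs = [softmax(row) for row in batch_logits]

    # f_i: how often each expert wins the top-1 decision.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    frac = [c / len(probs) for c in counts]

    # p_i: average probability mass the router gives each expert.
    mean_prob = [
        sum(p[i] for p in probs) / len(probs) for i in range(num_experts)
    ]
    return num_experts * sum(f * m for f, m in zip(frac, mean_prob))


# Toy comparison: every image prefers a different expert vs. all prefer expert 0.
balanced = [[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0], [0, 0, 0, 5]]
collapsed = [[5, 0, 0, 0]] * 4
print(load_balance_loss(balanced), load_balance_loss(collapsed))
```

Under this assumed loss, a router that spreads images evenly across experts scores lower than one that sends every image to the same expert, which is the behavior a load-balancing term is meant to encourage.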
Group 4: Future Work and Limitations
- Future research may explore patch-wise or region-aware routing for finer-grained adaptability, and may extend the method to multi-task scenarios [48].
- The current routing mechanism makes one decision per image, which may limit its effectiveness in more complex environments [48].
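To make "image-level routing" concrete, here is a rough sketch of a single routing decision per image followed by running only the chosen expert. Everything in it is an assumption for illustration: a plain linear scorer stands in for the self-attention router, the experts are toy functions rather than backbones, and scaling the output by the gate probability is a common top-1 MoE convention that the summary does not confirm for CBDES MoE.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_image(descriptor, router_weights):
    """Pick one expert for the whole image.

    descriptor: pooled per-image feature vector (hypothetical input).
    router_weights: one weight row per expert (linear stand-in for the SAR).
    Returns (expert_index, gate_probability).
    """
    logits = [
        sum(w * x for w, x in zip(row, descriptor)) for row in router_weights
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def moe_forward(descriptor, router_weights, experts):
    """Run only the selected expert and scale its output by the gate."""
    idx, gate = route_image(descriptor, router_weights)
    features = experts[idx](descriptor)  # the other experts are never called
    return idx, [gate * f for f in features]


# Two toy experts standing in for heterogeneous backbones.
toy_experts = [
    lambda d: [2.0 * x for x in d],
    lambda d: [x + 1.0 for x in d],
]
toy_router = [[2.0, 0.0], [0.0, 2.0]]
print(moe_forward([1.0, 0.0], toy_router, toy_experts))
```

Because the decision is made once per image, every region of that image passes through the same expert; patch-wise or region-aware routing would instead call `route_image` on per-region descriptors.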