All in One Article! A Roundup of Diffusion Model Applications in Autonomous Driving Foundation Models: 30+ Works Covered
自动驾驶之心·2025-07-31 23:33

Core Insights

The article surveys the role of diffusion models in the development of autonomous driving, highlighting their ability to enhance data diversity, improve the robustness of perception systems, and support decision-making under uncertainty [2][3].

Group 1: Diffusion Models in Autonomous Driving

- Diffusion models have shown promising applications in autonomous driving, particularly in generating diverse, physically constrained samples from complex data distributions [2] (a minimal sketch of the underlying denoising loop follows these groups).
- The Dual-Conditioned Temporal Diffusion Model (DcTDM) generates realistic long-duration driving videos, addressing challenges such as limited data quality and high collection costs [3][4].
- In evaluation, DcTDM improves temporal consistency and frame quality by over 25% compared with other video diffusion models [3].

Group 2: Applications in Perception and Decision-Making

- In perception, diffusion models significantly outperform traditional methods in 3D occupancy prediction, especially in occluded or low-visibility areas, thereby supporting downstream planning tasks [4].
- A Stable Diffusion-based model effectively predicts vehicle trajectories, strengthening the forecasting capabilities of autonomous driving systems [4] (see the trajectory-diffusion training sketch below).
- The DiffusionDrive framework uses diffusion models to capture multimodal action distributions, addressing the uncertainty of driving decisions in end-to-end autonomous driving [4] (see the multimodal sampling sketch below).

Group 3: Data Generation and Quality Improvement

- Diffusion models are crucial for generating high-quality synthetic data, addressing the insufficient diversity and authenticity of natural driving datasets [4].
- Controllable generation techniques are particularly important for overcoming 3D data annotation challenges, and future explorations into video generation aim to further improve data quality [4] (the guidance sketch below illustrates the standard control mechanism).

Group 4: Advanced Frameworks and Innovations

- LD-Scene combines large language models with latent diffusion models to generate adversarial driving scenarios, improving the controllability and robustness of generated scenes [9].
- DualDiff introduces a dual-branch diffusion model for multi-view driving scene generation, using occupancy ray sampling to supply rich semantic information [30].
- DiVE employs a diffusion transformer framework to generate high-fidelity, temporally coherent multi-view videos, achieving state-of-the-art performance in multi-view video generation [19][20].

Group 5: Safety and Critical Scenario Generation

- AVD2 deepens understanding of accident scenarios by generating videos aligned with detailed natural language descriptions, contributing to accident analysis and prevention [36].
- AdvDiffuser generates adversarial, safety-critical driving scenarios that transfer across different systems while maintaining realism and diversity [68][69].
- The Causal Composition Diffusion Model (CCDiff) improves controllability and realism in closed-loop traffic scenario generation, significantly outperforming existing methods [41].
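For context on the mechanism the surveyed works share, below is a minimal sketch of a DDPM-style reverse (denoising) process: a sample is drawn by starting from pure Gaussian noise and iteratively removing predicted noise. This is a generic textbook illustration, not code from any paper above; `eps_model`, the linear noise schedule, and the step count are all assumptions.

```python
# Minimal DDPM-style reverse diffusion sketch (generic illustration, not from
# any surveyed paper). `eps_model` is a hypothetical noise-prediction network.
import torch

T = 1000                                        # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)       # cumulative products of alphas

@torch.no_grad()
def sample(eps_model, shape):
    """Draw one sample by iteratively denoising pure Gaussian noise."""
    x = torch.randn(shape)                      # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, torch.tensor([t]))   # predicted noise at step t
        # Posterior mean of x_{t-1} given x_t (standard DDPM update rule)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```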
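For the trajectory-prediction use case in Group 2, a conditional diffusion model is typically trained to denoise future trajectories given encoded scene context. The sketch below shows the standard epsilon-prediction training loss under that setup; `denoiser` and `context_encoder` are hypothetical modules, not APIs from the cited work.

```python
# Hypothetical training step for a conditional trajectory diffusion model:
# the denoiser predicts the noise added to a future trajectory, conditioned
# on encoded scene context. All module names are illustrative assumptions.
import torch
import torch.nn.functional as F

def diffusion_loss(denoiser, context_encoder, traj, scene, T=1000):
    """traj: (B, H, 2) future waypoints; scene: raw scene inputs."""
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    B = traj.shape[0]
    t = torch.randint(0, T, (B,))                   # random timestep per sample
    a = alpha_bars[t].view(B, 1, 1)
    noise = torch.randn_like(traj)
    # Forward process: corrupt the clean trajectory to its noised version x_t
    noisy_traj = torch.sqrt(a) * traj + torch.sqrt(1 - a) * noise

    cond = context_encoder(scene)                   # scene conditioning vector
    pred = denoiser(noisy_traj, t, cond)            # predict the injected noise
    return F.mse_loss(pred, noise)                  # epsilon-prediction loss
```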
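For multimodal planning in the spirit of DiffusionDrive, one common pattern is to draw several reverse-diffusion samples from different noise seeds, which exposes distinct action modes, and then select among them. This is a schematic sketch under that assumption, reusing the `sample` routine from the first snippet; the scorer is a hypothetical placeholder, not part of the cited framework.

```python
# Sampling K candidate plans to expose a multimodal action distribution.
# `sample` is the reverse-diffusion routine sketched earlier; the scorer
# below is a hypothetical stand-in for a downstream safety/comfort module.
import torch

def score_safety_and_comfort(trajs):
    """Hypothetical placeholder: penalize jerky motion (second differences)."""
    jerk = trajs[:, 2:] - 2 * trajs[:, 1:-1] + trajs[:, :-2]
    return -jerk.abs().sum(dim=(1, 2))

def plan(denoiser, cond, k=8, horizon=20):
    """Draw K trajectory hypotheses and pick the best-scoring one."""
    candidates = [sample(lambda x, t: denoiser(x, t, cond),
                         shape=(1, horizon, 2)) for _ in range(k)]
    candidates = torch.cat(candidates, dim=0)       # (K, horizon, 2)
    scores = score_safety_and_comfort(candidates)
    return candidates[scores.argmax()]
```

Sampling multiple seeds rather than regressing a single output is what lets a diffusion planner represent genuinely distinct decisions (for example, yielding versus overtaking) instead of averaging them into one implausible trajectory.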
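Finally, the controllable generation referenced in Groups 3 and 4 is commonly implemented with classifier-free guidance, which blends conditional and unconditional noise predictions at sampling time. The sketch below is the generic formulation, not the exact procedure of LD-Scene or DualDiff; it assumes `eps_model` accepts a null condition.

```python
# Classifier-free guidance: a standard control mechanism in latent diffusion
# models, shown generically. Assumes eps_model treats cond=None as the
# unconditional (null-condition) pass; `w` is the guidance scale.
import torch

def guided_eps(eps_model, x, t, cond, w=3.0):
    eps_cond = eps_model(x, t, cond)     # conditional noise prediction
    eps_uncond = eps_model(x, t, None)   # unconditional noise prediction
    # Push the prediction toward the condition by amplifying the difference
    return eps_uncond + w * (eps_cond - eps_uncond)
```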