Core Viewpoint
- The article discusses PlannerRFT, a closed-loop, sample-efficient fine-tuning framework for diffusion-model planners in autonomous driving that significantly enhances closed-loop performance and safety in complex driving scenarios [4][48].

Group 1: Background and Motivation
- Diffusion-model planners have emerged as a powerful probabilistic paradigm for generating human-like driving trajectories in dynamic environments, but they suffer from distribution shift and goal misalignment, which limit their robustness and reliability in real-world applications [4][5].
- Reinforcement learning (RL) offers a potential remedy by scaling with simulation data and simple reward signals; recent advances in the generation-evaluation fine-tuning (RFT) paradigm balance training efficiency against closed-loop planning performance [4][5].

Group 2: PlannerRFT Framework
- PlannerRFT introduces a dual-branch optimization strategy that refines the trajectory distribution and adaptively steers the denoising process toward more promising exploration directions without altering the original inference flow [5][14].
- The framework employs a GPU-accelerated simulator, nuMax, which runs ten times faster than the original nuPlan simulator and supports large-scale parallel learning [6][24].

Group 3: Key Innovations
- To achieve multi-modality, PlannerRFT incorporates an energy-based classifier-guidance mechanism that injects residual offsets during the denoising process, enabling the model to generate diverse, operational trajectories [8][15].
- An adaptive exploration strategy adjusts the guidance scale according to scene context, making trajectory generation more perception-aware [8][18].
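The classifier-guidance idea above can be sketched as a residual offset added to the denoiser's noise prediction, with a tunable guidance scale. This is a minimal illustrative sketch, not the paper's implementation: the function name `guided_denoise_step`, the stand-in arrays, and the simplified DDIM-style update are all assumptions made for illustration.

```python
import numpy as np

def guided_denoise_step(x_t, eps_pred, energy_grad, alpha_t, scale):
    # Classifier guidance as a residual offset: the gradient of an
    # energy-based evaluator is scaled and added to the predicted noise,
    # nudging sampling toward low-energy (more promising) trajectories
    # without changing the sampler's overall structure.
    eps = eps_pred + scale * energy_grad
    # Simplified DDIM-style estimate of the clean sample (eta = 0);
    # alpha_t stands for the cumulative noise-schedule product.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    return x0_hat

rng = np.random.default_rng(0)
x_t = rng.normal(size=(16, 2))    # a noisy 16-waypoint 2-D trajectory
eps = rng.normal(size=(16, 2))    # denoiser noise prediction (stand-in)
grad = rng.normal(size=(16, 2))   # energy-gradient offset (stand-in)

unguided = guided_denoise_step(x_t, eps, grad, alpha_t=0.5, scale=0.0)
guided = guided_denoise_step(x_t, eps, grad, alpha_t=0.5, scale=0.3)
```

The adaptive exploration strategy described above would correspond to choosing `scale` per scene rather than fixing it, so guidance is stronger where the context warrants more exploration.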
Group 4: Performance Evaluation
- Extensive evaluations on the nuPlan benchmark show that PlannerRFT achieves state-of-the-art performance, significantly improving safety and robustness in complex driving scenarios relative to baseline models [9][35].
- The framework shows notable gains on failure scenarios such as collisions and lane departures, indicating its effectiveness at improving driving safety [9][35].

Group 5: Experimental Insights
- The article highlights the importance of the training-data distribution: a balanced dataset combining collision and low-score scenarios yields the best results, while training solely on complex scenarios can hinder the planner's ability to handle routine driving maneuvers [41][42].
- The survival reward mechanism is emphasized as crucial for maintaining performance in challenging environments, as it encourages the planner to delay failure events [43][28].
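The survival-reward idea noted above can be sketched as a per-step bonus that accumulates until the first failure event, so later failures score higher than earlier ones. This is a hypothetical sketch of the concept: the function name `survival_reward` and the specific bonus/penalty values are assumptions, not the paper's actual reward design.

```python
def survival_reward(events, horizon, step_bonus=0.1, fail_penalty=1.0):
    """Accumulate a small bonus for every failure-free step; stop and
    apply a penalty at the first failure (collision, lane departure).
    Delaying a failure therefore strictly increases the return."""
    total = 0.0
    for t in range(horizon):
        if t in events:           # failure event occurs at step t
            total -= fail_penalty
            break
        total += step_bonus
    return total

early = survival_reward(events={3}, horizon=20)    # fails at step 3
late = survival_reward(events={15}, horizon=20)    # fails at step 15
clean = survival_reward(events=set(), horizon=20)  # no failure
```

Under this shaping, `clean > late > early`, which is the ordering that pushes the planner toward surviving longer in challenging scenes.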
PlannerRFT from Hongyang Li's team: a new diffusion-based trajectory planning approach that boosts performance in complex driving scenarios (Tongji & HKU)
自动驾驶之心·2026-01-21 09:16