Core Insights

- The article presents the Model-Based Policy Adaptation (MPA) framework, which aims to improve the robustness and safety of end-to-end (E2E) autonomous driving agents in closed-loop evaluation [2][6][41]
- MPA tackles cascading errors and limited generalization in closed-loop settings by using a model-based approach to adapt pre-trained E2E driving agents [2][6]

Summary by Sections

Background

- E2E autonomous driving models have made significant progress by integrating perception, prediction, and planning into a unified learning framework, but their performance degrades in closed-loop environments due to cumulative errors and distribution shift [3][6]
- The gap between offline training objectives and online behavior underscores the need for better closed-loop performance evaluation [5][9]

MPA Framework

- MPA bridges this performance gap by generating counterfactual data with a high-fidelity 3D Gaussian splatting (3DGS) simulation engine, which lets the agent experience diverse scenarios beyond the original dataset [7][14]
- The framework includes a diffusion model-based policy adapter and a multi-step Q-value model that refine the agent's predictions and evaluate long-term rewards [7][21]

Experimental Results

- MPA was validated on the nuScenes benchmark, showing significant performance gains in both in-domain and out-of-domain scenarios, particularly in safety-critical situations [11][33]
- The results indicate that MPA outperforms baseline models, achieving higher scores on key metrics such as route completion (RC) and HDScore [33][36]

Contributions

The article outlines three main contributions:
1. An analysis of the root causes of performance decline in closed-loop evaluation and of the fidelity of 3DGS simulation [11][41]
2. A systematic counterfactual data generation process and the training of the MPA framework [11][43]
3. A demonstration of MPA's effectiveness in enhancing E2E driving agents across diverse scenarios [41][43]

Limitations and Future Work

- The MPA framework relies on the assumption that 3DGS provides reliable rendering under constrained trajectory deviations, which may not hold in all cases [44]
- Future work will focus on expanding the dataset, integrating online reinforcement learning, and strengthening the framework's robustness in diverse driving conditions [44][46]
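To make the multi-step Q-value idea concrete, here is a minimal sketch of scoring a candidate rollout by a discounted sum of shaped per-step rewards, in the spirit of evaluating long-term reward over a horizon. The `Step` fields, reward terms, and weights are illustrative assumptions, not the paper's actual reward design.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    progress: float   # route progress gained this step (m) -- hypothetical
    collision: bool   # did the ego collide at this step?
    off_road: bool    # did the ego leave the drivable area?

def multi_step_q(rollout: List[Step], gamma: float = 0.95) -> float:
    """Discounted sum of shaped rewards over a simulated rollout."""
    q = 0.0
    for t, s in enumerate(rollout):
        r = s.progress            # encourage route completion (RC)
        if s.collision:
            r -= 10.0             # heavy safety penalty
        if s.off_road:
            r -= 5.0              # stay on the drivable area
        q += (gamma ** t) * r
    return q

# Rank two candidate trajectories by estimated long-term value:
# the slightly slower but collision-free rollout scores higher.
safe = [Step(1.0, False, False)] * 5
risky = [Step(1.2, False, False)] * 4 + [Step(0.0, True, False)]
best = max([safe, risky], key=multi_step_q)
```

A value model like this lets the adapter prefer trajectories that are safe over a horizon rather than only at the next step, which is what distinguishes multi-step evaluation from single-step imitation.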
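The counterfactual data generation step can be caricatured as perturbing the logged ego trajectory within a small bound (inside which 3DGS rendering is assumed reliable, per the paper's constrained-deviation assumption) and re-rendering the scene from the shifted viewpoints. The function name, the constant-offset scheme, and the 1.5 m bound are all hypothetical simplifications.

```python
import random
from typing import List, Tuple

def perturb_trajectory(
    xy: List[Tuple[float, float]], max_lateral: float = 1.5
) -> List[Tuple[float, float]]:
    """Apply one bounded lateral offset to an (x, y) ego trajectory.

    A real pipeline would offset along each point's path normal and then
    render the counterfactual views with the 3DGS engine; here we just
    shift y to keep the sketch self-contained.
    """
    d = random.uniform(-max_lateral, max_lateral)
    return [(x, y + d) for (x, y) in xy]
```

Keeping the deviation bounded is what ties data generation to the rendering-fidelity assumption noted under Limitations: outside that envelope, the simulated observations are no longer trustworthy training signal.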
Another new release from NVIDIA! MPA: a new model-based closed-loop adaptation framework for end-to-end driving policies (CMU, Stanford, et al.)