Horizon Robotics RAD: An End-to-End Driving Policy via Large-Scale 3DGS-Based Reinforcement Learning
自动驾驶之心· 2025-11-29 02:06
Core Insights

- The article presents a reinforcement learning (RL) approach to end-to-end (e2e) driving policy training for autonomous driving, using 3D Gaussian Splatting (3DGS) to build the training environment [1][2]
- The proposed method reduces the collision rate roughly threefold compared with pure imitation learning (IL) [1]
- Limitations of the 3DGS environment include a lack of interactivity, reliance on log replay, and poor rendering of non-rigid pedestrians and low-light scenes [1]

Summary by Sections

Methodology

- The approach has three phases: training a basic Bird's Eye View (BEV) perception model, freezing perception and training a planning head with IL, and generating a sensor-level environment with 3DGS for mixed RL and IL training [3][5][6]
- Training proceeds from perception pre-training, to IL on human expert data, to RL fine-tuning that increases sensitivity to critical risk scenarios [10][12]

State and Action Space

- The state space combines encoders for BEV features, static map elements, traffic-participant information, and planning-related features [7]
- The action space is discrete, with 61 actions in each of the lateral and longitudinal dimensions [8]

Reward Function

- The reward function penalizes collisions and deviations from the expert trajectory, with separate thresholds for dynamic and static collisions and for position and heading deviation [17][19]
- Auxiliary tasks on behaviors such as deceleration and acceleration are introduced to stabilize training and accelerate convergence [20][23]

Experimental Results

- The method outperforms other IL-based algorithms, demonstrating the advantage of closed-loop training in dynamic environments [28][29]
- A 4:1 ratio of RL to IL data is found to be optimal and contributes to the improved performance metrics [28]

Conclusion

- The article emphasizes the practical engineering improvements achieved by integrating 3DGS into the training environment, leading to better performance in autonomous driving applications [1][2]