Autonomous Driving Trajectory Planning
Some Thoughts on Applying Reinforcement Learning to Autonomous Driving
自动驾驶之心· 2025-12-23 00:53
Author | 小小螺丝钉  Editor | 自动驾驶之心

Consider an analogy: how do you score well on an exam? One effective method is to practice with real past papers, working through several sets and distilling the lessons. By the same logic, replacing open-loop training with closed-loop training that simulates the real-vehicle test environment should be a more effective way to train. However, closed-loop RL training depends heavily on how realistic the simulation environment is. For a strongly interactive task like autonomous driving, simulator fidelity matters even more, which is one reason many large companies are investing in world models. So if we do not have a high-fidelity simulator, how can we still use RL? This article offers a good way of thinking about the problem.

The network architecture follows MotionLM, an earlier Waymo paper, and outputs trajectories autoregressively (illustrated in the figure in the original post). Briefly, autoregression here means that at each inference step the network outputs one action for the ego vehicle and one for each agent, and a for loop rolls these out into complete trajectories; this keeps the causal ordering consistent. Because the network outputs actions for both ego and agents simultaneously, it naturally forms a simulation; in a sense, it is a lightweight world model. A minimal sketch of this rollout loop follows below.

Original article: https://zhuanlan.zhihu.com/p/19813730555079079 ...
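To make the autoregressive joint rollout concrete, here is a minimal sketch in the style described above. The module names, shapes, and discrete motion-token vocabulary are illustrative assumptions, not the actual MotionLM implementation; the point is only the loop structure in which ego and agent actions are sampled jointly at every step and fed back in.

```python
# Minimal sketch of a MotionLM-style autoregressive joint rollout.
# All module names, shapes, and the token vocabulary are assumptions
# for illustration, not the actual MotionLM code.
import torch
import torch.nn as nn

class JointActionDecoder(nn.Module):
    """Toy decoder: maps each agent's state to logits over discrete motion tokens."""
    def __init__(self, state_dim: int = 4, num_tokens: int = 128, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_tokens)
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: [num_agents, state_dim] -> logits: [num_agents, num_tokens]
        return self.net(states)

def rollout(decoder, init_states, token_to_delta, horizon=16):
    """At each step, sample one motion token per agent (ego included), convert
    it to a state delta, and feed the updated states back in. Because ego and
    agents are stepped together, the loop doubles as a simple simulation."""
    states = init_states.clone()
    trajectory = [states.clone()]
    for _ in range(horizon):
        logits = decoder(states)                                          # [A, num_tokens]
        tokens = torch.distributions.Categorical(logits=logits).sample()  # [A]
        states = states + token_to_delta[tokens]                          # apply actions
        trajectory.append(states.clone())
    return torch.stack(trajectory)                                        # [horizon+1, A, state_dim]

if __name__ == "__main__":
    num_tokens, state_dim, num_agents = 128, 4, 3            # ego + 2 agents
    decoder = JointActionDecoder(state_dim, num_tokens)
    token_to_delta = torch.randn(num_tokens, state_dim) * 0.1  # toy token vocabulary
    traj = rollout(decoder, torch.zeros(num_agents, state_dim), token_to_delta)
    print(traj.shape)  # torch.Size([17, 3, 4])
```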
Practical Guide | Trajectory Planning Based on Deep Reinforcement Learning (with Code Walkthrough)
自动驾驶之心· 2025-07-29 23:32
Core Viewpoint
- The article discusses the advancements and applications of reinforcement learning (RL) in autonomous driving, highlighting its potential to improve decision-making in dynamic environments.

Group 1: Background and Concepts
- The concept of VLA (Vision-Language-Action) and its relation to embodied intelligence is introduced, emphasizing its similarity to end-to-end autonomous driving [3]
- Reinforcement learning has gained traction across industries following milestones such as AlphaZero (2018) and ChatGPT (released late 2022), showcasing its broader applicability [3]
- The article aims to explain reinforcement learning from a computer vision perspective, drawing parallels with established concepts in that field [3]

Group 2: Learning Methods
- Supervised learning in autonomous driving covers tasks like object detection, where a model is trained on labeled data to map inputs to outputs [5]
- Imitation learning is described as a method where models learn actions by mimicking human behavior, much as children learn from adults [6]
- Reinforcement learning differs from imitation learning by optimizing actions based on feedback from interactions with the environment, making it suitable for sequential decision-making tasks [7]

Group 3: Advanced Learning Techniques
- Inverse reinforcement learning is introduced as a way to derive reward functions from expert data, useful when rewards are hard to specify by hand [8]
- The Markov Decision Process (MDP) is explained as the framework for modeling decision-making tasks, relating states, actions, and rewards [9]
- Dynamic programming and Monte Carlo methods are discussed as techniques for solving reinforcement learning problems, emphasizing their role in optimizing decision-making (a value-iteration sketch follows below) [11][12]

Group 4: Reinforcement Learning Algorithms
- Reinforcement learning algorithms are categorized into on-policy and off-policy methods, which differ in training stability and data utilization [25][26]
- Key algorithms such as Q-learning, SARSA, and policy gradient methods are outlined, with their mechanisms and applications (see the tabular-update and REINFORCE sketches below) [27][29]
- Advanced algorithms like TRPO and PPO are presented, focusing on how they keep training stable while optimizing policy updates (see the clipped-objective sketch below) [57][58]

Group 5: Applications in Autonomous Driving
- Reward design is emphasized as central to RL for driving, with safety, comfort, and efficiency as the key factors (a weighted-reward sketch follows below) [62]
- Closed-loop training systems are needed because the ego vehicle's actions influence the environment, which requires dynamic modeling of other vehicles [62]
- Integrating end-to-end learning with reinforcement learning is highlighted as a way to adapt to changing environments in real time [63]
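As a concrete illustration of the dynamic-programming approach mentioned in Group 3, here is a minimal value-iteration sketch on a toy MDP. The states, transitions, and rewards are invented for illustration and are not taken from the article.

```python
# Value iteration on a toy 3-state, 2-action MDP (all numbers are invented).
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
# P[s, a, s'] = transition probability; R[s, a] = expected reward.
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.3, 0.0, 0.7]],
])
R = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])

V = np.zeros(n_states)
for _ in range(200):
    # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V(s') ]
    Q = R + gamma * P @ V          # shape [n_states, n_actions]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print("optimal values:", V, "greedy policy:", Q.argmax(axis=1))
```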
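To make the Q-learning/SARSA distinction from Group 4 concrete, here is a hedged sketch of the two tabular update rules. The state/action indices and reward in the usage lines are placeholder values, not from any real environment.

```python
# Tabular Q-learning vs. SARSA updates (sketch; epsilon-greedy behavior assumed).
# Q-learning (off-policy) bootstraps from the greedy next action; SARSA
# (on-policy) bootstraps from the action actually taken next.
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * Q[s_next].max()          # greedy bootstrap
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    target = r + gamma * Q[s_next, a_next]        # bootstrap on the action taken
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(Q, s, eps=0.1):
    return np.random.randint(Q.shape[1]) if np.random.rand() < eps else int(Q[s].argmax())

# Usage on a hypothetical tabular problem with 10 states and 4 actions:
Q = np.zeros((10, 4))
s, a, r, s_next = 0, epsilon_greedy(Q, 0), 1.0, 3
q_learning_update(Q, s, a, r, s_next)
sarsa_update(Q, s, a, r, s_next, epsilon_greedy(Q, s_next))
```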
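For the policy gradient methods mentioned in Group 4, here is a minimal REINFORCE sketch. The network size, states, and returns are toy assumptions; in practice the batch would come from collected episodes.

```python
# REINFORCE sketch: increase log-probability of actions in proportion to return.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(states, actions, returns):
    """states: [T, 4], actions: [T], returns: [T] (discounted returns-to-go)."""
    logits = policy(states)
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    loss = -(log_probs * returns).mean()   # gradient ascent on E[log pi(a|s) * G]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch standing in for one collected episode:
T = 5
loss = reinforce_step(torch.randn(T, 4), torch.randint(0, 2, (T,)), torch.ones(T))
```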
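The PPO stability mechanism referenced in Group 4 is the clipped surrogate objective. Below is a sketch of that standard clip form; the input tensors are assumed to come from a rollout buffer that is not shown.

```python
# PPO clipped surrogate objective (sketch):
# L = E[min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)], maximized.
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)          # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()              # negate to minimize

# Sanity check: with identical policies the ratio is 1 and loss = -mean(advantage).
lp = torch.zeros(8)
adv = torch.randn(8)
print(ppo_clip_loss(lp, lp, adv))
```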
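Finally, to illustrate the safety/comfort/efficiency reward design highlighted in Group 5, here is a hypothetical weighted reward. The terms, weights, and thresholds are illustrative assumptions, not the article's actual reward function.

```python
# Sketch of a weighted driving reward combining safety, comfort, and efficiency.
# All weights and thresholds below are invented for illustration.
def driving_reward(collision: bool, min_gap_m: float, jerk: float,
                   speed_mps: float, target_speed_mps: float = 15.0) -> float:
    safety = -100.0 if collision else min(0.0, min_gap_m - 2.0)  # penalize gaps < 2 m
    comfort = -0.1 * abs(jerk)                                   # penalize harsh jerk
    efficiency = -0.05 * abs(speed_mps - target_speed_mps)       # track target speed
    return 2.0 * safety + 1.0 * comfort + 1.0 * efficiency

# Example: no collision, 1.5 m gap, mild jerk, slightly under target speed.
print(driving_reward(collision=False, min_gap_m=1.5, jerk=0.8, speed_mps=13.0))
```

Weighting safety most heavily reflects the article's ordering of priorities; in a real system these trade-offs would be tuned against closed-loop behavior rather than fixed by hand.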