Trajectory Prediction
ICCV'25 HKUST "Reason First, Then Predict": Reward-Driven Intent Reasoning Takes Trajectory Prediction Out of the Black Box!
自动驾驶之心· 2025-08-29 03:08
Core Insights
- The article emphasizes the importance of accurately predicting the motion of road agents for the safety of autonomous driving, introducing a reward-driven intent reasoning mechanism to improve the reliability and interpretability of trajectory prediction [3][5][10].

Summary by Sections

Introduction
- Trajectory prediction is a critical component of advanced autonomous driving systems, linking upstream perception with downstream planning modules. Current data-driven models often give insufficient consideration to driving behavior, limiting their interpretability and reliability [5][10].

Methodology
- The proposed method adopts a "reason first, then predict" strategy, in which intent reasoning provides prior guidance for accurate and reliable multimodal motion prediction. The framework models agent behavior as a Markov Decision Process (MDP) [8][10][12].
- A reward-driven intent reasoning mechanism is introduced, using Maximum Entropy Inverse Reinforcement Learning (MaxEnt IRL) to learn agent-specific reward distributions from demonstrations and the relevant driving context [8][9][10].
- A new query-centric IRL framework, QIRL, is developed to efficiently aggregate contextual features into a structured representation, improving overall prediction performance [9][10][18].

Experiments and Results
- The proposed method, named FiM, is evaluated on large-scale public datasets such as Argoverse and nuScenes, demonstrating competitive performance against state-of-the-art models [28][30][32].
- On the Argoverse 1 benchmark, FiM achieves a minimum average displacement error (minADE) of 0.8296 and a minimum final displacement error (minFDE) of 1.2048, outperforming several leading models [32][33].
- The results indicate that the intent reasoning module significantly improves prediction confidence and reliability, confirming the effectiveness of the proposed framework on complex motion prediction challenges [34][36].
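The minADE/minFDE figures above are the standard multimodal forecasting metrics: minADE averages the per-step Euclidean error of the best of K predicted modes, while minFDE takes the endpoint error of the best mode. A minimal NumPy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def min_ade(pred, gt):
    """Minimum Average Displacement Error over K predicted modes.

    pred: (K, T, 2) candidate trajectories; gt: (T, 2) ground truth.
    Returns the lowest per-step mean Euclidean error across modes.
    """
    errs = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T)
    return errs.mean(axis=1).min()

def min_fde(pred, gt):
    """Minimum Final Displacement Error: endpoint error of the best mode."""
    end_errs = np.linalg.norm(pred[:, -1] - gt[-1], axis=-1)  # (K,)
    return end_errs.min()

# Toy check: one of two modes matches the ground truth exactly.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.stack([gt, gt + np.array([0.0, 1.0])])  # exact mode + offset mode
print(min_ade(pred, gt))  # 0.0
print(min_fde(pred, gt))  # 0.0
```

Because both metrics take the minimum over modes, a model is rewarded for covering the true future with at least one of its K hypotheses rather than for collapsing all modes onto one guess.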
Conclusion
- The work redefines the trajectory prediction task from a planning perspective, highlighting the critical role of intent reasoning in motion prediction. The proposed framework establishes a promising baseline for future research in trajectory prediction [47].
VisionTrap: VLM+LLM Teach the Model to Exploit Visual Features for Better Trajectory Prediction
自动驾驶之心· 2025-08-20 23:33
Author | Sakura  Editor | 自动驾驶之心  Original link: https://zhuanlan.zhihu.com/p/716867464

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions. Venue: ECCV 2024, with an open-source dataset. In this work, we propose a new method that additionally incorporates visual input from surround-view cameras, allowing the model to exploit visual cues such as human gaze and gestures, road conditions, and vehicle turn signals, which are typically hidden from the model in existing methods. Furthermore, we use textual descriptions generated by a vision-language model (VLM) and refined by a large language model (LLM) as supervision during training, to guide the model in what features to learn from the input data. Despite these additional inputs, our method achieves a latency of 53 ms, making it usable for real-time processing, which is much faster than previous single-agent prediction methods with comparable performance. Our experiments show that both the visual inputs and the textual descriptions help improve ...
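The excerpt does not give VisionTrap's actual training objective, but the idea of using VLM/LLM-generated text as supervision can be illustrated with a simple cosine-alignment loss that pulls each agent's feature toward its paired text embedding. A hedged sketch; every name here is hypothetical:

```python
import numpy as np

def alignment_loss(agent_feats, text_embs):
    """Text-supervision sketch: 1 - cosine similarity between each
    agent feature and its paired text embedding, averaged over pairs.

    agent_feats, text_embs: (N, D) row-paired arrays.
    """
    a = agent_feats / np.linalg.norm(agent_feats, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * t, axis=1)))

# Features pointing in the same direction as their text embeddings
# incur zero loss; orthogonal pairs incur a loss of 1.
feats = np.array([[1.0, 0.0], [0.0, 2.0]])
texts = np.array([[2.0, 0.0], [0.0, 1.0]])
print(alignment_loss(feats, texts))  # 0.0
```

In training, such a term would be added to the trajectory regression loss only during training; at inference time the text branch is dropped, which is consistent with the low 53 ms latency reported above.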
QCNet -> SmartRefine -> DONUT: The Evolution of SOTA on Argoverse v2
自动驾驶之心· 2025-07-31 06:19
Author | Sakura  Editor | 自动驾驶之心  Original link: https://zhuanlan.zhihu.com/p/1933901730589962575

Preface -- why this article: The author recently read the ICCV 2025 paper DONUT: A Decoder-Only Model for Trajectory Prediction. Using QCNet as its baseline, this paper combines a decoder-only architecture with an overprediction strategy to achieve SOTA on Argoverse v2. This recalls SmartRefine, a paper the author read earlier, which also builds on QCNet by improving its refinement stage and likewise reached SOTA on Argoverse v2. In the spirit of learning, the author briefly summarizes these three papers here: Query-Centric Trajectory Prediction -- CVPR 2023, SmartRefin ...
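The excerpt names DONUT's overprediction strategy but does not detail it; read as "decode more candidate modes than will finally be reported, then prune by score", the selection step can be sketched as a toy. The scores would come from the network in practice, and all names below are hypothetical:

```python
import numpy as np

def select_top_modes(candidates, scores, k):
    """Overprediction pruning sketch: the decoder emits M candidate
    trajectories (M > k); keep only the k highest-scoring modes.

    candidates: (M, T, 2) trajectories; scores: (M,) mode confidences.
    """
    order = np.argsort(scores)[::-1][:k]  # indices, best score first
    return candidates[order], scores[order]

rng = np.random.default_rng(0)
cands = rng.normal(size=(12, 30, 2))  # M=12 over-predicted modes, T=30 steps
scores = rng.random(12)
top, top_scores = select_top_modes(cands, scores, k=6)
print(top.shape)  # (6, 30, 2)
```

The intuition is that generating surplus hypotheses and discarding the weak ones improves mode coverage under min-over-K metrics like minADE/minFDE, at modest extra decoding cost.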
The 自动驾驶之心 (Autonomous Driving Heart) technical discussion group has launched!
自动驾驶之心· 2025-07-29 07:53
Core Viewpoint
- The article announces the establishment of a leading communication platform for autonomous driving technology in China, focusing on industry, academic, and career development aspects [1].

Group 1
- The platform, named "Autonomous Driving Heart," aims to facilitate discussions and exchanges among professionals across the fields related to autonomous driving technology [1].
- The technical discussion group covers a wide range of topics, including large models, end-to-end systems, VLA, BEV perception, multi-modal perception, occupancy, online mapping, 3DGS, multi-sensor fusion, transformers, point cloud processing, SLAM, depth estimation, trajectory prediction, high-precision maps, NeRF, planning control, model deployment, autonomous driving simulation testing, product management, hardware configuration, and AI job exchange [1].
- Interested individuals are encouraged to join the community by adding the WeChat assistant and providing their company/school, nickname, and research direction [1].