End-to-End Series! SparseDrive: End-to-End Autonomous Driving Based on Sparse Scene Representation
自动驾驶之心· 2025-06-23 11:34
Core Viewpoint
- The article discusses the limitations of existing end-to-end methods in autonomous driving, particularly the computational cost of BEV-based paradigms and the inefficiency of sequential prediction-then-planning pipelines. It proposes a sparse paradigm that enables parallel processing of prediction and planning tasks [2][5].

Group 1: SparseDrive Methodology
- SparseDrive adopts the core ideas of Horizon's earlier Sparse series, centering on sparse scene representation for autonomous driving [3].
- The method revisits the similarities between motion prediction and planning and introduces a hierarchical planning selection strategy [5].
- The architecture features symmetric sparse perception and a parallel motion planner [5].

Group 2: Training and Performance
- The training loss for SparseDrive is defined as a combination of detection, mapping, motion, planning, and depth losses [9].
- In detection, SparseDrive-S achieves a mean Average Precision (mAP) of 0.418 and SparseDrive-B reaches 0.496, outperforming methods such as UniAD [11].
- In motion prediction and planning, SparseDrive-S and SparseDrive-B show significant improvements on metrics such as minADE and minFDE over prior methods [18].

Group 3: Efficiency Comparison
- SparseDrive offers superior training and inference efficiency, requiring only 15.2 GB of GPU memory and running at 9.0 FPS at inference, versus UniAD's 50.0 GB and 1.8 FPS [20].
- The reduced computational footprint makes the method more practical for real-time autonomous driving applications [20].

Group 4: Course and Learning Opportunities
- The article promotes a course on end-to-end autonomous driving algorithms, covering foundational knowledge, practical implementations, and a range of algorithmic approaches [29][41].
- The course aims to equip participants with the skills to understand and implement end-to-end solutions in the autonomous driving industry [54][56].
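The summary notes that SparseDrive's training objective combines detection, mapping, motion, planning, and depth terms [9]. A minimal sketch of such a weighted multi-task loss is shown below; the weight values are illustrative assumptions, not the paper's published coefficients.

```python
import torch

# Illustrative weights; SparseDrive's actual coefficients are not given here.
LOSS_WEIGHTS = {"detection": 1.0, "mapping": 1.0, "motion": 1.0,
                "planning": 1.0, "depth": 0.5}

def total_loss(losses, weights=LOSS_WEIGHTS):
    """Weighted sum of per-task scalar losses: L_total = sum_i w_i * L_i."""
    return sum(weights[name] * value for name, value in losses.items())

# Toy scalar losses standing in for each task head's output.
losses = {name: torch.rand((), requires_grad=True) for name in LOSS_WEIGHTS}
loss = total_loss(losses)
loss.backward()  # in a real model, gradients flow back to every task head
print(float(loss))
```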
SJTU & KargoBot's FastDrive! Structured Labels Make End-to-End Large Models Faster and Stronger
自动驾驶之心· 2025-06-23 11:34
Core Viewpoint
- Integrating human-like reasoning into end-to-end autonomous driving systems is a cutting-edge research area, with a focus on vision-language models (VLMs) [1].

Group 1: Structured Dataset and Model
- A structured dataset, NuScenes-S, is introduced; it focuses on the key elements most relevant to driving decisions, eliminating redundant information and improving reasoning efficiency [4][5].
- The FastDrive model, with 0.9 billion parameters, mimics human reasoning strategies and aligns well with end-to-end autonomous driving frameworks [4][5].

Group 2: Dataset Description
- NuScenes-S provides a comprehensive view of driving scenarios, addressing issues often overlooked in existing datasets. It covers key elements such as weather, traffic conditions, driving areas, traffic lights, traffic signs, road conditions, lane markings, and time of day [7][8].
- Dataset construction combined GPT-generated and human annotations of scene information, with the labels refined through comparison and iterative optimization [9].

Group 3: FastDrive Algorithm Model
- FastDrive follows a "ViT-Adapter-LLM" architecture, using a Vision Transformer for visual feature extraction and a token-packing module to speed up inference [18][19].
- A large language model (LLM) generates scene descriptions, identifies key objects, predicts future states, and makes driving decisions in a chain-of-reasoning manner [19].

Group 4: Experimental Results
- Experiments on NuScenes-S, which contains 102,000 question-answer pairs, show that FastDrive achieves competitive performance on scene-understanding tasks [21].
- FastDrive posts strong results on perception, prediction, and decision-making tasks, outperforming comparable models [25].
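The summary says FastDrive uses a token-packing module to shorten the visual token sequence fed to the LLM [18][19]. The paper's exact mechanism is not given here; below is a minimal sketch under the common assumption that token packing merges adjacent visual tokens (strided average pooling) before projecting into the LLM embedding space. All names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class TokenPacker(nn.Module):
    """Illustrative token packing: merge every `pack` adjacent ViT tokens
    into one LLM token via average pooling plus a linear projection.
    An assumption about the mechanism, not FastDrive's actual module."""

    def __init__(self, vit_dim: int = 768, llm_dim: int = 2048, pack: int = 4):
        super().__init__()
        self.pack = pack
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, vis_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (batch, num_tokens, vit_dim); num_tokens divisible by pack
        b, n, d = vis_tokens.shape
        packed = vis_tokens.view(b, n // self.pack, self.pack, d).mean(dim=2)
        return self.proj(packed)  # (batch, num_tokens // pack, llm_dim)

# 576 ViT patch tokens -> 144 LLM tokens: a 4x shorter visual prefix.
tokens = torch.randn(2, 576, 768)
print(TokenPacker()(tokens).shape)  # torch.Size([2, 144, 2048])
```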
A New ADAS Paradigm! BIT & Tsinghua's MMTL-UniAD: A Unified SOTA Framework for Multimodal, Multi-Task Learning (CVPR'25)
自动驾驶之心· 2025-06-23 11:34
Core Insights
- The article presents MMTL-UniAD, a unified multimodal, multi-task learning framework for assistive driving perception that enhances advanced driver-assistance systems (ADAS) by simultaneously recognizing driver behavior, driver emotion, the traffic environment, and vehicle behavior [1][5][26].

Group 1: Introduction and Background
- ADAS has substantially improved driving safety over the past decade, yet roughly 1.35 million people still die in traffic accidents each year, and over 65% of those incidents are linked to abnormal driver psychological or physiological states [3].
- Existing research usually targets a single task, such as driver behavior or emotion recognition, ignoring the inherent connections between tasks and thereby limiting cross-task learning [3][4].

Group 2: Framework and Methodology
- MMTL-UniAD takes a multimodal approach to recognize driver behavior, emotion, traffic environment, and vehicle behavior simultaneously, while addressing negative transfer in multi-task learning [5][26].
- The framework has two core components: a multi-axis region attention network (MARNet) and a dual-branch multimodal embedding module, which together extract task-shared and task-specific features [5][26].

Group 3: Experimental Results
- MMTL-UniAD outperforms existing state-of-the-art methods across multiple tasks, improving the mAcc metric on the AIDE dataset by 4.10% to 12.09% [18][26].
- It achieves higher accuracy in driver behavior recognition and vehicle behavior recognition, with gains of 4.64% and 3.62%, respectively [18][26].

Group 4: Ablation Studies
- Ablation experiments show that jointly training driver-state and traffic-environment tasks enhances feature sharing and significantly improves recognition accuracy [22][26].
- The results confirm that task interdependence in MMTL-UniAD contributes to overall performance and generalization [22][26].
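The summary describes extracting task-shared and task-specific features for four tasks [5][26]. A minimal sketch of that general pattern (a shared trunk plus per-task branches) follows; MARNet and the dual-branch embedding module are not reproduced here, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

TASKS = ("driver_behavior", "driver_emotion", "traffic_context", "vehicle_behavior")

class SharedSpecificMTL(nn.Module):
    """Generic multi-task layout: one shared trunk for task-shared features,
    one small branch per task for task-specific features. Illustrative only;
    MMTL-UniAD's actual MARNet architecture is more elaborate."""

    def __init__(self, in_dim=512, shared_dim=256, num_classes=8):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.ReLU())
        self.heads = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(in_dim + shared_dim, shared_dim),
                             nn.ReLU(),
                             nn.Linear(shared_dim, num_classes))
            for t in TASKS})

    def forward(self, x):
        s = self.shared(x)                 # task-shared features
        fused = torch.cat([x, s], dim=-1)  # each head also sees the raw input
        return {t: head(fused) for t, head in self.heads.items()}

logits = SharedSpecificMTL()(torch.randn(4, 512))
print({t: v.shape for t, v in logits.items()})
```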
Fresh Interview Notes: Just Finished Interviewing with NVIDIA TRT-LLM
自动驾驶之心· 2025-06-23 11:34
Author | 笑渐不闻声渐悄  Editor | 自动驾驶之心  Original link: https://zhuanlan.zhihu.com/p/1918033580103282744
This article is shared for academic purposes only; in case of infringement, contact us for removal.

Fresh off the press: I just finished interviewing with NVIDIA's TRT-LLM team. My background is LLM inference acceleration, mainly speculative decoding, with one paper accepted at ICLR 2025. Since I want to keep working on inference acceleration, I interviewed with NVIDIA on an exploratory basis, hoping to build connections. First, a complaint about the interview format: four interviewers, one hour each, four consecutive hours; by the end I felt completely drained... Now, a quick rundown of the questions.

First interviewer: self-introduction, then a walkthrough of my ICLR 2025 work on speculative decoding. The questions were quite detailed, covering everything from the method's setup to its evaluation; I then briefly presented my NeurIPS 2023 work. The interviewer seemed fairly satisfied with my research experience, and then gave a coding problem: given an n-digit number, insert any number of '+' signs, and finally ...
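Since the interviewee's research area is speculative decoding (a small draft model proposes tokens, the large target model verifies them), here is a toy sketch of the greedy accept/reject loop. The model interfaces are stand-in callables, not TRT-LLM APIs, and real implementations verify all draft tokens in a single batched target forward pass.

```python
import random

def speculative_decode(draft_next, target_next, prefix, k=4, steps=8):
    """Toy greedy speculative decoding.

    draft_next / target_next map a token sequence to the next token
    (stand-ins for a small draft model and a large target model). The
    draft proposes k tokens; the target verifies them, we keep the
    longest matching prefix, then append the target's correction token.
    """
    seq = list(prefix)
    for _ in range(steps):
        drafts = []
        for _ in range(k):                 # draft k tokens autoregressively
            drafts.append(draft_next(seq + drafts))
        for t in drafts:                   # target verifies draft tokens
            if target_next(seq) == t:
                seq.append(t)
            else:
                break
        seq.append(target_next(seq))       # correction (or bonus) token
    return seq

# Toy models over a 4-token vocabulary; the draft agrees 80% of the time.
target = lambda s: (sum(s) * 7 + len(s)) % 4
draft = lambda s: target(s) if random.random() < 0.8 else random.randrange(4)
print(speculative_decode(draft, target, prefix=[1, 2, 3]))
```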
Why Does a Single Paper Consume an Entire Graduate Career?
自动驾驶之心· 2025-06-23 08:03
We have received many requests for help with publishing papers: some schools require a JCR Q3 journal paper for a master's degree, and a PhD cannot graduate without three CCF-A papers, while advisors unfamiliar with a new direction cannot offer guidance. Students rack their brains over topic selection, hit walls in experiment design, write with tangled logic, and get rejected submission after submission. In cutting-edge, complex fields like autonomous driving, embodied intelligence, and robotics, it can feel truly overwhelming.

A paper usually takes one to two years from preparation to publication, which for a master's student spans nearly the entire academic career. Wrong methods, detours, and the absence of guidance waste the most time. Publishing is hard, but not impossible: with an experienced mentor leading the way, several papers a year is normal. After long preparation, our paper-coaching service is officially launched, covering autonomous driving, embodied intelligence, and robotics.

Who are we? The largest AI technical media platform in China, whose IPs include 自动驾驶之心, 具身智能之心, and 3D视觉之心, with access to top academic resources. Having worked in autonomous driving, embodied intelligence, and robotics for years, we deeply understand the challenges and opportunities of these interdisciplinary fields, and we know how much a high-quality paper means to a student's (especially a graduate student's) studies and future. We have 300+ mentors dedicated to autonomous driving and embodied intelligence, from universities ranked in the global QS top 100 ...
A Complete, Accessible Guide to the Core Fundamentals of LoRA (Low-Rank Adaptation)
自动驾驶之心· 2025-06-22 14:09
Efficient fine-tuning of large models has become an industry focus: whether for general-purpose LLMs or for autonomous-driving foundation models, how to turn them into domain specialists through lightweight fine-tuning is a hot topic. So today, let's talk about LoRA.

Background: large companies and research institutes have the resources to develop large models, but for a typical small company or individual, building one from scratch is nearly impossible. A single training run of a model like ChatGPT costs tens of millions of dollars, and even DeepSeek-V3 reportedly cost over five million dollars per run. Making full use of open-source large models and fine-tuning them efficiently for domain tasks is therefore an urgent problem for both academia and industry, and this is where LoRA comes in.

The idea behind LoRA is simple: add a bypass branch next to the original PLM (Pre-trained Language Model) that performs a down-projection followed by an up-projection, approximating the model's so-called intrinsic rank. The down-projection relies on low-rank decomposition, so let us first recall what low-rank decomposition is.

So what are LoRA's training procedure and its advantages? During training, the PLM's parameters are frozen, and only the down-projection matrix A ...
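To make the bypass concrete: the original LoRA paper replaces $h = Wx$ with $h = Wx + \frac{\alpha}{r} BAx$, where $A \in \mathbb{R}^{r \times d}$ down-projects, $B \in \mathbb{R}^{d_{out} \times r}$ up-projects, and the rank satisfies $r \ll d$. A minimal PyTorch sketch follows; the layer size, rank, and scaling values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a trainable low-rank bypass:
    h = W x + (alpha / r) * B A x, as in the original LoRA paper.
    B is zero-initialized, so training starts exactly at the pretrained model."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the PLM weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16,384 trainable params beside ~1.05M frozen ones
```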
Reinforcement Learning for Large Models: Is DPO Still No Match for PPO?
自动驾驶之心· 2025-06-22 14:09
Core Insights
- The article examines the theoretical and experimental shortcomings of DPO (Direct Preference Optimization) relative to PPO (Proximal Policy Optimization): although DPO appears to lead on open-source benchmarks, top closed-source models such as GPT-4 and Claude rely on PPO [1][2].

DPO's Deficiencies
- DPO suffers from issues similar to reward hacking: despite having no explicit reward model, it can still produce solutions misaligned with human preferences [2].
- Theoretically, given true reward signals, the set of policies reachable by PPO is a proper subset of those reachable by DPO, so DPO can generate solutions that drift away from the reference policy [3].

Experimental Findings
- Experiments show that DPO can assign higher probability to data points not covered by the preference dataset, leading to unexpected behaviors, whereas PPO optimizes effectively under its KL constraint [6].
- DPO's performance improves when distribution drift is reduced, for example via SafeSFT, but it still does not surpass PPO [8].

Performance Metrics
- Benchmarks consistently show PPO outperforming both iterative DPO and vanilla DPO across tasks, most notably in programming competitions [10].
- PPO-trained models reach up to 44.4% on pass@5, while DPO-trained models struggle to achieve meaningful results on the same tasks [11][12].

Conclusion
- While DPO has theoretical merits, its practical utility on high-stakes tasks such as competitive programming lags behind PPO, which continues to set the performance standard [13].
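For reference, DPO's objective (from the DPO paper) scores a chosen/rejected pair through log-probability ratios against a frozen reference policy, with no explicit reward model. A minimal sketch over precomputed sequence log-probabilities is below; the tensor names and the value of beta are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective on sequence-level log-probs:
    -log sigmoid(beta * ((logpi_w - logref_w) - (logpi_l - logref_l))).
    The implicit reward is beta times the policy/reference log-ratio,
    which is why DPO needs no separate reward model."""
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(float(loss))
```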
Our Lab PI Wants to Build an Autonomous Driving Cart, but We Have No Idea Where to Start...
自动驾驶之心· 2025-06-22 14:09
A lightweight, teaching-and-research solution from the 自动驾驶之心 team: it supports perception, localization, sensor fusion, navigation, planning, and more, on an Ackermann chassis.

Big news: pre-sale is open. The Dark Warrior (黑武士) 001, a full-stack autonomous driving cart for research and teaching, is officially on sale. The world is too dull; come build something fun with us. Original price 36,999 RMB; order now and receive three courses free (model deployment + point-cloud 3D detection + multi-sensor fusion), with priority assembly and shipping for early orders. Orders are fully booked for the next two months and units are being assembled and tested continuously; orders of five or more units qualify for a discount, and bulk purchases by universities and research institutes are welcome.

1) Dark Warrior 001
The cart supports secondary development and modification, with numerous reserved mounting points and interfaces for adding cameras, millimeter-wave radar, and other sensors.

2) Demonstrations
We tested perception, localization, fusion, and navigation/planning functions in indoor, outdoor, and underground-garage scenarios: outdoor park driving, point-cloud 3D object detection, indoor garage 2D LiDAR mapping, indoor garage 3D LiDAR mapping, uphill/downhill tests, large-scale outdoor 3D mapping, and outdoor night driving.

Overall use cases: undergraduate advanced study and competitions; graduate research and publications; graduate job hunting and projects; university lab teaching equipment; training companies and vocational colleges.

3) Hardware
| Main sen ...

6) Software
Frameworks and languages: ROS, C++, Python. One-click startup is supported, and a development environment is provided.
Deploying End-to-End VLA for Autonomous Driving: How Should the Algorithms Be Designed?
自动驾驶之心· 2025-06-22 14:09
Core Insights
- The article surveys rapid progress in end-to-end autonomous driving, focusing on Vision-Language-Action (VLA) models and their industrial applications [2][3].

Group 1: VLA Model Developments
- AutoVLA, a new VLA model that unifies reasoning and action generation for end-to-end driving, shows promising results in semantic reasoning and trajectory planning [3][4].
- ReCogDrive tackles weak performance in rare, long-tail scenarios with a three-stage training framework that couples a vision-language model with a diffusion planner [7][9].
- Impromptu VLA introduces a dataset aimed at improving VLA models' performance in unstructured extreme conditions, yielding significant gains on established benchmarks [14][24].

Group 2: Experimental Results
- AutoVLA posts competitive metrics across scenarios; its best-of-N method reaches a PDMS of 92.12, demonstrating strong planning and execution [5].
- ReCogDrive sets a new state-of-the-art PDMS of 89.6 on the NAVSIM benchmark, showing robust and safe driving trajectories [9][10].
- OpenDriveVLA achieves superior results in open-loop trajectory planning and driving-related question answering, outperforming prior methods on the nuScenes dataset [28][32].

Group 3: Industry Trends
- Major automakers such as Li Auto, Xiaomi, and XPeng are investing heavily in VLA research and development, signaling a competitive landscape in autonomous driving technology [2][3].
- Integrating large language models (LLMs) into VLA frameworks to strengthen decision-making is a growing focus, as seen in models like ORION and VLM-RL [33][39].
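Several results above use best-of-N selection: sample N candidate trajectories from the policy, score each, keep the best. A generic sketch of that pattern is below; the sampler, the scorer, and the PDMS-style metric are placeholders, not any of these papers' actual implementations.

```python
import numpy as np

def best_of_n_plan(sample_trajectory, score_trajectory, n: int = 16):
    """Generic best-of-N planning: draw n candidate trajectories and
    return the highest-scoring one. The callables stand in for a VLA
    policy's sampler and a rule-based driving score (e.g. PDMS-style);
    both are illustrative placeholders."""
    candidates = [sample_trajectory() for _ in range(n)]
    scores = [score_trajectory(t) for t in candidates]
    return candidates[int(np.argmax(scores))], max(scores)

# Toy example: 2D waypoint trajectories; prefer smooth, forward motion.
rng = np.random.default_rng(0)
sample = lambda: rng.normal(loc=[1.0, 0.0], scale=0.5, size=(8, 2)).cumsum(axis=0)
score = lambda t: t[-1, 0] - np.abs(np.diff(t[:, 1])).sum()  # progress minus lateral jitter
traj, s = best_of_n_plan(sample, score, n=16)
print(traj.shape, round(float(s), 3))
```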
100+ Autonomous Driving Datasets: You Should at Least Know These 5
自动驾驶之心· 2025-06-22 01:35
Core Viewpoint
- The article underscores the growing importance of autonomous driving technology and the availability of over 100 high-quality datasets for developers and researchers. It introduces five key datasets spanning tasks from perception to visual odometry, valuable to both beginners and experienced engineers [2].

Dataset Summaries
1. KITTI Dataset
- KITTI is one of the most classic and widely used benchmarks in autonomous driving. Collected in Karlsruhe, Germany, with high-precision sensors (stereo color/grayscale cameras, a Velodyne 3D LiDAR, and GPS/IMU), it provides annotations for stereo vision, optical flow, visual odometry, and 3D object detection and tracking, making it a standard for evaluating vehicle vision algorithms [3].

2. nuScenes Dataset
- nuScenes, released by Motional, is a large-scale multi-sensor dataset covering 1,000 continuous driving scenes in Boston and Singapore, about 15 hours of data in total. Its full sensor suite (six cameras, five millimeter-wave radars, one roof-mounted LiDAR, and IMU/GPS) yields roughly 1.4 million high-resolution camera images and 390,000 LiDAR sweeps, annotated with 3D bounding boxes for 23 object categories, making it well suited to research on complex urban road scenarios [5][7].

3. Waymo Open Dataset
- The Waymo Open Dataset, released by Google's Waymo, is among the largest open data resources for autonomous driving. It has two main parts: a perception dataset with 2,030 scenes of high-resolution camera and LiDAR data, and a motion dataset with 103,354 vehicle trajectories plus corresponding 3D map information. Covering diverse times of day, weather conditions, and urban environments, it serves as a benchmark for detection, tracking, and trajectory prediction research [10][12].

4. PathTrack Dataset
- PathTrack focuses on person tracking and contains more than 15,000 trajectories across 720 sequences. Its annotations were used to retrain an existing person-matching network, significantly reducing its classification error rate. The dataset suits 2D/3D object detection, tracking, and trajectory prediction tasks [13][14][15].

5. ApolloScape Dataset
- ApolloScape, released by Baidu Apollo, is a massive autonomous driving dataset notable for its scale and annotation accuracy; it reportedly exceeds comparable datasets in size by more than tenfold, with hundreds of thousands of high-resolution images carrying pixel-level semantic segmentation annotations. It defines 26 semantic classes and includes complex road scenarios, supporting perception, map construction, and simulation training [17][19].
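As a starting point with one of these datasets, the snippet below sketches loading nuScenes with the official nuscenes-devkit and walking from a scene to its first annotated sample. The dataroot path is a placeholder; the calls follow the devkit's documented API and assume the v1.0-mini split has been downloaded and extracted.

```python
# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes

# Dataroot is a placeholder; point it at your extracted v1.0-mini download.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/nuscenes', verbose=True)

scene = nusc.scene[0]                          # first scene in the mini split
sample = nusc.get('sample', scene['first_sample_token'])

# Each sample links synchronized sensor data by channel name.
cam_front = nusc.get('sample_data', sample['data']['CAM_FRONT'])
lidar_top = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
print(cam_front['filename'], lidar_top['filename'])

# Annotations are 3D boxes; each references a category such as 'vehicle.car'.
for token in sample['anns'][:3]:
    ann = nusc.get('sample_annotation', token)
    print(ann['category_name'], ann['size'])
```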