自动驾驶之心
Not much time is left in the paper window for end-to-end VLA......
自动驾驶之心· 2026-01-12 09:20
Core Viewpoint
- The article emphasizes the importance of deep learning and emerging technologies in the fields of automation and computer science, suggesting that students should focus on these areas to remain competitive in the job market [2]

Group 1: Recommended Learning Paths
- For students in automation and computer science, deep learning, VLA, end-to-end systems, and world models are highlighted as promising areas with significant potential for research and career development [2]
- Mechanical and vehicle engineering students are advised to start with traditional PnC and 3DGS, which are easier to grasp and require less computational power [2]

Group 2: Research Guidance Services
- The article announces the launch of a paper-guidance service covering advanced topics such as end-to-end systems, VLA, world models, reinforcement learning, and more [3]
- The service includes support for paper topic selection, full-process guidance, experimental guidance, and doctoral-application assistance [6][9]

Group 3: High Acceptance Rates
- The guidance service reports a high acceptance rate, with several papers already published at top conferences and journals such as CVPR, AAAI, and ICLR [7]
- Different pricing structures are available depending on the level of the paper, indicating a tailored approach to support [7]
A batch of end-to-end & VLA job openings will be released soon
自动驾驶之心· 2026-01-12 03:15
Core Insights
- The consensus among industry experts is that 2026 will be a pivotal year for end-to-end (E2E) and VLA (vision-language-action) technologies in autonomous driving, with the focus on optimizing production processes rather than making major algorithmic changes [1]
- The industry is actively recruiting experienced algorithm engineers and developing talent to tackle the complex challenges ahead, particularly in BEV perception, large models, diffusion models, and reinforcement learning [1]

Course Overview
- The course on E2E and VLA autonomous driving is designed to provide a comprehensive learning path from principles to practical applications, developed in collaboration with industry leaders [3]
- The course covers the historical development of E2E algorithms, the advantages and disadvantages of different paradigms, and current trends in both academia and industry [6][7]
- Key technical keywords expected to come up frequently in job interviews over the next two years are emphasized in the course content [7]

Course Structure
- Chapter 1 introduces E2E algorithms, discussing their evolution from modular approaches to current paradigms like VLA [6]
- Chapter 2 focuses on the background knowledge needed to understand E2E technologies, including VLA, large language models, diffusion models, and reinforcement learning [11]
- Chapter 3 delves into two-stage E2E algorithms, exploring how they emerged and comparing them with one-stage approaches [7]
- Chapter 4 presents one-stage E2E algorithms and VLA, highlighting the subfields that contribute to achieving the ultimate goals of E2E systems [8]
- Chapter 5 involves a practical assignment on RLHF (reinforcement learning from human feedback) fine-tuning, demonstrating how to build and experiment with pre-training and reinforcement-learning modules [9]

Learning Outcomes
- The course aims to bring participants to the level of an E2E autonomous-driving algorithm engineer within approximately one year, covering one-stage and two-stage methods, world models, and diffusion models [15]
- Participants will gain a deeper understanding of key technologies such as BEV perception, multimodal large models, reinforcement learning, and diffusion models, enabling them to apply their knowledge in real-world projects [15]
An analysis of Huawei's ADS intelligent driving solution
自动驾驶之心· 2026-01-10 03:47
Author | Gao Yipeng @ Zhihu   Editor | 自动驾驶之心
Original link: https://zhuanlan.zhihu.com/p/1981658979764568316
Click the card below to follow the "自动驾驶之心" official account. Tap here to claim learning roadmaps for nearly 30 autonomous-driving directions. >> Frontier autonomous-driving news → the 自动驾驶之心 Knowledge Planet
This article is shared for academic purposes only; contact us for removal in case of infringement.

Huawei ADS hardware iteration: a multi-sensor fusion approach
Across the iterations from ADS 1.0 to 4.0, Huawei has stayed with a multi-sensor approach, fusing LiDAR, millimeter-wave radar, cameras, and other complementary sensors to achieve efficient all-weather, all-day perception.

| | LiDAR | Millimeter-wave radar | Ultrasonic radar | Camera |
| --- | --- | --- | --- | --- |
| Role | Imaging-grade sensor that can render the 3D environment | Blind-spot monitoring, lane-change assist | Short-range sensing within 5 m, parking assist | Environment detection, obstacle information capture |
| Advantages | Clean imaging, few noise points, rich information | Long sensing range; still works at night and in extreme weather; resistant to electromagnetic ... | Relatively low cost | Captures relatively rich information quickly; can depict the road environment's ... |
After going through the new end-to-end and VLA work, these 9 open-source projects are the most worth reproducing......
自动驾驶之心· 2026-01-10 03:47
Core Viewpoint
- The article highlights the rapid growth of open-source projects in autonomous driving, particularly those expected to be valuable in 2025. It emphasizes the importance of these projects in providing comprehensive solutions for end-to-end autonomous driving, from data cleaning to evaluation, and encourages developers to engage with these resources for practical learning and application [4][5]

Summary by Relevant Sections

DiffusionDrive
- Developed by Huazhong University of Science and Technology and Horizon, DiffusionDrive addresses the conflict between generation diversity and real-time inference in end-to-end planning. It simplifies traditional multi-step denoising to just 2-4 steps while maintaining action-distribution diversity, achieving real-time performance of 45 FPS on a 4090 GPU and a PDMS score of 88.1 on the NAVSIM benchmark [8]

OpenEMMA
- Created by Texas A&M University, the University of Michigan, and the University of Toronto, OpenEMMA proposes a lightweight, generalizable framework to tackle the high training costs and deployment difficulties of multimodal large language models (MLLMs) in autonomous driving. It employs a chain-of-thought reasoning mechanism to enhance generalization and reliability in complex scenarios without extensive retraining [11]

Diffusion-Planner
- This project, involving Tsinghua University and several other institutions, presents a Transformer-based diffusion planning model that generates multimodal trajectories from noise, addressing the average-solution dilemma in imitation learning. It integrates trajectory prediction and vehicle planning into a unified architecture, achieving leading performance on the nuPlan benchmark [14]

UniScene
- Developed by Shanghai Jiao Tong University and others, UniScene introduces a multimodal generation framework to reduce the high cost of obtaining high-quality data for autonomous driving. Its layered generation approach creates occupancy maps and corresponding multimodal data, significantly improving the quality of generated data for downstream tasks [16]

ORION
- From Huazhong University of Science and Technology and Xiaomi, ORION tackles the disconnect between causal reasoning and trajectory generation in end-to-end autonomous driving. Its unified framework aligns the visual, reasoning, and action spaces, leading to improved driving scores and success rates in evaluations [18]

FSDrive
- Developed by Xi'an Jiaotong University and others, FSDrive addresses the loss of visual detail in end-to-end driving planning caused by reliance on purely textual reasoning. Its visual-reasoning paradigm enhances trajectory accuracy and safety while maintaining strong scene-understanding capabilities [21]

AutoVLA
- From UCLA, AutoVLA presents a unified autoregressive generative framework that ensures the physical feasibility of driving actions. It allows adaptive reasoning based on scene complexity and has shown competitive performance across various benchmarks [24]

OpenDriveVLA
- Created by the Technical University of Munich and others, OpenDriveVLA is an end-to-end driving VLA model that integrates multimodal inputs to output driving actions. It effectively bridges the semantic gap between the visual and language modalities, demonstrating its effectiveness in open-loop planning and driving Q&A tasks [26]

SimLingo
- SimLingo addresses the common disconnect between language models and driving behavior in autonomous driving. It proposes a multi-task joint-training framework that aligns driving behavior, vision-language understanding, and language-action consistency, achieving leading performance in evaluations [29]

Conclusion
- The article encourages developers to use these repositories as engineering building blocks, suggesting that practical engagement with the code and demos can significantly enhance understanding of autonomous-driving technology [31]
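DiffusionDrive's central trick above, collapsing a long denoising chain into a few steps started from a lightly noised anchor, can be illustrated with a toy deterministic sampler. Everything here (the function, the oracle denoiser, the averaging step rule) is an illustrative assumption for intuition only, not the project's actual model or schedule:

```python
import numpy as np

def truncated_denoise(x_noisy, denoiser, steps=2):
    """Toy illustration of truncated diffusion sampling: instead of
    running tens of denoising steps from pure noise, start from a
    slightly-noised anchor and run only a few steps. `denoiser`
    predicts the clean sample; each step moves partway toward it,
    with step fractions 1/steps, ..., 1/1 so the last step lands
    on the current clean-sample estimate."""
    x = np.asarray(x_noisy, dtype=float)
    for k in range(steps, 0, -1):
        x0_hat = denoiser(x)          # predicted clean sample
        x = x + (x0_hat - x) / k      # partial move toward the prediction
    return x

# Toy demo: an oracle denoiser that always predicts the true target.
target = np.array([1.0, 2.0])
out = truncated_denoise(target + 0.5, lambda x: target, steps=2)
```

With a real learned denoiser the few-step schedule trades a little sample quality for a large latency win, which is the point of the 2-4 step regime described above.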
People who have worked in autonomous driving are still in high demand in other fields
自动驾驶之心· 2026-01-10 03:47
If you would like to help push the autonomous-driving field forward with us, you are welcome to join our community team!

2025 has still been an exciting year for the autonomous-driving industry. At this key point of overall consolidation, everyone is competing hard: on technology, on cost, on efficiency. Our platform is no different: we have brought on many B-side customers and begun moving from online to offline, while our C-side content is gradually shifting from general-purpose material toward more specialized and refined offerings.

In the first half of the year, quite a few autonomous-driving engineers moved to embodied AI, and that is still happening. The L4, embodied-AI, and drone sectors are all hiring in volume, and since autonomous driving is a relatively mature AI field, its algorithm talent is very sought after; several leading companies (DJI, Unitree, Zhiyuan, Hello, and others) pay very well.

People who have worked in autonomous driving have used large compute clusters, solved all kinds of corner cases, and coordinate well across the upstream and downstream of the stack, capabilities the other sectors still lack.

The frontier of autonomous driving has converged on a few big directions: one-stage end-to-end, VLA, world models (reconstruction + simulation), and reinforcement learning. The mid-tier companies we talk to are still working on OCC, mapless driving, multi-sensor fusion perception, and so on; next year these companies will open large amounts of headcount.

This year, the 自动驾驶之心 paid community officially passed 4,000 members. If you want to see each company's ...
Today at 10! A roundtable on L4 autonomous driving (斯年智驾 / 新石器 / 卡尔动力, and others)
自动驾驶之心· 2026-01-10 01:00
Core Insights
- The article discusses advancements in autonomous-driving technology, particularly the transition from Level 2 (L2) to Level 4 (L4) automation, noting that high-level assisted driving reached a "quasi-L4" stage by December 2025 [3]
- It emphasizes the significant investment in the L4 sector, with over 30 billion yuan raised in the domestic autonomous-driving industry in 2025, indicating a shift of focus toward L4 technology [3]
- A roundtable event is planned to explore the technological and commercial realities of L4 autonomous driving, featuring insights from leading companies in the field [3]

Group 1: Industry Developments
- The technology pathways for L2 and L4 are converging, allowing the same model to be reused across both levels [3]
- The L4 sector is gaining renewed attention as it approaches a critical development phase [3]

Group 2: Event Details
- A significant roundtable discussion on L4 autonomous driving will take place, featuring diverse perspectives from top companies in the industry [3]
- The event aims to delve into the evolution of L4 technology, market dynamics, and future directions [3]

Group 3: Key Speakers
- He Bei, founder and chairman of Sinian Intelligent Driving, has extensive experience in autonomous driving, more than 30 published papers, and over 100 patents [4]
- Miao Qiankun, CTO of New Stone Age Autonomous Vehicles, has over 15 years of R&D experience and led the development of L4 urban logistics delivery vehicles now operating in over 300 cities [5]
- Wang Ke, VP of AI R&D at Karl Power, and Ma Qianli, tech lead at a top global automotive company, bring significant expertise in autonomous-vehicle technology and commercial operations [6]
AlignDrive validated on a real vehicle: end-to-end lateral-longitudinal aligned planning (XJTU & Horizon)
自动驾驶之心· 2026-01-09 06:32
To solve this problem, AlignDrive proposes a cascaded framework that makes longitudinal planning conditional on the lateral path, tightly coupling lateral and longitudinal planning. Concretely, the model first predicts the lateral drive path, then, using dynamic environment information, predicts the per-timestep 1D longitudinal displacement along that path. Intuitively, one module "turns the steering wheel" while the other "works the throttle and brake." This design lets each module focus on the information that matters to it. In particular, predicting the longitudinal displacement builds a tighter link between dynamic objects and the ego vehicle's behavior, making the model attend more fully to interacting dynamic agents and improving its interaction modeling in dynamic scenes.

1. Overview
In recent years, end-to-end autonomous driving has made notable progress, handling perception and planning jointly. In the planning stage, existing end-to-end models typically decompose planning into parallel lateral and longitudinal predictions. Although effective, this approach has two main problems: first, coordinating the lateral path with speed becomes harder; second, static information is redundantly encoded, so longitudinal planning fails to exploit the drive path as a prior. These problems limit the model's performance in complex scenes.

Paper authors | Yanhao ...
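The cascade described above, a lateral path first and then 1D longitudinal displacements along it, composes into a 2D trajectory by arc-length interpolation. The sketch below is a minimal illustration of that composition step only; the function name, array shapes, and inputs are assumptions for illustration, not AlignDrive's released code:

```python
import numpy as np

def trajectory_from_path_and_displacement(path_xy, disp_1d):
    """Compose a 2D trajectory from a lateral drive path and
    per-timestep 1D longitudinal displacements along it.

    path_xy: (P, 2) polyline, as a lateral head might predict it.
    disp_1d: (T,) cumulative arc-length displacement per timestep,
             as a longitudinal head might predict it.
    Returns: (T, 2) trajectory points sampled along the path."""
    seg = np.diff(path_xy, axis=0)
    seg_len = np.linalg.norm(seg, axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each vertex
    s = np.clip(disp_1d, 0.0, arc[-1])                 # stay within the path extent
    x = np.interp(s, arc, path_xy[:, 0])               # interpolate x along arc length
    y = np.interp(s, arc, path_xy[:, 1])               # interpolate y along arc length
    return np.stack([x, y], axis=1)

# Toy usage: a straight 10 m path along +x, constant 2 m advance per step.
path = np.array([[0.0, 0.0], [10.0, 0.0]])
traj = trajectory_from_path_and_displacement(path, np.array([0.0, 2.0, 4.0]))
```

The design point this illustrates is that the longitudinal head only has to output a scalar per timestep, so its capacity can be spent on dynamic interactions rather than re-encoding the static geometry already captured by the path.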
The ice and fire of L4 autonomous driving: has L2-to-L4 become a deployable engineering reality......
自动驾驶之心· 2026-01-09 06:32
Core Viewpoint
- The article discusses advancements in autonomous-driving technology, particularly the transition from Level 2 (L2) to Level 4 (L4), highlighting the significant investments and developments in the L4 sector within the industry [3]

Group 1: Industry Developments
- By December 2025, the autonomous-driving industry in China is expected to have raised over 30 billion yuan in funding, with a focus on L4 technology [3]
- The technological pathways for L2 and L4 are converging, allowing the same model to be reused across both levels [3]
- A roundtable discussion on L4 autonomous driving will be held, featuring leading companies exploring the balance between technological ideals and commercial realities [3]

Group 2: Key Speakers
- He Bei, founder and chairman of Sinian Intelligent Driving, holds a PhD from Tsinghua University and has extensive experience in autonomous-driving technology [4]
- Miao Qiankun, CTO of New Stone Age Autonomous Vehicles, has over 15 years of R&D experience and led the development of L4 urban logistics delivery vehicles deployed in over 300 cities and 10 countries, with 15,000 vehicles delivered and over 60 million kilometers driven [5]
- Wang Ke, Vice President of AI R&D at Karl Power, previously led the perception-tracking module at Zoox, a US autonomous-driving unicorn [6]

Group 3: Event Details
- The upcoming roundtable will delve into the evolution of L4 technology, market dynamics, and future development directions, promising a blend of depth and foresight [3]
- The event will feature diverse perspectives from top companies in the L4 sector, reflecting significant interest in the current state and future of autonomous-driving technology [3]
When we lay out 3DGS's applications in industry......
自动驾驶之心· 2026-01-09 06:32
Core Viewpoint
- The article discusses the advancements and applications of 3D Gaussian Splatting (3DGS) in the context of autonomous driving, emphasizing the importance of scene reconstruction and generation technologies for creating realistic driving environments [1][3]

Group 1: Scene Reconstruction Work
- The publication of StreetGaussian at ECCV 2024 marks a significant step in the wave of autonomous-driving scene reconstruction [2]
- A large-scale vehicle-asset reconstruction dataset named 3DRealCar has been released [2]
- The Balanced3DGS algorithm accelerates 3DGS training by nearly eight times [2]
- The Hierarchy UGP paper, focused on autonomous-driving scene reconstruction, is set to be presented at ICCV 2025 [2]
- StyledStreets introduces a multi-style scene-generation algorithm with spatiotemporal consistency for autonomous driving [2]

Group 2: Importance of Scene Reconstruction
- Traditional vehicle testing relies heavily on real-world tests, which often fail to replicate many corner cases, and conventional simulation environments suffer a significant domain gap [3]
- The high-fidelity scene reconstruction and editing capabilities of 3DGS make it possible to address these challenges [3]
- The development trajectory of 3DGS is clear: static reconstruction → dynamic reconstruction → hybrid reconstruction → feed-forward GS, with applications extending beyond autonomous driving to 3D fields, embodied intelligence, and the gaming industry [3]

Group 3: 3DGS Learning Path
- A comprehensive learning roadmap for 3DGS has been developed, covering point-cloud processing, deep-learning theory, real-time rendering, and practical coding [5]
- The course "3DGS Theory and Algorithm Practical Tutorial" aims to provide a structured approach to mastering the 3DGS technology stack [5]

Group 4: Course Structure
- The course consists of six chapters, starting with foundational knowledge in computer graphics and progressing through principles, algorithms, and specific applications in autonomous driving [10][11][12][13][14]
- Each chapter includes practical assignments and discussions of important research directions and industry applications [13][14][15]

Group 5: Target Audience and Outcomes
- The course is designed for individuals with backgrounds in computer graphics, visual reconstruction, and programming, aiming to equip them with comprehensive knowledge and skills in 3DGS [19]
- Participants will gain insight into industry demands and pain points, along with opportunities for further engagement with academic and industrial peers [15][19]
An analysis of Momenta's intelligent driving solution
自动驾驶之心· 2026-01-09 00:47
Author | Gao Yipeng @ Zhihu   Editor | 自动驾驶之心
Original link: https://zhuanlan.zhihu.com/p/1981658979764568316
This article is republished with permission; contact the original author before reposting.

Momenta's implementation
Momenta's mapless solution realizes autonomous driving through the following steps: data collection and sensor input; perception; localization; path planning and control.

The vehicle is equipped with multiple cameras, LiDAR, radar, an IMU, wheel-speed sensors, and a GNSS receiver. These sensors continuously collect environment data and vehicle-state data. The multi-camera setup covers a 360-degree field of view, while LiDAR and radar provide point clouds used to build a 3D model of the environment.

The perception module takes the sensor data and applies computer-vision and deep-learning algorithms for object detection, classification, and tracking. It also fuses the multi-sensor data into a local map, including the drivable area, lane lines, and obstacle positions. The local map is updated in real time, reflecting dynamic changes in the current environment.

The localization module fuses IMU, wheel-speed, and GNSS data, using filtering and optimization algorithms (such as SLAM, simultaneous localization and mapping) to compute the vehicle's ...
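The fusion idea behind the localization step above can be shown in a toy 1D form: two independent position estimates are combined by weighting each with the inverse of its variance, the optimal linear fusion for independent Gaussian measurements and the core of the update step in Kalman-style filters. The function name and the numbers are illustrative assumptions, not Momenta's implementation:

```python
def fuse_position(gnss_pos, odo_pos, gnss_var, odo_var):
    """Fuse a GNSS position fix with a wheel-odometry dead-reckoning
    estimate (both 1D for simplicity). Each input is weighted by the
    inverse of its variance, so the noisier source contributes less.
    Returns the fused position and its (reduced) variance."""
    w_gnss = odo_var / (gnss_var + odo_var)          # weight on the GNSS fix
    fused = w_gnss * gnss_pos + (1.0 - w_gnss) * odo_pos
    fused_var = (gnss_var * odo_var) / (gnss_var + odo_var)
    return fused, fused_var

# GNSS is noisy (variance 4.0); odometry is tight over short horizons (variance 1.0),
# so the fused estimate leans toward the odometry value.
pos, var = fuse_position(gnss_pos=10.0, odo_pos=10.5, gnss_var=4.0, odo_var=1.0)
```

Note that the fused variance is always smaller than either input variance, which is why fusing IMU, wheel-speed, and GNSS data yields a steadier pose than any single sensor alone.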