自动驾驶之心
This Collapsed EV Startup Has Announced Its Revival!
自动驾驶之心· 2025-09-08 03:32
Revived! On September 6, 威马汽车 (WM Motor) released a "White Paper to Suppliers" via its official WeChat account. 深圳翔飞汽车销售有限公司 has now formally taken over WM Motor and is pushing hard to quickly resume production of the WM EX5 and E5 at the Wenzhou plant, while planning to launch more than 10 new models over the next five years and targeting an annual output of one million vehicles. The domestic auto industry has recently been climbing out of a brief trough: NIO is approaching a new inflection point, Leapmotor's growth continues, and XPeng's August sales rose 169% year over year. The road of intelligent driving is far from over, and a new round of technical debate has started in the industry: VLA or world models, with diverging technical routes toward L3. There is still a great deal we can do. This is also why we keep running our autonomous driving community. Over the past three years the community has stayed focused on the most cutting-edge directions in autonomous driving, covering content across nearly 40 technical topics, including multimodal large models, VLM, VLA, closed-loop simulation, world models, diffusion models, end-to-end autonomous driving, planning and control, and multi-sensor fusion. It covers all of today's mainstream directions, organized into technical roadmaps suitable for students moving from entry level to advanced work. Members come mainly from leading autonomous driving, embodied AI, and internet companies, top university labs, and some traditional robotics companies, forming a complementary mix of industry and academia. If you genuinely want a systematic upgrade and to exchange ideas with more peers in the field, you are welcome to join. Big back-to-school discounts, ...
Tracing the Technical Roadmap of Embodied AI Through Nearly 1,000 Papers!
自动驾驶之心· 2025-09-07 23:34
Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement. Whenever embodied intelligence comes up, you see plenty of papers proclaiming "breakthroughs" and "innovations", but rarely does anyone string the whole technical route together so that it is clear how embodied AI has developed, which problems it has run into, and where it is heading. How does robotic manipulation let a manipulator precisely "imitate" humans? How does multimodal fusion put an agent "in the scene"? How does reinforcement learning drive autonomous evolution of the system? How do teleoperation and data collection break spatial constraints? These key threads of embodied intelligence deserve careful sorting out. Today we bring you several of the field's richest survey papers and unpack the development logic of each direction. Robotic manipulation. Reference paper: The Developments and Challenges towards Dexterous and Embodied Robotic Manipulation: A Survey. Paper link: https://arxiv.org/abs/2507.11840. Affiliation: 浙 ...
Can Diffusion's Underwhelming Multi-Modal Trajectory Outputs Live Up to the VLA Role in Autonomous Driving?
自动驾驶之心· 2025-09-07 23:34
Core Viewpoint
- The article discusses the evolution and current state of autonomous driving paradigms, focusing on the transition from end-to-end systems to Vision-Language-Action (VLA) frameworks and the challenges faced in achieving effective multi-modal trajectory outputs [2][3][11].

Group 1: End-to-End Systems
- The end-to-end autonomous driving network directly maps raw sensor inputs to control commands, eliminating traditional processing steps and maximizing information retention [4].
- Iterative practices in engineering involve clustering bad cases and retraining models, but this often leads to new issues arising from updates [8].
- Tesla's "daily update model" offers a solution by continuously evolving the model through the integration of bad cases into training samples [9].

Group 2: Emergence of Dual Systems
- The introduction of large language models (LLMs) has led to the rapid adoption of the "end-to-end + VLM" dual system approach, which enhances generalization in zero-shot and few-shot scenarios [11].
- Early VLMs focused on recognizing specific semantics, and the EMMA architecture incorporates reasoning to assist in vehicle control [12].

Group 3: VLA and Diffusion Framework
- The VLA framework outputs driving commands that are processed by a diffusion decoder to generate safe and smooth vehicle trajectories [16].
- Current challenges in the VLA + diffusion architecture include subpar multi-modal trajectory outputs, the "brain split" issue between the VLA and diffusion systems, and the quality of single-modal trajectories [18][19].
- The alignment of language and action (LA alignment) remains a critical challenge, as the practical value of language models in autonomous driving is still uncertain [19].

Group 4: Future Directions
- Future work should focus on scalable system solutions that leverage data advantages and enhance the capabilities of foundational models through reinforcement learning [20][22].
- The "generate + score" paradigm has proven effective in other domains, and the next steps involve optimizing trajectory quality through self-reflection mechanisms [22].
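To make the "generate + score" idea above concrete, here is a minimal sketch of sampling several candidate trajectories with a diffusion-style decoder conditioned on VLA features and keeping the best-scored one. It is an illustration only: `denoiser`, `scorer`, the toy noise schedule, and the trajectory shape are all assumptions, not the architecture of any specific paper.

```python
# Minimal sketch of "generate + score": sample candidate trajectories by
# reverse diffusion conditioned on VLA features, then keep the best-scored one.
# `denoiser` and `scorer` are hypothetical callables; this is not any paper's
# actual implementation.
import torch

@torch.no_grad()
def generate_and_score(denoiser, scorer, vla_feat, num_candidates=8,
                       horizon=10, num_steps=20):
    # Each candidate is `horizon` future (x, y) waypoints in the ego frame.
    traj = torch.randn(num_candidates, horizon, 2)          # start from pure noise
    alpha_bar = torch.linspace(0.99, 0.02, num_steps)       # toy schedule: near 1 = clean, near 0 = noisy

    for t in reversed(range(num_steps)):                    # reverse diffusion, all candidates in parallel
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
        eps = denoiser(traj, t, vla_feat)                    # predicted noise, same shape as traj
        x0 = (traj - (1 - a_t).sqrt() * eps) / a_t.sqrt()    # predicted clean trajectories
        traj = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # deterministic DDIM-style step

    scores = scorer(traj, vla_feat)                          # one scalar per candidate, e.g. safety/comfort
    return traj[scores.argmax()]                             # keep the best-scored candidate
```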
After My Advisor Told Me to Look Into Multi-Modal Perception Research......
自动驾驶之心· 2025-09-07 23:34
Traditional fusion approaches fall into three categories. Early fusion concatenates raw data directly at the input, but the compute cost is enormous. Mid-level fusion fuses feature vectors from different modalities after each sensor's data has gone through initial feature extraction; this is the current mainstream, for example unifying all sensor features under the BEV view, which solves the cross-sensor spatial alignment problem and connects seamlessly to downstream tasks. Late fusion lets each sensor complete perception independently and merges the results at the decision level; it is highly interpretable but struggles to resolve conflicting information. Building on these, Transformer-based end-to-end fusion is the current frontier. Borrowing from successes in natural language processing and computer vision, its cross-modal attention mechanism learns deep relationships across modalities, enabling more efficient and more robust feature interaction. End-to-end training reduces error accumulation across intermediate modules and can output perception results, such as 3D bounding boxes, directly from raw sensor data, better capturing dynamics and improving overall performance. We know that quite a few master's and PhD students are focusing on multi-modal perception fusion. We previously launched 1v6 small classes on end-to-end and VLA, and many students have been asking about a multi-sensor fusion track, urgently in need of expert mentoring...... [Course poster excerpt: multi-modal perception fusion research. Topic background: to overcome the limitations of a single sensor, multi-modal fusion combines LiDAR, millimeter-wave ra ...]
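As an illustration of the Transformer-based fusion described above, here is a minimal sketch of cross-modal attention between camera and LiDAR features that have already been projected into a shared BEV grid. The module, tensor shapes, and hyperparameters are assumptions for the sketch, not a specific published architecture.

```python
# Illustrative sketch of mid-level BEV fusion with cross-modal attention,
# assuming camera and LiDAR features already live in the same BEV grid
# (B, C, H, W). Not tied to any specific paper.
import torch
import torch.nn as nn

class BEVCrossModalFusion(nn.Module):
    def __init__(self, channels=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.ffn = nn.Sequential(
            nn.Linear(channels, channels * 4), nn.GELU(),
            nn.Linear(channels * 4, channels),
        )

    def forward(self, cam_bev, lidar_bev):
        B, C, H, W = cam_bev.shape
        q = lidar_bev.flatten(2).transpose(1, 2)   # LiDAR BEV cells as queries ...
        kv = cam_bev.flatten(2).transpose(1, 2)    # ... attend to camera cells for appearance cues
        fused, _ = self.attn(q, kv, kv)            # cross-modal attention
        fused = self.norm(fused + q)               # residual + norm
        fused = fused + self.ffn(fused)
        return fused.transpose(1, 2).reshape(B, C, H, W)

# Example: fuse toy features for a 32x32 BEV grid.
cam = torch.randn(2, 256, 32, 32)
lidar = torch.randn(2, 256, 32, 32)
out = BEVCrossModalFusion()(cam, lidar)            # (2, 256, 32, 32)
```

Having LiDAR cells query camera cells is just one design choice; the roles can be swapped, or made symmetric with two attention passes.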
TrackAny3D: One Model for All of 3D Single-Object Tracking!
自动驾驶之心· 2025-09-07 23:34
Source | 极市平台. This article is shared for academic purposes only; contact us for removal in case of infringement. Overview: TrackAny3D is the first work to bring large-scale pretrained 3D point cloud models into the single-object tracking task. With lightweight adapters plus a mixture-of-geometry-experts network, a single model covers cars, pedestrians, cyclists, and every other category without per-category fine-tuning. A newly designed temporal token and dynamic mask weighting mechanism upgrade static pretrained features into a coherent temporal representation, setting new bests under the category-unified setting on KITTI, NuScenes, and Waymo. 01 Introduction. Point-cloud-based 3D SOT is the task of continuously localizing a specific target in a dynamic 3D scene, with broad application prospects in autonomous driving, mobile robotics, and other fields. Unlike RGB image tracking methods that exploit rich texture and color information, LiDAR-based single-object tracking relies mainly on sparse, irregular point clouds to estimate the target's 3D pose. Paper title: TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking. This reliance on geometric ...
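The summary above describes the general pattern of adapting a frozen pretrained backbone with lightweight adapters. A generic sketch of that pattern follows; the block and dimensions are placeholders, not TrackAny3D's actual modules.

```python
# Generic adapter pattern: keep a pretrained block frozen and learn only a
# small bottleneck residual branch. `frozen` below is a stand-in for one block
# of a pretrained 3D point cloud backbone.
import torch
import torch.nn as nn

class AdapterBlock(nn.Module):
    def __init__(self, frozen_block: nn.Module, dim=384, bottleneck=32):
        super().__init__()
        self.frozen_block = frozen_block
        for p in self.frozen_block.parameters():
            p.requires_grad = False            # pretrained weights stay fixed
        self.adapter = nn.Sequential(          # only ~2 * dim * bottleneck trainable weights
            nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim),
        )

    def forward(self, tokens):                 # tokens: (B, N, dim) point/patch tokens
        x = self.frozen_block(tokens)
        return x + self.adapter(x)             # residual adaptation

# Usage with a stand-in "pretrained" block.
frozen = nn.Sequential(nn.Linear(384, 384), nn.GELU(), nn.Linear(384, 384))
block = AdapterBlock(frozen)
out = block(torch.randn(4, 128, 384))          # (4, 128, 384)
trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
```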
The "Whampoa Military Academy" of Autonomous Driving: Where 4,000 People Grind Away at the Technology~
自动驾驶之心· 2025-09-07 03:08
Making learning enjoyable is a remarkable thing; pushing the industry forward and becoming a bridge between companies and universities is greater still. A month ago, chatting with a friend, I said our vision is to bring AI and autonomous driving to every student who needs it. The 自动驾驶之心 Knowledge Planet has by now closed the loop across industry, academia, job hunting, and Q&A exchange. The few of us running it review every day: what kind of community do people actually need? Is there anything we have not considered? Flashy but empty will not do, having no one to talk to will not do, and failing to help people land jobs certainly will not do. So we have prepared the most cutting-edge academic content, roundtables with industry veterans, open-source code solutions, and the most timely job information... Inside the planet we have organized 40+ technical roadmaps; whether you are asking about industry applications or looking for the latest VLA benchmarks, surveys, and beginner learning paths, it greatly shortens your search time. We have also invited dozens of guests from the autonomous driving field, all active front-line figures in industry and academia (they frequently appear at top conferences and in interviews). Feel free to ask questions at any time; they will be there to answer them. We are a community that takes content seriously, a place that cultivates future leaders. [Contents excerpt: | Well-known autonomous driving teams at domestic universities | Advanced algorithms ...]
Li Auto's Intelligent Driving Approach: World Model + Reinforcement Learning to Reconstruct Interactive Environments for Autonomous Driving
自动驾驶之心· 2025-09-06 16:05
Core Viewpoint
- The article discusses the integration of a World Model and Reinforcement Learning to enhance closed-loop simulation in autonomous driving, aiming to surpass human driving capabilities and improve safety and reliability [3].

Group 1: Limitations and Solutions
- Traditional vehicle architectures hinder end-to-end training, leading to ineffective information transfer in reinforcement learning [5].
- The lack of realistic interactive environments has resulted in models that are prone to biases and inaccuracies due to insufficient scene realism and the small scale of constructed scenes [5].
- The proposed solution combines 3D reconstruction from real data with noise addition to train generative models, enhancing their ability to generate diverse scenes [5].

Group 2: DrivingSphere Framework
- DrivingSphere is the first generative closed-loop simulation framework that integrates geometric prior information, creating a 4D world representation that combines static backgrounds and dynamic objects [8].
- The framework addresses the lack of dynamic feedback in open-loop simulation as well as the limited visual realism and data compatibility of traditional closed-loop simulation [10].
- DrivingSphere consists of three main modules: Dynamic Environment Composition, Visual Scene Synthesis, and a Closed-Loop Feedback Mechanism [12].

Group 3: Dynamic Environment Composition
- This module constructs a 4D driving world with static backgrounds and dynamic entities, utilizing the OccDreamer diffusion model and action dynamics management [13].
- The 4D world representation is stored in an occupancy grid format, allowing unified modeling of spatial layouts and dynamic agents [16].

Group 4: Visual Scene Synthesis
- This module converts 4D occupancy data into high-fidelity multi-view videos, focusing on dual-path conditional encoding and ID-aware representation [19].
- The use of VQVAE for mapping 3D occupancy data enhances reconstruction accuracy through a combination of loss functions [20].

Group 5: Closed-Loop Feedback Mechanism
- The closed-loop feedback mechanism enables real-time interaction between the autonomous driving agent and the simulated environment, facilitating an "agent action - environment response" cycle [23].
- This mechanism supports an iterative "simulation - testing - optimization" process, allowing algorithmic flaws to be identified and corrected [23].
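A schematic sketch of the "agent action - environment response" cycle described above, with placeholder components standing in for the 4D world model, the scene synthesizer, and the driving agent. The names and methods are illustrative assumptions, not DrivingSphere's API.

```python
# Schematic closed-loop rollout: the agent acts on rendered observations and
# the simulated world responds. `world`, `renderer`, and `agent` are
# duck-typed placeholders, not real DrivingSphere components.
def closed_loop_rollout(world, renderer, agent, num_steps=100):
    logs = []
    state = world.reset()                     # initial 4D occupancy state
    for step in range(num_steps):
        obs = renderer.render(state)          # multi-view frames for the agent
        action = agent.act(obs)               # planned control / trajectory
        state = world.step(state, action)     # environment responds to the action
        logs.append({"step": step, "action": action,
                     "collision": world.check_collision(state)})
        if logs[-1]["collision"]:
            break                             # surface the failure for later analysis
    return logs
```

This "simulate - test - optimize" loop is what lets flaws found in a rollout feed back into the next training or tuning iteration.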
Is There a "Pure-Blooded VLA" in Autonomous Driving? A Rundown of What VLMs Can Actually Do~
自动驾驶之心· 2025-09-06 16:05
Core Viewpoint
- The article discusses the challenges and methodologies involved in developing datasets for autonomous driving, particularly focusing on the VLA (Vision-Language-Action) model and its applications in trajectory prediction and scene understanding [1].

Dataset Handling
- Different datasets have varying numbers of cameras; the VLM can handle this by automatically processing different image token inputs without needing explicit camera counts [2].
- The output trajectories are expressed in the vehicle's current coordinate system, with predictions given as relative (x, y) values rather than image coordinates, so additional camera parameters are required to map them onto images [6].
- The VLA model generally adheres to the expected output format, but occasional discrepancies occur and are corrected with Python post-processing for format normalization [8][9].

Trajectory Prediction
- VLA trajectory prediction differs from traditional methods by incorporating scene understanding capabilities through QA training, enhancing the model's ability to predict trajectories of dynamic objects like vehicles and pedestrians [11].
- Dataset construction faced challenges such as data quality issues and inconsistent coordinate formats, which were addressed through rigorous data cleaning and standardization [14][15].

Data Alignment and Structure
- Data alignment is achieved by converting the various dataset formats into unified relative displacements in the vehicle's coordinate system, organized in a QA format that covers trajectory prediction and dynamic object forecasting [18].
- The input consists of images and trajectory points from the previous 1.5 seconds, used to predict future trajectory points over 5 seconds, adhering to the SANA standard [20].

Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" community focuses on cutting-edge technologies in autonomous driving, covering nearly 40 technical directions and fostering collaboration between industry and academia [22][24].
- The community offers a comprehensive platform for learning, including video tutorials, Q&A sessions, and job opportunities in the autonomous driving sector [28][29].
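A small sketch of the data-alignment step described above: converting global waypoints into relative (x, y) displacements in the ego frame and packing past 1.5 s / future 5 s trajectories into a QA-style record. The field names and the QA template are illustrative assumptions, not the dataset's real schema.

```python
# Sketch of ego-relative trajectory conversion and QA-style packaging.
# Field names and the QA template are illustrative, not the actual schema.
import math

def to_ego_frame(waypoints, ego_xy, ego_yaw):
    """Convert global (x, y) waypoints into displacements in the ego frame."""
    c, s = math.cos(-ego_yaw), math.sin(-ego_yaw)
    out = []
    for x, y in waypoints:
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        out.append((c * dx - s * dy, s * dx + c * dy))   # rotate into ego axes
    return out

def build_qa_record(images, past_global, future_global, ego_xy, ego_yaw):
    past = to_ego_frame(past_global, ego_xy, ego_yaw)     # last 1.5 s of motion
    future = to_ego_frame(future_global, ego_xy, ego_yaw) # next 5 s to predict
    return {
        "images": images,
        "question": f"Given the past trajectory {past}, predict the next 5 s of waypoints.",
        "answer": str(future),
    }

record = build_qa_record(
    images=["cam_front.jpg"],
    past_global=[(10.0, 5.0), (11.0, 5.2), (12.1, 5.5)],
    future_global=[(13.0, 5.9), (14.2, 6.4)],
    ego_xy=(12.1, 5.5), ego_yaw=0.3,
)
```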
自动驾驶之心 Back-to-School Season in Full Swing: 30% Off All Courses!
自动驾驶之心· 2025-09-06 16:05
Group 1
- The article introduces a substantial learning package for the new academic season, including a 299 yuan discount card that gives a 30% discount on all platform courses for one year [3][5].
- Various course benefits are highlighted, such as a 1,000 yuan purchase granting access to two selected courses, plus discounts on specific classes and hardware [3][6].
- The focus is on cutting-edge autonomous driving technologies for 2025, particularly end-to-end (E2E) and VLA (Vision-Language-Action) autonomous driving systems [5][6].

Group 2
- End-to-end autonomous driving is emphasized as a core algorithm for mass production, with a notable mention of the competition sparked by the UniAD paper winning the CVPR Best Paper award [6][7].
- The article discusses the challenges beginners face in mastering multi-modal large models and the fragmented nature of knowledge in the field, which can lead to discouragement [7][8].
- A course on automated 4D annotation algorithms is introduced, addressing the increasing complexity of training data requirements for autonomous driving systems [11][12].

Group 3
- The article outlines a course on multi-modal large models and their practical application in autonomous driving, reflecting the rapid growth of and demand for expertise in this area [15][16].
- It mentions the increasing job opportunities in the field, with companies actively seeking talent and offering competitive salaries [15][16].
- The course aims to provide a systematic learning path, covering topics from general multi-modal large models to fine-tuning for end-to-end autonomous driving applications [16][18].

Group 4
- The article emphasizes the importance of community and communication in the learning process, with dedicated VIP groups where course participants can discuss challenges and share insights [29].
- It highlights the need for practical guidance in moving from theory to practice, particularly for real-world applications and job readiness [29][31].
- It also mentions specialized small-group courses designed to address specific industry needs and strengthen practical skills [23][24].
On Diffusion Models -- From Image Generation to End-to-End Trajectory Planning~
自动驾驶之心· 2025-09-06 11:59
Core Viewpoint
- The article discusses the significance and application of Diffusion Models in various fields, particularly autonomous driving, emphasizing their ability to denoise and generate data effectively [1][2][11].

Summary by Sections

Introduction to Diffusion Models
- Diffusion Models are generative models built around denoising: a forward diffusion process gradually corrupts data with noise drawn from a known distribution, and a learned reverse generation process recovers the original data from that noise [1][2].

Applications in Autonomous Driving
- In autonomous driving, Diffusion Models are used for data generation, scene prediction, perception enhancement, and path planning. They can handle both continuous and discrete noise, making them versatile for a range of decision-making tasks [11].

Course Overview
- The article promotes a new course titled "End-to-End and VLA Autonomous Driving," developed in collaboration with top algorithm experts. The course aims to provide in-depth knowledge of end-to-end algorithms and VLA technology [15][22].

Course Structure
- The course is organized into several chapters, covering:
- A comprehensive overview of end-to-end autonomous driving [18]
- In-depth background knowledge including large language models, BEV perception, and Diffusion Model theory [21][28]
- Two-stage and one-stage end-to-end methods, including the latest advances in the field [29][36]

Learning Outcomes
- Participants are expected to gain a solid understanding of the end-to-end technology stack, including one-stage and two-stage methods, world models, and Diffusion Models. The course also aims to deepen knowledge of key technologies such as BEV perception and reinforcement learning [41][43].
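For reference, the forward and reverse processes the introduction alludes to, written in standard DDPM notation (taken from the general diffusion literature, not from the course materials):

```latex
% Standard DDPM notation: \beta_t is the noise schedule, \bar{\alpha}_t = \prod_{s \le t}(1-\beta_s).
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)
  &&\text{(forward diffusion: gradually add noise)}\\
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)
  &&\text{(closed form: jump to any step $t$)}\\
p_\theta(x_{t-1} \mid x_t) &= \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
  &&\text{(reverse process: learned denoising)}
\end{aligned}
```

For trajectory planning, $x_0$ is simply a sequence of future waypoints instead of an image; the same two processes apply unchanged.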