自动驾驶之心
AI Day Livestream | "Pixel-Perfect" Depth Perception: Unpacking a High-Scoring NeurIPS Paper
自动驾驶之心· 2025-11-05 00:04
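Depth estimation is central to applications such as robot perception, 3D reconstruction, and AR/VR. However, existing depth estimation methods commonly suffer from flying pixels at object edges, which can trigger erroneous actions when a robot executes a decision and produce heavy ghosting around object contours in 3D reconstruction. Existing methods exhibit edge flying pixels mainly for the following reasons: This paper proposes Pixel-Perfect Depth (PPD), a monocular depth estimation model that performs diffusion generation directly in pixel space, avoiding at the root the artifacts caused by VAE compression. Diffusion modeling in high-resolution pixel space is very challenging, however: the model must balance global semantic consistency with local detail accuracy, or it readily produces structural distortion or abrupt depth jumps. To address this, the paper designs a semantics-prompted diffusion Transformer (SP-DiT), which injects high-level semantic features from a vision foundation model as prompts during the diffusion process, strengthening the model's grasp of global structure and its ability to recover detail. The paper also proposes a ... Because of the smoothing tendency of regression losses, discriminative models (such as Depth Anything v2 and Depth Pro) tend, in depth ...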
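To make the flying-pixel problem concrete, here is a minimal sketch, not taken from the paper. It assumes a hypothetical one-dimensional scanline with a foreground object at 1 m and a background wall at 5 m, and uses a 3-tap box blur as a stand-in for any effect that smooths depth across an edge (a regression loss, a lossy latent encoding, and so on). The helper name `count_flying_pixels` and the tolerance are illustrative assumptions.

```python
import numpy as np

def count_flying_pixels(depth, surfaces, tol=0.05):
    """Count pixels whose depth is farther than tol from every true surface depth."""
    depth = np.asarray(depth, dtype=float)
    surfaces = np.asarray(surfaces, dtype=float)
    # Distance from each pixel to its nearest true surface.
    dist = np.abs(depth[:, None] - surfaces[None, :]).min(axis=1)
    return int((dist > tol).sum())

# Hypothetical scanline: 8 foreground pixels at 1 m, 8 background pixels at 5 m.
sharp = np.array([1.0] * 8 + [5.0] * 8)

# A 3-tap box blur stands in for edge smoothing (an assumption, not PPD's pipeline).
kernel = np.ones(3) / 3.0
padded = np.pad(sharp, 1, mode="edge")
blurred = np.convolve(padded, kernel, mode="valid")

print(count_flying_pixels(sharp, [1.0, 5.0]))    # 0
print(count_flying_pixels(blurred, [1.0, 5.0]))  # 2
```

The two pixels next to the edge end up at roughly 2.33 m and 3.67 m, where no surface exists. Unprojected into 3D, they would float between the object and the wall, which is the ghosting the excerpt describes.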
How Li Auto's Assisted Driving Staged Its Comeback: The Hundred-Day End-to-End Sprint
自动驾驶之心· 2025-11-05 00:04
Core Viewpoint - The article discusses the significant advancements made by Li Auto in the field of autonomous driving, particularly the introduction of the "end-to-end + VLM" system, which marks a turning point for the company in achieving industry leadership [5][7][40].
Group 1: Development of Autonomous Driving Technology
- In March 2024, Li Auto's CEO expressed dissatisfaction with the company's autonomous driving progress, emphasizing the need for a shift to an end-to-end approach [4][9].
- The launch of the "end-to-end + VLM" system in July 2024 allowed Li Auto to finally experience true leadership in autonomous driving technology after years of following competitors [5][6].
- By October 2024, the trial driving of the new system accounted for 65% of user experiences in stores, indicating strong market enthusiasm [6][7].
Group 2: Market Performance and User Adoption
- In 2024, the delivery share of models equipped with the AD Max system (featuring the new technology) reached 75.4% in the 300,000+ RMB segment and 84.6% in the 400,000+ RMB segment, a significant increase from just 20% earlier in the year [7][51].
- The rapid adoption of the end-to-end system led to a dramatic increase in user interest and sales, with the proportion of users experiencing the system rising to over 70% by the end of the year [51][52].
Group 3: Strategic Shifts and Team Expansion
- In 2023, Li Auto began to learn from Huawei's approach to autonomous driving, significantly expanding its engineering team from around 600 to over 1,000 by the end of the year [10][11].
- Despite the team expansion, initial results did not meet expectations, prompting a strategic pivot towards the end-to-end model [11][24].
- The end-to-end project was initiated with a small, dedicated team, emphasizing the importance of voluntary participation and commitment to the project's success [27][28].
Group 4: Technical Innovations and Efficiency
- The end-to-end project was completed in approximately 100 days, showcasing an unprecedented speed of development in the industry, with no significant errors reported during the process [46][56].
- The project utilized a one-stage end-to-end technology, integrating various functions into a single network, which allowed for more efficient processing and reduced complexity compared to traditional modular approaches [58][59].
- The success of the project was attributed to effective collaboration among team members and a strong focus on data-driven methodologies, which allowed for high-quality outcomes with a relatively small team [57][64].
Group 5: Data-Driven Approach
- The foundation of Li Auto's success in autonomous driving is rooted in a robust data collection and processing system established by the team, which has been in development since 2018 [72][73].
- The company has emphasized the importance of high-quality data over sheer model complexity, leading to significant improvements in performance metrics [70][72].
Does Autonomous Driving Really Need a Language Model?
自动驾驶之心· 2025-11-05 00:04
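Author | 咖啡鱼 Source | 焉知汽车 1. The fork in autonomous driving: the technical contest between WEWA and VLA. 2025 has become a key watershed for autonomous driving architectures: the WEWA architecture (World Engine + World Action Model), represented by Huawei Qiankun ADS 4, stands in sharp opposition to the VLA architecture (vision-language-action model) that Li Auto, XPeng, and others are competing on. Huawei's Jin Yuzhi (靳玉志) said that companies on the VLA route believe that, now that large language models such as OpenAI's have learned the information on the internet, language and all other learning can be converted into LM form to acquire knowledge. In his view this path looks like a shortcut but is not the road to true autonomous driving. [Figure: Huawei's WEWA architecture launch; image from the web] The core of the debate is whether large language models (LLMs) are a necessity for autonomous driving: WEWA achieves efficient deployment by "removing language", while VLA puts the language model at its core in pursuit of cognitive intelligence. The two path choices reflect the industry ...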
NVIDIA's 41-Page Autonomous-Driving VLA Framework: Alpamayo-R1, a Vehicle-Deployable Algorithm with Chain-of-Causation Reasoning
自动驾驶之心· 2025-11-05 00:04
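Editor | 自动驾驶之心 Paper authors | Yulong Cao et al. NVIDIA had not published autonomous driving work for some time, and yesterday it released a major one: Alpamayo-R1, a 41-page VLA framework for autonomous driving. Alpamayo-R1 points out that end-to-end architectures based on imitation learning perform poorly in long-tail scenarios, because the supervision signal is sparse and causal reasoning is weak. In addition, existing autonomous-driving VLA frameworks cannot explicitly constrain the link between the chain of thought and the resulting decision, so hallucinations can occur and the correctness of the causal understanding cannot be guaranteed. An example of such faulty reasoning: the left-turn light is red, but the vehicle is allowed to turn left because the straight-ahead light is green. To address these problems, Alpamayo-R1 combines Chain of Causation reasoning with trajectory planning to improve decision-making in complex driving scenarios. The method includes three core innovations: The results show that, compared with a trajectory-only baseline, AR1 improves planning accuracy by up to 12% in challenging scenarios; in closed-loop simulation, the lane-departure rate drops by 35% and the close-range collision rate by 25%. After reinforcement-learning post-training (RL po ...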
Lessons from Switching Careers into a Major Autonomous Driving Company
自动驾驶之心· 2025-11-04 00:03
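We recently invited 苹果姐 to an online exchange with community members and are sharing it here. In 2020, 苹果姐 made a big career jump from a state-owned bank to a major autonomous driving company, and later joined a leading L4 startup and then a leading new-energy carmaker. Her research focus also changed several times: starting in algorithm evaluation, she went on to work on 2D traffic light detection, visual perception for parking, BEV perception, and end-to-end active safety algorithms. 柱哥 learned a great deal both from her switch into autonomous driving and from her later changes of direction. I would distill two key points. First, when an opportunity appears, seize it and give it everything: when she was switching careers in 2020, the autonomous driving companies she applied to did not reply for a long time. Eventually one contacted her and asked for an online coding test a week later. With no prior preparation, she ground through LeetCode intensively for that week and passed. The industry's hiring expansion in 2020 also helped her complete the switch. Second, switch first, then improve step by step and find the right track: she started in evaluation, which was not an algorithm role but let her build solid coding skills. She used the evaluation work as a chance to learn static perception, moved smoothly into a perception role at her next job, then progressed to BEV perception and now to end-to-end active safety. Behind this is continuous learning and a good read of industry trends. Many students have recently asked 柱哥 about choosing a direction, which is why I invited 苹果姐 to speak on this topic. The livestream replay has been uploaded to 自动驾驶之心知识星球, and everyone is welcome to ...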
Starting from DriveVLA-W0: How World Models Amplify VLA Scaling Laws (Chinese Academy of Sciences)
自动驾驶之心· 2025-11-04 00:03
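In autonomous driving, scaling vision-language-action (VLA) models with large-scale data offers a promising path toward more general driving intelligence. VLA models, however, have long faced a "supervision deficit": their large capacity is supervised only by sparse, low-dimensional action signals, so much of their representational potential goes unused. To address this, a team from the Chinese Academy of Sciences and Huawei Yinwang proposed DriveVLA-W0, a training paradigm that uses a world model to predict future images. To test the generality of DriveVLA-W0, the paper validates it on two mainstream VLA architectures: an autoregressive world model for VLA models that use discrete visual tokens, and a diffusion world model for VLA models built on continuous visual features. Building on the rich representations learned through world modeling, the paper further introduces a lightweight action expert to address inference latency in real-time deployment. Livestream: "DriveVLA-W0: Using World Models to Amplify VLA Scaling Laws". Time: Nov 4, 19:30-20:30. Overview: VLA models are a promising path to general autonomous driving, but are limited by a "supervision deficit": ...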
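The supervision-deficit argument above can be illustrated with a small sketch. This is an assumption-laden toy, not the DriveVLA-W0 implementation: the loss functions, the weight `wm_weight`, and the tensor shapes are all hypothetical, and a plain per-pixel mean-squared error stands in for the autoregressive or diffusion world-model objectives the excerpt mentions.

```python
import numpy as np

def action_loss(pred_actions, gt_actions):
    """L1 loss over a low-dimensional trajectory (the sparse action signal)."""
    return float(np.abs(pred_actions - gt_actions).mean())

def world_model_loss(pred_frame, gt_frame):
    """Per-pixel MSE on the predicted future frame (a stand-in dense signal)."""
    return float(((pred_frame - gt_frame) ** 2).mean())

def joint_loss(pred_actions, gt_actions, pred_frame, gt_frame, wm_weight=1.0):
    """Action loss plus a weighted future-image prediction loss."""
    return (action_loss(pred_actions, gt_actions)
            + wm_weight * world_model_loss(pred_frame, gt_frame))

rng = np.random.default_rng(0)
gt_actions = rng.normal(size=(8, 2))     # hypothetical: 8 waypoints, (x, y)
gt_frame = rng.random(size=(3, 64, 64))  # hypothetical: one small RGB future frame

# Number of supervised values per sample under each signal.
print(gt_actions.size, gt_frame.size)    # 16 12288
```

Even at this toy resolution, the future frame provides hundreds of times more supervised values per sample than the trajectory does, which is the intuition behind adding an image-prediction objective alongside the action loss.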
Led by an Industry Veteran: 3DGS Theory and Practice in Three Months
自动驾驶之心· 2025-11-04 00:03
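In machine vision, the core goal of novel view synthesis is to build, from images or video, 3D models that computers can process and understand. This has given rise to many applications, including 3D modeling, virtual reality, and closed-loop simulation for autonomous driving. Early algorithms such as SfM and MVS were quite limited. NeRF broke the deadlock in 2020, but it still suffers from poor computational efficiency and poor editability, which is why 3DGS took off as soon as it appeared in 2023. 3DGS has since iterated far faster than expected: 3DGS for static reconstruction, 4DGS for dynamic reconstruction, and 2DGS for surface reconstruction. Per-scene optimization methods are inconvenient to use, however, which in turn gave rise to feed-forward 3DGS. 3DGS is now popular in both academia and industry, and many students want to get started but lack an effective learning roadmap: they need to master theory such as point cloud processing and deep learning, and also real-time rendering and hands-on coding. Piecing materials together alone often leaves learners more confused, with nobody to ask when they get stuck. We therefore spent two months designing a 3DGS learning roadmap that runs from principles to practice. 自动驾驶之心, together with an industry algorithm expert, is launching the course "3DGS Theory and Algorithms in Practice". The course covers 2DGS ...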
A Deeper Analysis of Horizon's HSD After Talking with Several People
自动驾驶之心· 2025-11-04 00:03
Core Viewpoints - The article presents eight key viewpoints regarding the performance and evaluation of autonomous driving technologies, particularly focusing on the comparison between Horizon's HSD and Li Auto's VLA systems [3].
Group 1: Performance Evaluation
- The experience with Horizon's HSD during a 1.5-hour test drive was notably better than the current production version of Li Auto's L7 VLA, although future production versions may not match the engineering version's performance [3][5].
- The evaluation of HSD's performance is limited due to the lack of comprehensive safety assessments and the variability of experiences across different locations [3][7].
- The HSD system demonstrated good vertical control, but its performance can vary significantly based on the city and driving conditions [6][7].
Group 2: Technical Comparisons
- Horizon employs a VA-style end-to-end approach, while Li Auto utilizes a VLA-style end-to-end system, with the naming being a mere distinction [9][10].
- The VA-style end-to-end system is perceived to have advantages in user experience due to current limitations in computing power and bandwidth faced by the VLA approach [6][12].
- Li Auto's decision to pursue VLA for mass production is seen as a bold move, but it comes with challenges related to resource allocation and the need for higher computational requirements [11][12].
Group 3: Industry Outlook
- There is a prevailing belief that many autonomous driving operators will eventually converge in capabilities, with only a few manufacturers able to survive without in-house development of autonomous driving technologies [3][11].
- The article suggests that manufacturers lacking self-research capabilities in autonomous driving may struggle to adapt to the evolving smart vehicle industry [3][11].
- The future landscape of autonomous driving will likely see a concentration of capabilities, with differentiation becoming increasingly important as the industry matures [3][11].
Humanoid Robots Are Probably Heading into Their First Winter
自动驾驶之心· 2025-11-03 08:55
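Author | 天南 Source | 天南AI茶馆 Recently I have watched the humanoid robot industry fall short of expectations again and again. Many people have asked me whether, from a technical standpoint, the industry is about to enter a winter. Today we use reasoned analysis to look at where the industry really stands. Introduction: there have been too many disappointments lately. Overseas, company results and the predictions of leading figures are not encouraging. Tesla's Gen2 was forced to pause this year's mass-production plan because of overheating and short-lived dexterous hands, and Gen3 has slipped again, to Q1 next year. Figure03 was eagerly awaited, but Time magazine reported that its footage was shot in multiple takes and edited. Meta chief AI scientist LeCun said the robotics industry is far from achieving real intelligence, and the head of Google DeepMind recently noted that humanoid robots are at least 5-10 years away from the home market. In China, by contrast, there is something of a false boom: orders are soaring, but most have been reported to be round-trip deals between related parties ...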
End-to-End and VLA: Directions Still Worth Researching
自动驾驶之心· 2025-11-03 00:04
Core Viewpoint - The article discusses the evolution of autonomous driving technology, highlighting the transition from rule-based systems to end-to-end models represented by companies like Li Auto and XPeng, and currently to the world model phase represented by NIO, emphasizing the continuous presence of deep learning throughout these changes [1].
Group 1: Course Introduction
- The course covers the development from modular production algorithms to end-to-end systems and now to VLA, focusing on core algorithms such as BEV perception, visual language models (VLM), diffusion models, reinforcement learning, and world models [5].
- Participants will gain a comprehensive understanding of the end-to-end technology framework and key technologies, enabling them to reproduce mainstream algorithm frameworks like diffusion models and VLA [5].
- Feedback indicates that students completing the course can reach a level equivalent to approximately one year of experience as end-to-end autonomous driving algorithm engineers, which helps with internships and job recruitment [5].
Group 2: Instructor Profile
- The main instructor, Jason, holds a C9 undergraduate degree and a PhD from a QS top 50 university, with multiple published papers in CCF-A and CCF-B journals [6].
- He is currently an algorithm expert at a leading domestic manufacturer, engaged in the research and production of cutting-edge algorithms, with extensive experience in the development and delivery of autonomous driving perception and end-to-end algorithms [6].
Group 3: Research Guidance
- The program aims to enhance practical skills and knowledge in cutting-edge topics, with a focus on helping students publish high-level papers to improve their academic prospects [8].
- The community includes over 300 instructors specializing in autonomous driving and embodied intelligence, with a high manuscript acceptance rate of 96% over the past three years [8].
Group 4: Research Process
- The guidance process includes selecting research topics based on student interests, explaining key concepts, and providing essential foundational knowledge and recommended learning materials [11].
- Students will learn how to critically read literature, conduct research, and write various sections of a paper, including methods and experimental results, with continuous feedback and support throughout the process [11].