3DGS Reconstruction! A Source-Code Walkthrough of the gsplat Library
自动驾驶之心· 2025-09-23 23:32
Author | 微卷的大白  Editor | 自动驾驶之心  Original article: https://zhuanlan.zhihu.com/p/1952449084788029155

When Fei-Fei Li's World Labs unveiled their new work Marble a couple of days ago, I mentioned wanting to spend more time afterwards on 3DGS / reconstruction work. For newcomers about to dive in, gsplat's documentation and maintenance are actually somewhat better than gaussian-splatting's, so I personally recommend this library. Compared with the gaussian-splatting library released with the 3DGS paper, nerfstudio-project/gsplat applies a number of optimizations to the official implementation; see https://docs.gsplat.studio/main/migration/migration_inria.html for the details. A search on Zhihu, however, turns up plenty of posts on the 3DGS paper's principles and its follow-up improvements (I revisited the CUDA kernel sources myself in the first half of the year: "Revisiting the Classics: A 3DGS CUDA Source Code Walkthrough"), but the other commonly used library, gsplat ...
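Before digging into the source, a minimal orientation sketch of how the library's high-level entry point is typically called may help. This is my own illustration, not from the article; the `rasterization` signature follows my reading of the gsplat docs (v1.x) and may differ in your installed version, so treat the names and shapes below as assumptions.

```python
# Hedged sketch: render random 3D Gaussians with gsplat's rasterization API.
# Requires a CUDA device; signature assumed from the gsplat (>=1.0) docs.
import torch
from gsplat import rasterization

device = "cuda"
N = 10_000  # number of Gaussians

means = torch.randn(N, 3, device=device)                       # Gaussian centers
quats = torch.nn.functional.normalize(
    torch.randn(N, 4, device=device), dim=-1)                  # unit quaternions
scales = torch.rand(N, 3, device=device) * 0.05                # per-axis scales
opacities = torch.rand(N, device=device)                       # alpha in [0, 1]
colors = torch.rand(N, 3, device=device)                       # RGB per Gaussian

# One camera: world-to-camera matrix and pinhole intrinsics.
viewmats = torch.eye(4, device=device)[None]                   # [1, 4, 4]
Ks = torch.tensor([[[300.0, 0.0, 150.0],
                    [0.0, 300.0, 100.0],
                    [0.0, 0.0, 1.0]]], device=device)          # [1, 3, 3]

renders, alphas, meta = rasterization(
    means, quats, scales, opacities, colors,
    viewmats, Ks, width=300, height=200)
print(renders.shape)  # expected [1, 200, 300, 3]
```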
Tackling Long-Tail Scenarios! Tongji's CoReVLA: A New Two-Stage End-to-End Framework
自动驾驶之心· 2025-09-23 23:32
Autonomous driving still shows a clear gap in long-tail scenarios (low-frequency, high-risk, safety-critical cases): although rare, they account for a large share of autonomous-driving accidents and cause driver takeover rates to spike. Traditional modular stacks (separate perception, prediction, and planning stages) suffer from error accumulation, where small errors at each stage compound and cap overall performance; end-to-end methods, which map sensor inputs directly to control actions or ego trajectories, offer stronger adaptability and unified optimization and are seen as a promising direction for long-tail scenarios. Current end-to-end methods fall into two broad categories, but neither handles long-tail scenarios well.

Core design of CoReVLA: the "Collect-and-Refine" two-stage framework. To address these issues, CoReVLA proposes a continual-learning two-stage framework that improves decision making in long-tail scenarios through a loop of data collection (Collect) and behavior refinement (Refine). As shown in Figure 1, the pipeline consists of a preliminary stage (SFT), Stage 1 (takeover data collection), and Stage 2 (DPO optimization).

Preliminary stage: supervised fine-tuning (SFT) on QA data. The goal here is to give the VLA model a basic understanding of the autonomous-driving domain, laying the groundwork for subsequent long-tail learning.

$$\mathcal{L}_{SFT} = -\sum_{i=1}^{N}\sum \ldots$$
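The excerpt is cut off in the middle of the SFT loss. As a hedged illustration (not the paper's code, and the exact indexing in the paper may differ), the sketch below computes the standard token-level SFT cross-entropy that such a loss usually denotes, summed over samples i and answer tokens t; the masking scheme and tensor names are my own assumptions.

```python
# Hedged sketch: token-level SFT cross-entropy for QA pairs, matching the
# double sum over samples and tokens in the L_SFT formula above.
# Not CoReVLA's actual training code; masking scheme is an assumption.
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, labels: torch.Tensor, ignore_index: int = -100):
    """logits: [B, T, V]; labels: [B, T] with prompt tokens set to ignore_index."""
    # Shift so that position t predicts token t+1 (causal LM convention).
    logits = logits[:, :-1, :].contiguous()
    labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),   # [(B*(T-1)), V]
        labels.view(-1),                    # [(B*(T-1))]
        ignore_index=ignore_index,          # only answer tokens contribute
        reduction="sum",                    # matches the sum over i and t
    )

# Toy usage with random tensors standing in for model outputs.
B, T, V = 2, 8, 32
logits = torch.randn(B, T, V)
labels = torch.randint(0, V, (B, T))
labels[:, :3] = -100  # mask the prompt/question tokens
print(sft_loss(logits, labels))
```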
"World models can fundamentally solve VLA systems' dependence on data" is a false proposition...
自动驾驶之心· 2025-09-23 11:37
"世界模型能够从根本上解决VLA系统对数据的依赖,是伪命题。" 柱哥这两天和星球大佬讨论VLA和WA的路线之争,分享给大家。 2025年的自动驾驶赛道正分裂为两大阵营:小鹏、理想、元戎启行押注 VLA路线,华为、蔚来则力 推世界行为模型(WA)。后者认为WA才是能真正实现自动驾驶的终极方案。然而血淋淋的现实 是:这不过是个套壳的数据依赖论。 VLA依赖海量数据训练得到的VLM进一步扩展Action的能力,但工业界最得天独厚的优势就是有海 量的数据,这给模型研发提供了无限可能。在普通场景大家都已经做到99.9%的能力下,长尾场景才 是既分高下也决生死的所在。 世界模型为什么会被吹捧,生成式的方法理论上可以无限扩展corner case,但生成的前提是用海量真 实数据训练物理规则认知框架。 你去生成一个卡车在马路上打篮球的场景,理论上虽然可以,但实际上VLA也好,WA也好,都未必 能真正理解。 『自动驾驶之心知识星球』目前集视频 + 图文 + 学习路线 + 问答 + 求职交流为一体,是一个综合类 的自驾社区,已经超过4000人了。 我们期望未来2年内做到近万人的规模。给大家打造一个交流+技 术分享的聚集地,是许多 ...
FAW Officially Acquires DJI's Zhuoyu! Automakers That Fell Behind on Intelligent Driving Are Racing to Catch Up...
自动驾驶之心· 2025-09-23 03:44
With the commotion behind it, FAW has officially become the largest shareholder of DJI's Zhuoyu (卓驭).

From the founding of DJI Automotive in 2016, to its spin-off as an independently operated entity in 2023, to the formal adoption of "Zhuoyu" as the business brand in 2024, and now to the acquisition by FAW: on September 22, the State Administration for Market Regulation published a notice on China FAW Co., Ltd.'s acquisition of equity in Shenzhen Zhuoyu Technology Co., Ltd.

Zhuoyu, the former DJI automotive business unit, carved out a path in the crowded autonomous-driving market with an extreme value-for-money, vision-only approach; Autonomous Driving Heart recently reviewed its history. According to public information, Zhuoyu has taken investment from automakers and institutions including BYD, SAIC, 国投招商, Co-Stone Capital (基石资本), and 光远资本, with cumulative funding exceeding RMB 2.5 billion.

Zhuoyu started with low-compute, high-value solutions and has in recent years expanded to mid- and high-compute platforms, launching LiDAR offerings and cockpit-driving integration technology. At the Shanghai Auto Show in early May, Zhuoyu not only showed new driver-assistance hardware such as the Zhizhou (知周) blind-spot-filling LiDAR and Jimu (激目) 2.0, but also demonstrated staged results in real vehicles, including its flagship VLA large model on the NVIDIA DRIVE Thor platform and a cockpit-driving integrated solution on the Qualcomm SA8775P platform.

Zhuoyu's decade has also been a golden decade for autonomous driving. As a DJI-incubated automotive business, bringing in one of China's veteran automakers as a shareholder charts a path different from Huawei's. Foreseeably, the curtain is slowly rising on a new phase of autonomous driving ...
A Survey of 3D Reconstruction: The Evolution from Multi-View Geometry to NeRF and 3DGS
自动驾驶之心· 2025-09-22 23:34
Core Viewpoint
- 3D reconstruction is a critical intersection of computer vision and graphics, serving as the digital foundation for cutting-edge applications such as virtual reality, augmented reality, autonomous driving, and digital twins. Recent advancements in novel view synthesis, represented by Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have significantly improved reconstruction quality, speed, and dynamic adaptability [5][6].

Group 1: Introduction and Demand
- The resurgence of interest in 3D reconstruction is driven by new application demands across various fields, including city-scale digital twins requiring kilometer-level coverage and centimeter-level accuracy, autonomous driving simulations needing dynamic traffic flow and real-time semantics, and AR/VR social applications demanding over 90 FPS and photo-realistic quality [6].
- Traditional reconstruction pipelines are inadequate for these new requirements, prompting the integration of geometry, texture, and lighting through differentiable rendering techniques [6].

Group 2: Traditional Multi-View Geometry Reconstruction
- The traditional multi-view geometry pipeline (SfM to MVS) has inherent limitations in quality, efficiency, and adaptability to dynamic scenes, which have been addressed through iterative advancements in NeRF and 3DGS technologies [7].
- A comprehensive comparison of various methods highlights the evolution and future challenges in the field of 3D reconstruction [7].

Group 3: NeRF and Its Innovations
- NeRF models scenes as continuous 5D functions, enabling advanced rendering techniques that have evolved significantly from 2020 to 2024, addressing issues such as data requirements, texture limitations, lighting sensitivity, and dynamic scene handling [13][15].
- Various methods have been developed to enhance quality and efficiency, including Mip-NeRF, NeRF-W, and InstantNGP, each contributing to improved rendering speeds and reduced memory usage [17][18].

Group 4: 3DGS and Its Advancements
- 3DGS represents scenes as collections of 3D Gaussians, allowing for efficient rendering and high-quality output. Recent methods have focused on optimizing rendering quality and efficiency, achieving significant improvements in memory usage and frame rates [22][26].
- Comparisons of 3DGS with other methods show its superiority in rendering speed and dynamic scene reconstruction [31].

Group 5: Future Trends and Conclusion
- The next five years are expected to see advancements in hybrid representations, real-time processing on mobile devices, generative reconstruction techniques, and multi-modal fusion for robust reconstruction [33].
- The ultimate goal is to enable real-time 3D reconstruction accessible to everyone, marking a shift towards ubiquitous computing [34].
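For readers who want the math behind the "continuous 5D function" phrasing in Group 3, the standard formulation from the original NeRF paper maps a position and viewing direction to color and volume density, and renders a ray by alpha-compositing discrete samples along it:

$$F_\Theta : (x, y, z, \theta, \phi) \mapsto (\mathbf{c}, \sigma), \qquad \hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right)\mathbf{c}_i, \qquad T_i = \exp\Big(-\sum_{j<i}\sigma_j \delta_j\Big)$$

Here $\sigma_i$ and $\mathbf{c}_i$ are the density and color at the i-th sample, $\delta_i$ is the spacing between adjacent samples, and $T_i$ is the accumulated transmittance; 3DGS replaces the sampled MLP with explicit Gaussians but keeps the same alpha-compositing structure.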
Urgently Needed: A Cost-Effective 3D Laser Scanner!
自动驾驶之心· 2025-09-22 23:34
The best-value 3D laser scanner | Strong background & validated in real projects

A super-cost-effective 3D scanner for industrial and research/teaching scenarios has arrived. GeoScan S1 is currently the most cost-effective real-scene 3D laser scanner in China: lightweight, one-touch start, and an efficient, practical 3D solution out of the box. Built around multi-modal sensor-fusion algorithms, it reconstructs 3D scenes in real time with centimeter-level accuracy and serves a wide range of field applications.

It maps at 200,000 points per second, measures out to 70 m, covers 360°, and supports large scenes of over 200,000 m². An optional 3D Gaussian data-acquisition module enables high-fidelity real-scene reproduction. It supports cross-platform integration and provides a high-bandwidth Ethernet port plus dual USB 3.0 interfaces, giving research experiments room for flexible extension, lowering the development threshold, and helping developers quickly build R&D capability.

Developed jointly by Prof. Liu Chun's team at Tongji University and an industrialization team from Northwestern Polytechnical University, backed by years of research and industry experience and validated across more than a hundred projects.

The GeoScan S1 ships with a handheld Ubuntu system and multiple sensors. The handle integrates the power supply, which outputs via a D-TAP-to-XT30 (female) cable to the GeoScan S1 body, powering the LiDAR, cameras, and main control board.

A look at base-version reconstruction results!

Low barrier to use: operation is simple and intuitive, and scanning starts with one touch. Export-and-use results: no complex deployment or tedious post-processing; exported scan results are ready ...
Results Are Out! A NeurIPS 2025 Paper Roundup (Autonomous Driving / Large Models / Embodied AI / RL, etc.)
自动驾驶之心· 2025-09-22 23:34
Core Insights
- The article discusses the recent announcements from NeurIPS 2025, focusing on advancements in autonomous driving, visual perception reasoning, large model training, embodied intelligence, reinforcement learning, video understanding, and code generation [1].

Autonomous Driving
- The article highlights various research papers related to autonomous driving, including "FutureSightDrive" and "AutoVLA," which explore visual reasoning and end-to-end driving models [2][4].
- A collection of papers and code from institutions such as Alibaba, UCLA, and Tsinghua University is provided, showcasing the latest developments in the field [6][7][13].

Visual Perception Reasoning
- The article mentions "SURDS," which benchmarks spatial understanding and reasoning in driving scenarios using vision-language models [11].
- It also references "OmniSegmentor," a flexible multi-modal learning framework for semantic segmentation [16].

Large Model Training
- The article discusses advancements in large model training, including papers on scaling offline reinforcement learning and fine-tuning techniques [40][42].
- It emphasizes the importance of adaptive methods for improving model performance in various applications [44].

Embodied Intelligence
- Research on embodied intelligence is highlighted, including "Self-Improving Embodied Foundation Models" and "ForceVLA," which enhance models for contact-rich manipulation [46][48].

Video Understanding
- The article covers advancements in video understanding, particularly through the "PixFoundation 2.0" project, which investigates the use of motion in visual grounding [28][29].

Code Generation
- The article mentions developments in code generation, including "Fast and Fluent Diffusion Language Models" and "Step-By-Step Coding for Improving Mathematical Olympiad Performance" [60].
FlowDrive: An Interpretable End-to-End Framework with Soft and Hard Constraints (SJTU & Bosch)
自动驾驶之心· 2025-09-22 23:34
Introducing physically interpretable, energy-based augmentation into BEV space powers a new end-to-end SOTA!

End-to-end methods of the past two years perform motion planning on top of surround-view BEV representations. During motion planning, autonomous driving must satisfy two kinds of constraints at once: hard constraints imposed by geometrically occupying obstacles (e.g., vehicles, pedestrians), and rule-based soft semantic constraints with no explicit geometry (e.g., lane boundaries, traffic priors). Existing end-to-end frameworks, however, typically rely on implicitly learned BEV features and lack explicit modeling of "risk" and "guidance priors," which makes safe and interpretable planning difficult.

To address this, a team from SJTU, Bosch China, Tsinghua AIR, and Shanghai University proposes FlowDrive. Its core idea is to introduce physically interpretable, energy-based flow fields into BEV space, namely a risk potential field and a lane attraction field, encoding semantic priors and safety cues into the BEV representation. These flow-aware features enable adaptive refinement of anchor trajectories and provide interpretable guidance for trajectory generation. In addition, FlowDrive decouples motion-intent prediction from trajectory denoising through a conditional diffusion planner with feature-level gating, which mitigates inter-task interference and improves multi-modal diversity.

Experiments on the NAVSIM v2 benchmark show that FlowDrive achieves state-of-the-art performance, with its Extended Predictive Driver Mod ...
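The excerpt does not spell out how the energy fields are defined. As a purely illustrative sketch under my own assumptions (not FlowDrive's formulation), the snippet below builds a toy risk potential field on a BEV grid by placing Gaussian-shaped energy bumps at obstacle locations; a lane attraction field could be built analogously from distance to the lane centerline. All names and parameters are hypothetical.

```python
# Illustrative only: a toy energy-based risk potential field on a BEV grid.
# Obstacles contribute Gaussian-shaped energy bumps; higher energy = riskier.
# This is NOT FlowDrive's actual field definition, just a common construction.
import numpy as np

def risk_potential_field(obstacles_xy, grid_size=(200, 200),
                         resolution=0.5, sigma=2.0):
    """obstacles_xy: array [K, 2] of obstacle centers in meters (BEV, ego at center)."""
    H, W = grid_size
    # Grid cell centers in meters, ego vehicle at the grid center.
    ys = (np.arange(H) - H / 2) * resolution
    xs = (np.arange(W) - W / 2) * resolution
    gx, gy = np.meshgrid(xs, ys)                       # each [H, W]
    field = np.zeros((H, W), dtype=np.float32)
    for ox, oy in obstacles_xy:
        d2 = (gx - ox) ** 2 + (gy - oy) ** 2
        field += np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian energy bump
    return field

# Two obstacles ahead-left and ahead-right of the ego vehicle.
field = risk_potential_field(np.array([[10.0, 5.0], [20.0, -3.0]]))
print(field.shape, field.max())
```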
Autonomous Driving: Take a Job, Pursue a PhD, or Switch Fields?
自动驾驶之心· 2025-09-22 10:30
Core Viewpoint
- The article discusses the decision-making process for individuals in the autonomous driving field regarding whether to pursue a PhD, continue working, or switch careers, emphasizing the importance of foundational knowledge and practical experience in the industry [2][3].

Group 1: Career Decisions
- The article highlights two critical questions for individuals considering a career in autonomous driving: the availability of foundational knowledge and practical experience in their current environment, and their readiness to take on pioneering research roles if pursuing a PhD [2][3].
- It points out that many academic mentors may lack deep expertise in autonomous driving, which can hinder students' development if they do not have a solid foundation [2].
- The article suggests that students should assess their preparedness to independently explore and solve problems, especially in cutting-edge research areas where few references exist [2][3].

Group 2: Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" community is introduced as a resource for beginners, offering a comprehensive platform for learning, sharing knowledge, and networking within the autonomous driving field [3][5].
- The community has over 4,000 members and aims to grow to nearly 10,000 in the next two years, providing a space for technical sharing and job-seeking interactions [3][5].
- Various practical questions and topics are addressed within the community, including entry points for end-to-end systems, multi-modal models, and the latest industry trends [5][16].

Group 3: Learning and Development
- The community offers a structured learning system with over 40 technical routes covering various aspects of autonomous driving, including perception, simulation, and planning control [7][14].
- It provides access to numerous resources, including video tutorials, technical discussions, and job opportunities, aimed at both beginners and those looking to advance their skills [8][18].
- The community also facilitates connections with industry leaders and experts, enhancing members' understanding of the latest developments and job market trends in autonomous driving [12][92].
How Far Along Is Autonomous-Driving VLA? Is It Still a Good Time to Do Research?
自动驾驶之心· 2025-09-22 08:04
Core Insights
- The article discusses the transition in intelligent driving technology from rule-driven to data-driven approaches, highlighting the emergence of VLA (Vision-Language-Action) as a more straightforward and effective method compared to traditional end-to-end systems [1][2].
- The challenges in the current VLA technology stack are emphasized, including the complexity and fragmentation of knowledge, which makes it difficult for newcomers to enter the field [2][3].
- A new practical course on VLA has been developed to address these challenges, providing a structured learning path for students interested in advanced knowledge in autonomous driving [3][4][5].

Summary by Sections

Introduction to VLA
- The article introduces VLA as a significant advancement in autonomous driving, offering a cleaner approach than traditional end-to-end systems, while also addressing corner cases more effectively [1].

Challenges in Learning VLA
- The article outlines the difficulties faced by learners in navigating the complex and fragmented knowledge landscape of VLA, which includes a plethora of algorithms and a lack of high-quality documentation [2].

Course Development
- A new course titled "Autonomous Driving VLA Practical Course" has been created to provide a comprehensive overview of the VLA technology stack, aiming to facilitate easier entry into the field for students [3][4].

Course Features
- The course is designed to address key pain points, offering quick entry into the subject matter through accessible language and examples [3].
- It aims to build a framework for understanding VLA research and enhance research capabilities by teaching students how to categorize papers and extract innovative points [4].
- The course includes practical components to ensure that theoretical knowledge is effectively applied in real-world scenarios [5].

Course Outline
- The course covers various topics, including the origins of VLA, foundational algorithms, and the differences between modular and integrated VLA systems [6][15][19][20].
- It also includes practical coding exercises and projects to reinforce learning and application of concepts [22][24][26].

Instructor Background
- The course is led by experienced instructors with a strong background in multi-modal perception, autonomous driving, and large model frameworks, ensuring high-quality education [27].

Learning Outcomes
- Upon completion, students are expected to have a thorough understanding of current advancements in VLA, core algorithms, and the ability to apply their knowledge in practical settings [28][29].