自动驾驶之心
VLA papers now dominate the frontier of autonomous driving research...
自动驾驶之心· 2025-09-19 16:03
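Judging by this year's top CV and AI conferences, VLA and its derivative directions have become the main focus of autonomous driving companies and university labs, accounting for nearly half of all frontier autonomous driving output, especially reasoning-enhanced VLA, reinforcement learning, and related benchmarks. Imagine issuing an instruction in natural language ("find the nearest Starbucks") and having the vehicle drive and park smoothly on its own! VLA breaks the single-task limitation of traditional methods, letting autonomous vehicles make decisions on their own across diverse scenarios and adapt flexibly to unseen environments. VLA is also more direct and cleaner: many methods drop the complex 3D perception tasks of traditional end-to-end pipelines. Beyond simpler tasks, VLA borrows the stronger general-purpose generalization of VLMs and, more importantly, offers a possible path to solving corner cases. As academia and industry have turned their attention to end-to-end driving, however, many problems have surfaced. The autonomous driving VLA tech stack has not yet converged, and new algorithms keep springing up. Too many tech stacks? Hard to get started? We recently launched the "End-to-End and VLA Autonomous Driving Small-Group Course," which focuses on mapping out the end-to-end driving tech stack, and student feedback has been very positive. Many students then contacted 自动驾驶之心 wanting to learn more about frontier VLA work. So 自动驾驶之心 has partnered with a teaching and research team from Tsinghua University to create the "Autonomous Driving VLA Hands-On Tutorial," targeting autonomous driving VLA ...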
Just in: Fei-Fei Li's latest spatial intelligence result! 3D world generation enters the era of "infinite exploration"
自动驾驶之心· 2025-09-19 16:03
Core Viewpoint - The article discusses the launch of Marble, a new spatial intelligence model by World Labs, which allows users to generate persistent, navigable 3D worlds from a single image or text prompt, marking a significant advancement in large-scale 3D generation technology [4][5][21]. Group 1: Product Features - Marble enables the creation of expansive 3D environments that are permanent and free to explore, distinguishing it from other models like Google's Genie [9][21]. - Users can generate 3D worlds with improved geometric structure and style diversity, allowing for a richer and more complex 3D experience compared to previous technologies [21][24]. - The model supports seamless integration of generated worlds into web-based 3D experiences, utilizing the open-source rendering library Spark for efficient performance across various devices, including VR headsets [21][24]. Group 2: User Experience - The generated 3D worlds allow for free navigation in a browser without any cost, providing a more immersive experience than traditional depth maps or point clouds [24]. - Users can combine multiple generated results to create larger, cohesive environments, enhancing the potential for creative applications [22][31]. - The model's ability to transform various styles into 3D worlds enables users to iterate on appearance and style, catering to diverse creative needs [25][26]. Group 3: Community Feedback - Initial user tests have shown positive results, with suggestions for improvements such as connecting different generated worlds more easily [14][21]. - The community's engagement highlights the excitement around the potential applications of Marble in various creative and technical fields [10][14].
Overview of Autonomous Driving Companies in 2025
自动驾驶之心· 2025-09-19 16:03
Core Viewpoint - The autonomous driving industry is undergoing a new round of reshuffling and resource integration, with various companies striving to achieve Level 3 (L3) automation as the next technological breakthrough [1]. New Forces - Companies such as NIO, Xpeng, Li Auto, Xiaomi, Leapmotor, Didi, WM Motor, and others are emerging as new players in the autonomous driving sector [2][3]. Tier 1 Suppliers - Major Tier 1 suppliers include Huawei, Baidu, DJI, ZTE, Tencent, and others, focusing on smart cockpits, high-precision maps, and simulation toolchains [5]. Robotaxi - Key players in the Robotaxi segment include Baidu, Pony.ai, and Didi, among others, indicating a competitive landscape in autonomous ride-hailing services [7]. Robotruck - Companies involved in the Robotruck sector include Zhijia Technology, Yincheng Technology, and others, highlighting the growth of autonomous trucking solutions [9]. Robobus - Notable companies in the Robobus category include Baidu, Pony.ai, and SenseTime, showcasing advancements in autonomous bus services [11]. Logistics and Delivery - Major players in logistics and delivery automation include Meituan, Alibaba Damo Academy, JD.com, and others, reflecting the integration of autonomous technology in supply chain operations [13]. Traditional OEMs - Established original equipment manufacturers (OEMs) such as SAIC, GAC, BYD, and others are also investing in autonomous driving technologies [15]. Agricultural Autonomous Driving - Companies like Fengjiang Intelligent and Zhonglian Heavy Industry are focusing on agricultural applications of autonomous driving [17]. Mining Autonomous Driving - Key players in mining automation include Yikong Zhijia and Taga Zhixing, indicating the sector's interest in autonomous solutions [19]. Sanitation Autonomous Driving - Companies such as Zhixingzhe and Koo Wah are developing autonomous solutions for sanitation services [21]. 
Intelligent Parking - Major players in intelligent parking solutions include Baidu, Desay SV, and others, reflecting the growing need for automated parking systems [23]. Computing Platforms - Companies like Huawei and Horizon Robotics are providing computing platforms essential for autonomous driving technologies [24]. High-Precision Mapping - Key players in high-precision mapping include Baidu, AutoNavi, and Tencent, which are crucial for the development of autonomous navigation systems [25]. Conclusion - The autonomous driving industry is characterized by continuous technological evolution and the collaborative efforts of numerous stakeholders, with the journey towards L3 automation being a collective endeavor [26].
After all, end-to-end/VLA without a data loop is only half-finished
自动驾驶之心· 2025-09-19 11:24
Core Viewpoint - The future of autonomous driving technology will focus on safer driving, better user experience, and comprehensive scenario coverage, necessitating a robust operational model from both manufacturers and suppliers [1]. Group 1: Data-Driven Technology - Future autonomous driving companies are expected to resemble "data-driven technology companies," where competition will shift from algorithms to the efficiency of data loops [2]. - The ability to quickly collect, clean, label, train, and validate data will be crucial for gaining a competitive edge, requiring advanced automation tools and AI-driven data pipelines [2]. - The architecture involving VLA/VLM will be essential for enhancing user experience, with a focus on building robust, efficient, and low-cost closed-loop simulations [2]. Group 2: Algorithm and Data Services - When considering algorithms, the supporting data services and automated labeling infrastructure must also be taken into account, especially for companies under profit pressure [3]. - The industry is exploring solutions like DiffVLA to transition smoothly into the VLA era while leveraging existing data and tools [3]. - Current research focuses on introducing new data sources and learning paradigms, indicating that the field remains open for exploration and innovation [3]. Group 3: Simulation and Training - There is a consensus in academia and industry on the importance of closed-loop systems involving agent simulators, sensor simulators, and driving policies [4]. - Companies that can effectively address the sim-to-real domain gap and build efficient closed-loop training systems will likely lead the autonomous driving market [4]. - Without a data loop, end-to-end/VLA systems are considered incomplete [5]. 
Group 4: Community and Knowledge Sharing - The "Autonomous Driving Knowledge Planet" community aims to provide a platform for technical exchange and problem-solving among members from leading universities and companies in the autonomous driving sector [12]. - The community has compiled extensive resources, including over 40 technical routes and numerous datasets, to facilitate learning and application in projects [12]. - Regular discussions with industry leaders on trends and challenges in autonomous driving are part of the community's offerings [12].
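The collect, clean, label, train, validate loop described above can be sketched as one pass over toy data. This is a minimal illustration under stated assumptions: `Clip`, `clean`, `auto_label`, `run_data_loop`, and `threshold_trainer` are hypothetical names, the "model" is a one-parameter speed threshold, and nothing here reflects any specific company's pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Clip:
    """A hypothetical logged driving clip; the fields are illustrative only."""
    clip_id: int
    sensor_ok: bool                # False if the recording failed an integrity check
    speed_mps: float               # ego speed, used here as a toy feature
    label: Optional[str] = None    # filled in by the auto-labeler


Model = Callable[[Clip], str]


def clean(clips: List[Clip]) -> List[Clip]:
    """Cleaning stage: keep only clips that passed the integrity check."""
    return [c for c in clips if c.sensor_ok]


def auto_label(clips: List[Clip], labeler: Model) -> List[Clip]:
    """Labeling stage: attach a label produced by an automatic labeler."""
    for c in clips:
        c.label = labeler(c)
    return clips


def run_data_loop(raw: List[Clip], labeler: Model,
                  train: Callable[[List[Clip]], Model],
                  val_every: int = 4) -> Dict[str, float]:
    """One pass of collect -> clean -> label -> train -> validate."""
    labeled = auto_label(clean(raw), labeler)
    # Hold out every `val_every`-th clip for validation and train on the rest.
    val = labeled[::val_every]
    train_set = [c for i, c in enumerate(labeled) if i % val_every != 0]
    model = train(train_set)
    correct = sum(model(c) == c.label for c in val)
    return {"kept": len(labeled), "dropped": len(raw) - len(labeled),
            "val_accuracy": correct / len(val)}


def threshold_trainer(data: List[Clip]) -> Model:
    """Toy 'training' (hypothetical): put a speed threshold between the two classes."""
    slow_max = max(c.speed_mps for c in data if c.label == "slow")
    fast_min = min(c.speed_mps for c in data if c.label == "fast")
    cut = (slow_max + fast_min) / 2
    return lambda c: "fast" if c.speed_mps > cut else "slow"


# Toy usage: every 5th clip is corrupted; the auto-labeler calls > 10 m/s "fast".
raw_clips = [Clip(i, sensor_ok=(i % 5 != 0), speed_mps=float(i)) for i in range(20)]
report = run_data_loop(raw_clips, lambda c: "fast" if c.speed_mps > 10 else "slow",
                       threshold_trainer)
print(report)  # {'kept': 16, 'dropped': 4, 'val_accuracy': 1.0}
```

Each stage is a plain function so that it can be swapped independently (a better cleaner, a different labeler), which is the property the summary stresses when it says competition shifts to the efficiency of the loop as a whole.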
Advice from a P7 on switching from autonomous driving to embodied AI...
自动驾驶之心· 2025-09-19 00:30
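A P7's thoughts on switching from autonomous driving to embodied AI... I was recently chatting with a friend at the P7 level who has gone to lead the embodied AI lab at a major tech company. Because the lab was only just set up, many things are still immature, much as when autonomous driving teams were first being built: short on data, compute, and equipment. Looking back at autonomous driving after moving to embodied AI, he finds that many problems are still similar, and the methodology used to optimize autonomous driving can even be applied directly; only the targets and factors have changed. He raised a few interesting points that may offer some inspiration. On data: With no data or little data, the first things that come to mind are real2sim2real or sim2real approaches. If the robot hardware exists but data is scarce and expensive to collect, consider self-collection: let the robot collect and log its own data, then use algorithms to filter out dirty data. This is quite similar to the data loop and auto-labeling in autonomous driving. On algorithms: For commercialization, the newest techniques should be held back until they mature. Techniques that have already been validated should be deployed first, solving part of the problem and meeting the needs of some scenarios and features. Take VLA: it works reasonably well for intelligent driving and robotic arms, but putting it on humanoids is very difficult. If reinforcement learning still works, that is the approach to use. Once both algorithms and data are smoother, it will be time for humanoid VLA. Some thoughts on deployment: Don't worry too much about deployment; we are very good at model compression and deployment, and as for compute, I think Thor is basically enough ...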
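The "let the robot log its own data, then filter out dirty data" idea can be sketched as a rule-based filter over episodes. This is only an illustration under assumptions: the `Episode` fields, the `is_dirty` rules, and the thresholds (5% dropped frames, 3.0 rad/s joint velocity) are hypothetical placeholders, and a real system might use learned filters instead of fixed rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    """A hypothetical self-collected robot episode; the fields are illustrative."""
    episode_id: int
    frames: int            # frames that should have been recorded
    dropped_frames: int    # frames lost while logging
    max_joint_vel: float   # peak joint velocity in rad/s
    success: bool          # whether the task was completed


def is_dirty(ep: Episode, max_drop_ratio: float = 0.05,
             joint_vel_limit: float = 3.0) -> bool:
    """Rule-based check; the thresholds are placeholder values, not tuned numbers."""
    if ep.frames == 0:
        return True                                   # empty recording
    if ep.dropped_frames / ep.frames > max_drop_ratio:
        return True                                   # too many lost frames
    if ep.max_joint_vel > joint_vel_limit:
        return True                                   # likely a glitch or unsafe motion
    return not ep.success                             # keep only successful demos


def split_clean_dirty(episodes: List[Episode]) -> Tuple[List[Episode], List[Episode]]:
    """Partition episodes into (clean, dirty) lists, preserving order."""
    clean, dirty = [], []
    for ep in episodes:
        (dirty if is_dirty(ep) else clean).append(ep)
    return clean, dirty


episodes = [
    Episode(0, 100, 0, 1.2, True),    # clean
    Episode(1, 100, 10, 1.0, True),   # 10% dropped frames
    Episode(2, 100, 2, 4.5, True),    # joint-velocity spike
    Episode(3, 100, 0, 0.8, False),   # task failed
    Episode(4, 0, 0, 0.0, True),      # empty recording
]
clean_eps, dirty_eps = split_clean_dirty(episodes)
print([e.episode_id for e in clean_eps], [e.episode_id for e in dirty_eps])  # [0] [1, 2, 3, 4]
```

Keeping the dirty episodes instead of discarding them mirrors the autonomous driving data loop: they can be reviewed later to tune the rules or mined for failure cases.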
SJTU Professor Yan Junchi's team: a roundup of hard-core results at top conferences and journals over the past year
自动驾驶之心· 2025-09-18 23:33
Core Insights - The article discusses the groundbreaking research conducted by Professor Yan Junchi's team at Shanghai Jiao Tong University, focusing on advancements in AI, robotics, and autonomous driving [2][32]. - The team's recent publications in top conferences like CVPR, ICLR, and NeurIPS highlight key trends in AI research, emphasizing the integration of theory and practice, the transformative impact of AI on traditional scientific computing, and the development of more robust, efficient, and autonomous intelligent systems [32]. Group 1: Recent Research Highlights - The paper "Grounding and Enhancing Grid-based Models for Neural Fields" introduces a systematic theoretical framework for grid-based neural field models, leading to the development of the MulFAGrid model, which achieves superior performance in various tasks [4][5]. - The "CR2PQ" method addresses the challenge of cross-view pixel correspondence in dense visual representation learning, demonstrating significant performance improvements over previous methods [6][7]. - The "BTBS-LNS" method effectively tackles the limitations of policy learning in large neighborhood search for mixed-integer programming (MIP), showing competitive performance against commercial solvers like Gurobi [8][10][11]. Group 2: Performance Metrics - The MulFAGrid model achieved a PSNR of 56.19 dB in 2D image fitting tasks and an IoU of 0.9995 in 3D signed distance field reconstruction tasks, outperforming previous grid-based models [5]. - The CR2PQ method demonstrated improvements of 10.4% in box mAP (mAP^bb) and 7.9% in mask mAP (mAP^mk) over state-of-the-art methods after only 40 pre-training epochs [7]. - The BTBS-LNS method outperformed Gurobi, achieving a 10% better primal gap in benchmark tests within a 300-second cutoff time [11]. 
Group 3: Future Trends in AI Research - The research indicates a shift towards a deeper integration of theoretical foundations with practical applications in AI, suggesting a future where AI technologies are more robust and capable of real-world applications [32]. - The advancements in AI research are expected to lead to smarter robots, more powerful design tools, and more efficient business solutions in the near future [32].
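The MulFAGrid result above is reported as a PSNR value. Under the standard definition, PSNR = 10 · log10(MAX² / MSE), measured in dB, where MAX is the largest possible pixel value. The sketch below is a plain-Python version of that standard formula; the paper's exact evaluation protocol (value range, color handling) is not given in the summary, so scaling pixels to [0, 1] is an assumption.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf          # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)


ref = [0.0, 0.5, 1.0, 0.25]
est = [0.01, 0.49, 0.99, 0.26]       # every pixel off by 0.01, so MSE = 1e-4
print(round(psnr(ref, est), 6))      # 40.0

# Inverting the formula: the MSE implied by a PSNR of 56.19 dB for values in [0, 1].
print(10 ** (-56.19 / 10))           # about 2.4e-06
```

Because PSNR is logarithmic, every additional 10 dB means a tenfold lower mean squared error. That is why a score in the mid-50s dB on [0, 1] data corresponds to an almost exact fit.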
New vision-only SOTA! AdaThinkDrive: more flexible chain-of-thought reasoning for autonomous driving VLA (Tsinghua & Xiaomi)
自动驾驶之心· 2025-09-18 23:33
Core Viewpoint - The article discusses the limitations of existing Chain-of-Thought (CoT) reasoning methods in Vision-Language-Action (VLA) models for autonomous driving, particularly in simple scenarios where they do not improve decision quality and introduce unnecessary computational overhead. It introduces AdaThinkDrive, a new VLA framework that employs a dual-mode reasoning mechanism inspired by the "fast and slow thinking" theory, allowing the model to adaptively choose when to reason based on scene complexity [3][4][10]. Group 1: Introduction and Background - The shift from traditional modular approaches to end-to-end architectures in autonomous driving systems is highlighted, noting that while modular methods offer flexibility, they suffer from information loss between components, leading to cumulative errors in complex scenarios. End-to-end methods mitigate this issue but are still limited by their reliance on supervised data [7]. - The article categorizes current VLA methods into two paradigms: meta-action methods focusing on high-level guidance and planning-based methods that predict trajectories directly from raw inputs. The application of CoT techniques is becoming more prevalent, particularly in complex scenarios, but their effectiveness in simple scenarios is questioned [14][15]. Group 2: AdaThinkDrive Framework - AdaThinkDrive is proposed as an end-to-end VLA framework that incorporates a "fast answer/slow thinking" mechanism, allowing the model to switch adaptively between direct prediction and explicit reasoning based on scene complexity. This is achieved through a three-stage adaptive reasoning strategy [11][18]. - The framework's performance is validated through extensive experiments on the NAVSIM benchmark, achieving a Predictive Driver Model Score (PDMS) of 90.3, which is 1.7 points higher than the best vision-only baseline model. 
The model demonstrates superior adaptive reasoning capabilities, selectively enabling CoT in 96% of complex scenarios and defaulting to direct trajectory prediction in 84% of simple scenarios [4][18][50]. Group 3: Experimental Results and Analysis - The article presents a comprehensive evaluation of AdaThinkDrive against existing models, showing that it outperforms both "always think" and "never think" baseline models, with PDMS improvements of 2.0 and 1.4 points, respectively. Additionally, the reasoning time is reduced by 14% compared to the "always think" baseline, indicating a balance between accuracy and efficiency [4][18][58]. - The results indicate that the optimal reasoning strategy is not universal but depends on scene complexity, emphasizing the need for models to adaptively enable reasoning based on the context [10][18]. Group 4: Conclusion - The article concludes that reasoning in simple scenarios often increases computational costs without enhancing decision quality. AdaThinkDrive addresses this by allowing agents to learn when to think, guided by an adaptive thinking reward mechanism. The experimental results on the NAVSIM benchmark demonstrate that AdaThinkDrive achieves state-of-the-art performance, underscoring the importance of adaptive thinking for accurate and efficient decision-making in autonomous driving systems [66].
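The "think or answer directly" decision and the statistics reported above (the share of complex scenes that trigger CoT, the share of simple scenes answered directly) can be illustrated with a toy dispatcher. This is explicitly not AdaThinkDrive's method: the paper learns when to think through training with an adaptive thinking reward, whereas this sketch uses a hypothetical hand-set `think_threshold` on a hypothetical `complexity` score, and all numbers are toy data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scene:
    scene_id: int
    complexity: float   # hypothetical difficulty score in [0, 1]
    is_complex: bool    # ground-truth category, used only to compute statistics


def choose_mode(scene: Scene, think_threshold: float = 0.5) -> str:
    """Stand-in for a learned policy: 'think' = reason first, 'direct' = fast answer."""
    return "think" if scene.complexity >= think_threshold else "direct"


def mode_rates(scenes: List[Scene], think_threshold: float = 0.5) -> Tuple[float, float]:
    """Return (share of complex scenes that think, share of simple scenes that answer directly)."""
    complex_scenes = [s for s in scenes if s.is_complex]
    simple_scenes = [s for s in scenes if not s.is_complex]
    think_on_complex = sum(choose_mode(s, think_threshold) == "think"
                           for s in complex_scenes) / len(complex_scenes)
    direct_on_simple = sum(choose_mode(s, think_threshold) == "direct"
                           for s in simple_scenes) / len(simple_scenes)
    return think_on_complex, direct_on_simple


scenes = (
    [Scene(i, c, True) for i, c in enumerate([0.9, 0.8, 0.7, 0.4])]
    + [Scene(4 + i, c, False) for i, c in enumerate([0.1, 0.2, 0.3, 0.6])]
)
print(mode_rates(scenes))  # (0.75, 0.75)
```

A fixed threshold misroutes the hard-looking simple scene (0.6) and the easy-looking complex one (0.4). A learned policy can use richer cues than a single score to avoid such mistakes.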
Today's autonomous driving VLA still has many modules that need optimization...
自动驾驶之心· 2025-09-18 11:00
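1. The era of traditional modular architectures: Early autonomous driving systems (L2-L4) generally adopted a modular design, with each module (e.g., object detection, trajectory prediction, path planning) developed and optimized independently. Strengths: clear logic, each module can be debugged and validated independently, and interpretability is good. Bottlenecks: Error accumulation: small errors in upstream modules propagate and are amplified stage by stage, affecting the final decision. Information loss: the structured data passed between modules (e.g., 3D boxes, trajectory points) loses much of the rich detail in the raw sensor data. Limits of rules: reliance on large numbers of hand-designed rules and parameters makes it hard to handle complex, long-tail traffic scenarios (corner cases).
2. The rise of vision-only end-to-end driving (imitation learning): Represented by NVIDIA's DAVE-2 and Wayve, researchers used deep neural networks trained by imitation learning to learn a "pixels-to-actions" mapping directly from human driving videos and control data. Strengths: a simpler system architecture, and complex driving policies are learned automatically from data without tedious rule design. Bottlenecks: The "black box" problem and poor interpretability: the model's decision process is opaque, making it hard to understand why it takes a particular action, which is a fatal flaw for safety-critical autonomous driving. Causal confusion (Causal ... VLA is definitely this year's autonomous driving ...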
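The error-accumulation bottleneck of modular stacks can be made concrete with a simple reliability model. This sketch assumes, as a simplification, that each stage fails independently and that any single failure corrupts the final output; under that assumption the pipeline's success probability is the product of the per-stage probabilities. The stage names and percentages are purely illustrative.

```python
import math
from typing import Sequence


def pipeline_success(stage_success: Sequence[float]) -> float:
    """Probability that a serial pipeline's output is correct, assuming each
    stage fails independently and any single failure corrupts the output."""
    for p in stage_success:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return math.prod(stage_success)


# Illustrative numbers: detection 99%, prediction 98%, planning 97%.
p = pipeline_success([0.99, 0.98, 0.97])
print(f"{p:.4f}")   # 0.9411
```

No single stage fails more than 3% of the time, yet the chain as a whole fails about 5.9% of the time. The independence model actually understates the problem described above, because in a real stack an upstream error (say, a slightly shifted 3D box) also degrades the inputs of every downstream module.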
A ten-million-dollar prize pool! 2077AI launches Project EVA, inviting "superhumans" worldwide to challenge the cognitive limits of AI
自动驾驶之心· 2025-09-18 11:00
Core Insights - The 2077AI Open Source Foundation has launched Project EVA, a global AI evaluation challenge with a total prize pool of $10.24 million, aimed at exploring the true capabilities of large language models (LLMs) [1][2] - The project seeks to move beyond traditional AI benchmarks to a new paradigm that tests AI's limits in complex logic, deep causality, counterfactual reasoning, and ethical dilemmas [1] - Participants are encouraged to design insightful "extreme problems" to challenge the cognitive blind spots of current leading AI models [1][2] Group 1 - Project EVA is not a programming competition but a trial of wisdom and creativity, focusing on defining the future of AI through innovative problem design [1][2] - The initiative invites top AI researchers, algorithm engineers, and cross-disciplinary experts from fields like philosophy, linguistics, and art to participate [2] - The project emphasizes the importance of a global community in driving disruptive ideas and advancing AI technology [2][3] Group 2 - The registration for Project EVA is now open, allowing participants to secure their spots and receive updates on competition rules, evaluation standards, and schedules [2] - The 2077AI Open Source Foundation is a non-profit organization dedicated to promoting high-quality data openness and cutting-edge AI research [3] - The foundation believes that openness, collaboration, and sharing are essential for the healthy development of AI technology [3]
With research papers, by the time it finally clicks, it's often too late...
自动驾驶之心· 2025-09-18 03:40
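Still waiting for your advisor to spoon-feed you? Still thinking "I'll publish once my foundations are solid"? Wake up! Research has to click early; rejection letters and delayed graduation won't wait until you're ready. Does the phrase "delayed graduation" make your heart sink? Every year, plenty of talented master's students who are perfectly capable get stuck at the "paper" hurdle. It's not that they don't work hard; it's that things "click" too late. Typical profiles of late bloomers: The "waiting for the advisor" type: always feels there is nowhere to start without a clear direction or task from the advisor, waits passively, and lets time slip away. The "perfectionist" type: always wants to "learn everything," "build a perfect foundation," and "produce a stunning result" before starting to write. The result? The foundations are never finished, and the experiments are never perfect. The "intimidated procrastinator" type: gets a headache at the thought of reading papers, tuning models, writing, and being rejected, subconsciously avoids it all, and numbs themselves with coursework, projects, or even games. The "underestimating the timeline" type: naively believes that writing, submitting, revising, and getting accepted can be done in a few months. In reality, going from idea to acceptance often takes six months to a year or longer, and a rejection can double the cycle! What is the key to making research "click"? It comes down to two words: act early! Treat publishing as a core goal throughout your master's program, not a last-minute sprint. Do the "time math": Start in the summer after your first year, and you have nearly 2 years to polish 1-2 high-quality papers (including submission cycles), with room to spare. Start worrying only in the second half of your second year, and you may have less than 1 year of effective time left, while also facing coursework, ...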