DriveVLM

New from XJTLU & HKUST: A Survey of Foundation Models for Trajectory Prediction
自动驾驶之心· 2025-09-24 23:33
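Abstract and introduction: This survey examines a new paradigm that applies large foundation models (LFMs), such as large language models (LLMs) and multimodal large language models (MLLMs), to trajectory prediction for autonomous driving. By integrating linguistic and contextual knowledge, the approach lets autonomous driving systems understand complex traffic scenes more deeply, improving safety and efficiency. The article reviews the paradigm shift from traditional methods to LFM-based ones, covering vehicle and pedestrian prediction tasks, common evaluation metrics, and related datasets. It details three key ways of applying LLMs: trajectory-language mapping, multimodal fusion, and constraint-based reasoning. These methods markedly improve the interpretability of predictions and their robustness in long-tail scenarios. Despite these advantages, LLMs also face challenges such as computational latency, data scarcity, and real-world robustness.
Paper link: https://www.arxiv.org/abs/2509.10570
Author affiliations: Xi'an Jiaotong-Liverpool University, University of Macau, University of Liverpool, Hong Kong University of Science and Technology (Guangzhou)
Figure 1 shows the closed "perception, prediction, planning and control" loop in autonomous driving and highlights how LFMs help an autonomous vehicle predict the trajectories of other traffic participants. Figure 2 presents, as a timeline, the evolution of trajectory prediction methods from physics-based models through machine learning and deep learning to the latest LFM-based methods.
Trajectory prediction overview: Trajectory prediction is a core technology for autonomous driving. It uses historical data (such as position and velocity) as well as contextual ...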
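The excerpt mentions common evaluation metrics without naming them. As a minimal sketch, assuming the survey's metrics include the two that are widely used in trajectory prediction work, average displacement error (ADE) and final displacement error (FDE), they can be computed as follows. The function name and the sample waypoints are illustrative, not taken from the paper.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) waypoints."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between predicted and true position at each time step
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # mean error over the whole horizon
    fde = dists[-1]                # error at the final time step only
    return ade, fde

# Hypothetical 3-step horizon: the prediction goes straight, the agent drifts.
predicted = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
ground_truth = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
```

ADE rewards predictions that stay close along the whole horizon, while FDE only looks at where the agent ends up, so the two are usually reported together.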
A New Paradigm for Robotic Manipulation: A Systematic Survey of VLA Models | Jinqiu Select
锦秋集· 2025-09-02 13:41
Core Insights
- The article discusses the emergence of Vision-Language-Action (VLA) models built on large Vision-Language Models (VLMs) as a transformative paradigm in robotic manipulation, addressing the limitations of traditional methods in unstructured environments [1][4][5]
- It highlights the need for a structured classification framework to mitigate research fragmentation in the rapidly evolving VLA field [2]
Group 1: New Paradigm in Robotic Manipulation
- Robotic manipulation is a core challenge at the intersection of robotics and embodied AI, requiring a deep understanding of visual and semantic cues in complex environments [4]
- Traditional methods rely on predefined control strategies, which struggle in unstructured real-world scenarios and reveal limitations in scalability and generalization [4][5]
- The advent of large VLMs has provided a revolutionary approach, enabling robots to interpret high-level human instructions and generalize to unseen objects and scenes [5][10]
Group 2: VLA Model Definition and Classification
- VLA models are defined as systems that use a large VLM to understand visual observations and natural language instructions, followed by a reasoning process that generates robotic actions [6][7]
- VLA models fall into two main types, Monolithic Models and Hierarchical Models, each with distinct architectures and functionalities [7][8]
Group 3: Monolithic Models
- Monolithic VLA models can be implemented in single-system or dual-system architectures, integrating perception and action generation into a unified framework [14][15]
- Single-system models process all modalities together, while dual-system models separate reflective reasoning from reactive behavior, improving efficiency [15][16]
Group 4: Hierarchical Models
- Hierarchical models consist of a planner and a policy that can operate independently, giving a modular design that makes task execution more flexible [43]
- These models can be further divided into Planner-Only and Planner+Policy categories, with the former focusing solely on planning and the latter integrating action execution [43][44]
Group 5: Advancements in VLA Models
- Recent advancements in VLA models include richer perception modalities, such as 3D and 4D perception, as well as the integration of tactile and auditory information [22][23][24]
- Efforts to improve reasoning and generalization are crucial for enabling VLA models to perform complex tasks in diverse environments [25][26]
Group 6: Performance Optimization
- Performance optimization in VLA models focuses on inference efficiency through architectural adjustments, parameter optimization, and inference acceleration techniques [28][29][30]
- Dual-system models have emerged to balance deep reasoning with real-time action generation, easing deployment in real-world scenarios [35]
Group 7: Future Directions
- Future research directions include memory mechanisms, 4D perception, efficient adaptation, and multi-agent collaboration to further enhance VLA model capabilities [1][6]
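The dual-system idea summarized above, a slow reasoning component paired with a fast reactive one, can be illustrated with a toy control loop. This is only a sketch under stated assumptions: `slow_planner` stands in for a VLM call, `fast_policy` for a reactive controller, and the speeds, replanning interval, and observation format are all hypothetical rather than drawn from any surveyed model.

```python
def slow_planner(observation):
    """Stand-in for a slow VLM call: returns a coarse target speed (hypothetical)."""
    return 5.0 if observation["obstacle_ahead"] else 15.0

def fast_policy(current_speed, target_speed, max_step=1.0):
    """Reactive controller: move toward the target by at most max_step per tick."""
    delta = target_speed - current_speed
    return current_speed + max(-max_step, min(max_step, delta))

def run(observations, replan_every=5, start_speed=10.0):
    """Run the fast policy every tick and the slow planner every replan_every ticks."""
    speed, target, log = start_speed, start_speed, []
    for step, obs in enumerate(observations):
        if step % replan_every == 0:        # slow system: low update rate
            target = slow_planner(obs)
        speed = fast_policy(speed, target)  # fast system: every tick
        log.append(speed)
    return log

# An obstacle appears from tick 5 onward in this made-up scenario.
observations = [{"obstacle_ahead": i >= 5} for i in range(10)]
speeds = run(observations)
```

The point of the split is that the expensive reasoning step sets goals infrequently, while the cheap controller keeps acting at every tick on the most recent goal.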
A Crash Course in Planning for Autonomous Driving Perception Engineers
自动驾驶之心· 2025-08-08 16:04
Core Insights
- The article discusses the evolution and importance of planning modules in autonomous driving, emphasizing that engineers need to understand both traditional and machine learning-based approaches to address challenges in the field effectively [5][8][10].
Group 1: Importance of Planning
- Understanding planning is crucial for engineers working on autonomous driving, because it lets them serve downstream consumers of their output better and strengthens their problem-solving ability [8][10].
- During the transition from rule-based to machine-learning planning, the two approaches will likely coexist for an extended period, with the ratio of rule-based to learning-based usage gradually shifting from 8:2 to 2:8 [8][10].
Group 2: Planning System Overview
- The planning system in an autonomous vehicle generates safe, comfortable, and efficient driving trajectories, taking perception outputs as its input [11][12].
- Traditional planning modules consist of global path planning, behavior planning, and trajectory planning, with behavior and trajectory planning often working in tandem [12].
Group 3: Challenges in Planning
- A significant challenge in the planning technology stack is the lack of standardized terminology, which causes confusion in both academic and industrial contexts [15].
- The article highlights the need for a unified approach to behavior planning, as the current lack of consensus on semantic actions limits the effectiveness of planning systems [18].
Group 4: Planning Techniques
- The article outlines three primary tools used in planning: search, sampling, and optimization, each with its own methodologies and applications in autonomous driving [24][41].
- Search methods, such as the Dijkstra and A* algorithms, are popular for path planning, while sampling methods such as Monte Carlo are used to evaluate large numbers of options quickly [25][32].
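To make the search-based tools mentioned above concrete, here is a minimal A* sketch on a small occupancy grid. The grid, the 4-connected unit-cost motion model, and the function name are illustrative assumptions and not the article's implementation. Replacing the heuristic with zero turns the same loop into Dijkstra's algorithm.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (blocked) cells."""
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:     # walk parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue                    # stale heap entry, a cheaper route was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap,
                                   (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None                         # goal unreachable

# Hypothetical map: a wall forces a detour around the right-hand side.
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
```

The heuristic only changes the order in which cells are expanded; as long as it never overestimates the remaining cost, the returned path is still a shortest one.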
Group 5: Industrial Practices
- The article distinguishes decoupled from joint spatiotemporal planning methods; decoupled solutions are easier to implement but potentially less optimal in complex scenarios [52][54].
- The Apollo EM planner is presented as an example of a decoupled planning approach, which simplifies the problem by breaking it into two-dimensional subproblems [56][58].
Group 6: Decision-Making in Autonomous Driving
- Decision-making in autonomous driving focuses on interactions with other road users, addressing the uncertainties and dynamic behaviors that complicate planning [68][69].
- The Markov Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP) frameworks are essential for handling the probabilistic nature of interactions in driving scenarios [70][74].
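As a rough illustration of the MDP framing mentioned above, the sketch below runs value iteration on a made-up two-state merging problem. The states, actions, probabilities, and rewards are all hypothetical and chosen only to keep the arithmetic easy to follow. A POMDP would additionally maintain a belief over hidden states, which this sketch does not attempt.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Solve a small MDP given transitions[state][action] = [(prob, next_state, reward)]."""
    values = {s: 0.0 for s in transitions}

    def q(state, action):
        # Expected return of taking `action` in `state`, then following `values`
        return sum(p * (r + gamma * values[s2])
                   for p, s2, r in transitions[state][action])

    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:             # terminal state keeps value 0
                continue
            new_value = max(q(state, a) for a in actions)
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            break
    policy = {s: max(acts, key=lambda a: q(s, a))
              for s, acts in transitions.items() if acts}
    return values, policy

# Hypothetical merge scenario: merging usually succeeds, waiting costs a little.
transitions = {
    "waiting": {
        "merge": [(0.8, "merged", 10.0), (0.2, "waiting", -5.0)],
        "wait": [(1.0, "waiting", -1.0)],
    },
    "merged": {},                       # terminal
}
values, policy = value_iteration(transitions)
```

With these numbers the fixed point satisfies V(waiting) = 7 + 0.18 V(waiting), so merging is preferred and the value of waiting settles near 7 / 0.82.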
I Got Thoroughly Crushed in a VLM Job Interview...
自动驾驶之心· 2025-07-12 12:00
Core Viewpoint
- The article discusses the advancements and applications of large models in autonomous driving, focusing on the integration of multi-modal large models in the industry and their potential for future development [2][4][17].
Group 1: Interview Insights
- The interview for a position at Li Auto involved extensive discussion of large models, including their foundational concepts and practical applications in autonomous driving [2][4].
- The interviewer emphasized private dataset construction and data collection methods, noting that data remains the core of the business model [4][6].
Group 2: Course Overview
- A course on multi-modal large models is introduced, covering topics from general multi-modal models to fine-tuning techniques and ending with end-to-end autonomous driving applications [5][9][11].
- The course includes chapters on the introduction to multi-modal large models, foundational modules, general models, fine-tuning techniques, and specific applications in autonomous driving [9][11][17].
Group 3: Technical Focus
- The article outlines the technical aspects of multi-modal large models, including architecture, training paradigms, and the significance of fine-tuning techniques such as Adapter and LoRA [11][15].
- It highlights the application of these models in autonomous driving, referencing algorithms such as DriveVLM, which is pivotal for Li Auto's end-to-end driving solutions [17][19].
Group 4: Career Development
- The course also addresses career opportunities in the field, discussing potential employers, job directions, and the skills required to succeed in the industry [19][26].
- It emphasizes the importance of a solid foundation in deep learning and model deployment, along with practical coding skills [27].
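Of the fine-tuning techniques named above, LoRA has a compact formulation: the pretrained weight W is frozen and a low-rank update is learned, giving an effective weight W + (alpha / r) · B·A. The dependency-free sketch below shows only that arithmetic. The matrix sizes, rank, and alpha are illustrative assumptions, and a real setup would use a deep learning framework and train A and B by gradient descent.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_weight(w, a, b, alpha):
    """Effective weight W + (alpha / r) * (B @ A); W itself stays frozen."""
    rank = len(a)                 # A is rank x d_in, B is d_out x rank
    delta = matmul(b, a)          # low-rank update with shape d_out x d_in
    scale = alpha / rank
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

random.seed(0)
d_out, d_in, rank, alpha = 8, 8, 2, 4.0   # illustrative sizes only
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero, so the update starts at zero

w_at_init = lora_weight(w, a, b, alpha)   # equals w before any training
full_params = d_out * d_in                # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)       # parameters updated by LoRA
```

Because B is initialized to zero, the adapted model starts out identical to the pretrained one, and only the two small matrices need to be stored per task.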
VLM-Based Fast-Slow Dual-System Autonomous Driving: A Walkthrough of DriveVLM
自动驾驶之心· 2025-06-27 09:15
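Over the past year, large models have advanced rapidly, and work applying them to downstream tasks keeps appearing. Today we share an attempt by Tsinghua and Li Auto to apply large models to autonomous driving, which is also the core algorithm behind Li Auto's fast-slow dual system (E2E+VLM) from last year. It leverages the strong few-shot capability of large models, aiming to solve long-tail problems in real driving scenarios and to improve the cognition and reasoning abilities of autonomous driving systems.
DriveVLM's starting point is the practical difficulty the industry currently faces. As intelligent driving iterates from L2 toward L4, all kinds of long-tail problems arise in real scenarios. Data-driven methods gradually resolve some of them, and this is the industry's mainstream approach, with the hope of approaching L4 step by step through data. As research has deepened, however, it has become clear that long-tail problems in real scenes are endless, and case-by-case data-driven fixes can hardly evolve into true L4 driverless operation. Industry and academia therefore need to think further about the next-generation solution for autonomous driving.
On this basis, DriveVLM introduces several main innovations. Dataset construction is arguably the core of this work; it focuses on five dimensions that matter in autonomous driving scenarios, introduced one by one below: Ch ...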
Better Experience, Lower Prices: End-to-End Deployment Accelerates
HTSC· 2025-03-02 07:30
Investment Rating
- The report maintains a "Buy" rating for several companies in the automotive sector, including XPeng Motors, Li Auto, BYD, SAIC Motor, Great Wall Motors, and Leapmotor [10].
Core Viewpoints
- The report emphasizes that by 2025, advanced intelligent driving (high-level AD) will see improved user experience and reduced prices, moving from a trial phase to widespread adoption among consumers [14][20].
- The penetration rates for L2.5 and L2.9 intelligent driving are projected to reach 3.5% and 10.1% respectively by November 2024, with expectations of further growth to 16% for highway NOA and 14% for urban NOA by 2025 [14][24].
- The report highlights the shift toward end-to-end architecture in intelligent driving systems, which allows for higher performance limits and seamless data transmission, enhancing the overall driving experience [30][31].
Summary by Sections
Investment Recommendations
- The report suggests focusing on companies with strong engineering capabilities and advantages in data, computing power, and funding, such as XPeng Motors, Li Auto, and BYD, as well as third-party suppliers like Desay SV and Kobot [5][10].
Market Trends
- The report notes that the intelligent driving market is evolving, with a focus on enhancing user experience through features such as "human-like" driving and the implementation of end-to-end architectures [14][20].
- The price of high-level intelligent driving systems is expected to decrease significantly, with current models priced below 100,000 and 150,000 yuan for highway and urban NOA respectively [24][28].
Technological Developments
- The report discusses advances in end-to-end architecture, which is gaining traction among automotive manufacturers and improves data processing and decision-making capabilities [30][31].
- It also mentions the importance of AI-driven models and the need for automotive companies to adapt their organizational structures to support these technological shifts [15][41].
Competitive Landscape
- The report outlines the competitive dynamics among leading automotive companies, highlighting their respective advancements in intelligent driving technologies and the rapid iteration of their systems [41][45].
- Companies like Tesla, Li Auto, and XPeng Motors are noted for their significant investments in R&D and their ability to push updates and improvements quickly [42][46].