Autonomous Driving VLA
End-to-end driving based on imitation learning is capped: it can never surpass humans
自动驾驶之心· 2025-09-24 06:35
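End-to-end driving based on imitation learning is, at its core, just imitating humans; its understanding of the physical world is shallow. VLA therefore opens up a possibility: going from imitating humans to becoming human. The end-to-end approach the industry has embraced over the past two years marks a fundamental shift in intelligent driving from rule-driven to data-driven. In actual mass production, however, although end-to-end makes it possible to connect the upstream and downstream stages, it remains limited when facing complex, difficult scenarios. Anyone who has worked at an autonomous driving company knows that iteration on production models is still trapped in an endless loop of corner cases. Here we borrow a passage from Li Xiang's AI Talk:

"What is end-to-end like? End-to-end is rather like mammalian intelligence, for example some circus animals that learn from humans how to ride a bicycle. It has learned these human behaviors, how humans carry out the various behaviors of driving. But it does not understand the physical world. It just sees some three-dimensional image, knows its own speed, and outputs some trajectory. So it has no problem handling most generalization, but when it faces particularly complex situations it has never learned, it runs into problems. So at this point we also bring in a vision-language model, a VLM. But the open-source vision-language models available to us have very limited capability in traffic, so they can only play a very limited auxiliary role. I think the second stage is the way mammalian intelligence operates."

VLA can essentially also be considered a kind of ...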
What stage has autonomous driving VLA reached? Is it still a good field for research?
自动驾驶之心· 2025-09-22 08:04
Core Insights
- The article discusses the transition in intelligent driving technology from rule-driven to data-driven approaches, highlighting the emergence of VLA (Vision-Language-Action) as a more straightforward and effective method than traditional end-to-end systems [1][2]
- The challenges in the current VLA technology stack are emphasized, including the complexity and fragmentation of knowledge, which make it difficult for newcomers to enter the field [2][3]
- A new practical course on VLA has been developed to address these challenges, providing a structured learning path for students interested in advanced knowledge of autonomous driving [3][4][5]
Summary by Sections
Introduction to VLA
- The article introduces VLA as a significant advancement in autonomous driving, offering a cleaner approach than traditional end-to-end systems while also handling corner cases more effectively [1]
Challenges in Learning VLA
- The article outlines the difficulties learners face in navigating the complex and fragmented knowledge landscape of VLA, which includes a plethora of algorithms and a lack of high-quality documentation [2]
Course Development
- A new course titled "Autonomous Driving VLA Practical Course" has been created to provide a comprehensive overview of the VLA technology stack, aiming to make it easier for students to enter the field [3][4]
Course Features
- The course is designed to address key pain points, offering quick entry into the subject matter through accessible language and examples [3]
- It aims to build a framework for understanding VLA research and to strengthen research skills by teaching students how to categorize papers and extract their innovative points [4]
- The course includes practical components to ensure that theoretical knowledge is applied effectively in real-world scenarios [5]
Course Outline
- The course covers various topics, including the origins of VLA, foundational algorithms, and the differences between modular and integrated VLA systems [6][15][19][20]
- It also includes practical coding exercises and projects to reinforce learning and the application of concepts [22][24][26]
Instructor Background
- The course is led by experienced instructors with strong backgrounds in multi-modal perception, autonomous driving, and large model frameworks, ensuring high-quality instruction [27]
Learning Outcomes
- Upon completion, students are expected to have a thorough understanding of current advances in VLA and its core algorithms, and the ability to apply this knowledge in practical settings [28][29]
VLA papers have become the mainstream of frontier autonomous driving research...
自动驾驶之心· 2025-09-19 16:03
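Judging by this year's top CV and AI conferences, VLA and its related offshoots have become the main focus of autonomous driving companies and university labs, accounting for nearly half of the output in frontier autonomous driving research, especially reasoning-enhanced VLA, reinforcement learning, and related benchmarks. Imagine being able to give the car an instruction in natural language ("find the nearest Starbucks") and have it drive and park smoothly. How wonderful would that be!

VLA breaks the single-task limitation of traditional methods, enabling autonomous vehicles to make decisions on their own across diverse scenarios and to adapt flexibly to unseen environments. VLA is more direct and cleaner, and many methods also drop the complex 3D perception tasks of traditional end-to-end systems. By borrowing the stronger general-purpose generalization of VLMs, VLA not only simplifies the task but, more importantly, offers a possible way to solve corner cases.

As academia and industry have turned their attention to end-to-end technology, we have found many problems. The autonomous driving VLA technology stack has still not converged! A whole series of algorithms has sprung up like mushrooms after rain:

Too many technology stacks? Hard to get started?

Some time ago we launched the "End-to-End and VLA Autonomous Driving Small-Group Course", which focuses on organizing the end-to-end autonomous driving technology stack, and student feedback was very good. Many students then contacted 自动驾驶之心 wanting to learn more about the frontier of VLA! So 自动驾驶之心 has partnered with a teaching and research team from Tsinghua University to create the "Autonomous Driving VLA Practical Course", targeting autonomous driving VLA ...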
New vision-only SOTA! AdaThinkDrive: a more flexible chain of thought for autonomous driving VLA (Tsinghua & Xiaomi)
自动驾驶之心· 2025-09-18 23:33
Core Viewpoint
- The article discusses the limitations of existing Chain-of-Thought (CoT) reasoning methods in Vision-Language-Action (VLA) models for autonomous driving, particularly in simple scenarios, where they do not improve decision quality and add unnecessary computational overhead. It introduces AdaThinkDrive, a new VLA framework with a dual-mode reasoning mechanism inspired by the "fast and slow thinking" theory, which lets the model adaptively decide when to reason based on scene complexity [3][4][10].
Group 1: Introduction and Background
- The article highlights the shift from traditional modular approaches to end-to-end architectures in autonomous driving systems. Modular methods offer flexibility but suffer from information loss between components, which leads to cumulative errors in complex scenarios. End-to-end methods mitigate this issue but are still limited by their reliance on supervised data [7].
- The article groups current VLA methods into two paradigms: meta-action methods, which focus on high-level guidance, and planning-based methods, which predict trajectories directly from raw inputs. CoT techniques are increasingly applied, particularly in complex scenarios, but their effectiveness in simple scenarios is questioned [14][15].
Group 2: AdaThinkDrive Framework
- AdaThinkDrive is proposed as an end-to-end VLA framework with a "fast answer/slow thinking" mechanism, allowing the model to switch adaptively between direct prediction and explicit reasoning based on scene complexity. This is achieved through a three-stage adaptive reasoning strategy [11][18].
- The framework is validated through extensive experiments on the NAVSIM benchmark, where it achieves a Predictive Driver Model Score (PDMS) of 90.3, 1.7 points higher than the best vision-only baseline. The model also shows strong adaptive reasoning: it enables CoT in 96% of complex scenarios and defaults to direct trajectory prediction in 84% of simple scenarios [4][18][50].
Group 3: Experimental Results and Analysis
- In a comprehensive comparison with existing models, AdaThinkDrive outperforms both the "always think" and "never think" baselines, with PDMS improvements of 2.0 and 1.4 points, respectively. Its reasoning time is also 14% lower than that of the "always think" baseline, indicating a balance between accuracy and efficiency [4][18][58].
- The results indicate that the optimal reasoning strategy is not universal but depends on scene complexity, so models need to enable reasoning adaptively based on context [10][18].
Group 4: Conclusion
- The article concludes that reasoning in simple scenarios often increases computational cost without improving decision quality. AdaThinkDrive addresses this by letting agents learn when to think, guided by an adaptive thinking reward. Its results on the NAVSIM benchmark show state-of-the-art performance, underscoring the importance of adaptive thinking for accurate and efficient decision-making in autonomous driving systems [66].
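To make the "learn when to think" idea concrete, here is a minimal Python sketch under loudly stated assumptions. It is not AdaThinkDrive's actual reward or training code, which the summary does not spell out. We assume each scene has been rolled out in both modes and scored with PDMS, and we reward the policy when its chosen mode is the one that scored higher, with ties going to the cheaper direct-prediction mode. `SceneRollout`, `adaptive_think_reward`, `think_rate`, and `margin` are hypothetical names, and the scores are made up. `think_rate` shows how per-category statistics like the reported 96% (complex) and 84% (simple) figures could be tallied; note that 84% direct prediction in simple scenes corresponds to a think rate of 16%.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneRollout:
    """One scene scored in both reasoning modes (hypothetical record)."""
    is_complex: bool     # scene-complexity label, assumed to be given
    pdms_think: float    # PDMS when the model writes a CoT before the trajectory
    pdms_direct: float   # PDMS when the model outputs the trajectory directly
    chose_think: bool    # the mode the policy actually selected


def adaptive_think_reward(r: SceneRollout, margin: float = 0.0) -> float:
    """Assumed reward: +1 if the policy picked the better-scoring mode, else -1.

    Thinking counts as better only if it beats direct prediction by more than
    `margin`, so ties go to the cheaper direct mode.
    """
    think_is_better = (r.pdms_think - r.pdms_direct) > margin
    return 1.0 if r.chose_think == think_is_better else -1.0


def think_rate(rollouts: List[SceneRollout], complex_scenes: bool) -> float:
    """Fraction of scenes in one category where the policy chose to think."""
    group = [r for r in rollouts if r.is_complex == complex_scenes]
    if not group:
        return 0.0
    return sum(r.chose_think for r in group) / len(group)


# Toy example with made-up scores on the same 0-100 scale as the reported PDMS.
rollouts = [
    SceneRollout(is_complex=True, pdms_think=92.0, pdms_direct=80.0, chose_think=True),
    SceneRollout(is_complex=False, pdms_think=95.0, pdms_direct=95.0, chose_think=False),
    SceneRollout(is_complex=False, pdms_think=90.0, pdms_direct=93.0, chose_think=True),
]
print([adaptive_think_reward(r) for r in rollouts])  # [1.0, 1.0, -1.0]
print(think_rate(rollouts, complex_scenes=True))     # 1.0
print(think_rate(rollouts, complex_scenes=False))    # 0.5
```

The third toy scene is penalized because the policy spent compute on reasoning in a simple scene where reasoning lowered the score, which is exactly the behavior the article says an adaptive policy should avoid.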
China's first hands-on autonomous driving VLA course is here (modular / integrated / reasoning-enhanced VLA)
自动驾驶之心· 2025-09-16 10:49
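VLA is without doubt this year's mainstream keyword in both autonomous driving academia and industry.

Last year's end-to-end + VLM marked the fundamental shift of intelligent driving from rule-driven to data-driven. In practice, we found that although end-to-end makes it possible to connect the upstream and downstream stages, it remains limited when facing complex, difficult scenarios. Anyone who has worked at an autonomous driving company knows that iteration on production models is still trapped in an endless loop of corner cases.

VLA can essentially also be regarded as a form of end-to-end, but it is more direct and cleaner, and many methods also drop the complex 3D perception tasks of traditional end-to-end systems. By borrowing the stronger general-purpose generalization of VLMs, VLA not only simplifies the task but, more importantly, offers a possible way to solve corner cases.

As academia and industry have turned their attention to end-to-end technology, we have found many problems. The autonomous driving VLA technology stack has still not converged! A whole series of algorithms has sprung up like mushrooms after rain:

So we partnered with teaching and research teams in China and abroad to create the "Autonomous Driving VLA Practical Course", which comprehensively organizes the autonomous driving VLA technology stack. Learning autonomous driving VLA is a good opportunity to strengthen knowledge across multiple fields in one place: visual perception, language modules, action modules, and the accompanying frontier large-model techniques (RAG/CoT/reinforcement learning/MoE), among others. The technology stack involved is very broad, and such a learning path is often very painful. Mastering several fields at once is hard enough, and each ...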
The company announced team cuts, and the people who understood end-to-end got to stay...
自动驾驶之心· 2025-08-19 23:32
Core Viewpoint
- The article discusses the rapid evolution and challenges of end-to-end autonomous driving technology, emphasizing the need for a comprehensive understanding of various algorithms and models to succeed in this competitive industry [2][4][6].
Group 1: Industry Trends
- The shift from modular approaches to end-to-end systems in autonomous driving aims to eliminate cumulative errors between modules, marking a significant technological leap [2].
- The emergence of algorithms and models such as UniAD and BEV perception indicates a growing focus on integrating multiple tasks into a unified framework [4][9].
- The demand for knowledge of multi-modal large models, reinforcement learning, and diffusion models is increasing, reflecting the industry's need for versatile skill sets [5][20].
Group 2: Learning Challenges
- New entrants face difficulties due to the fragmented nature of knowledge and the overwhelming volume of research papers in the field, which often leads them to give up early [5][6].
- The lack of high-quality documentation and practical guidance further complicates the transition from theory to practice in end-to-end autonomous driving research [5][6].
Group 3: Course Offerings
- A new course titled "End-to-End and VLA Autonomous Driving" has been developed to address these learning challenges, focusing on practical applications and theoretical foundations [6][24].
- The course is structured to provide a comprehensive understanding of end-to-end algorithms, including their historical development and current trends [11][12].
- Practical components, such as real-world projects and assignments, are included so that participants can apply their knowledge effectively [8][21].
Group 4: Course Content Overview
- The course covers various topics, including an introduction to end-to-end algorithms, background knowledge on relevant technologies, and detailed explorations of both one-stage and two-stage end-to-end methods [11][12][13].
- Specific chapters focus on advanced topics such as world models and diffusion models, which are crucial for understanding the latest advances in autonomous driving [15][17][20].
- The final project involves practical applications of reinforcement learning from human feedback (RLHF), giving participants hands-on experience [21].
These directions make the switch from autonomous driving to large models relatively smooth...
自动驾驶之心· 2025-08-06 11:25
Core Insights
- The article discusses the booming field of large models in AI, focusing on directions such as RAG (Retrieval-Augmented Generation), AI Agents, and multi-modal models [1][2].
Group 1: Large Model RAG
- Large model RAG is highlighted as a significant area, with emphasis on understanding its components (retrievers, augmenters, and generators) and how knowledge bases can enhance performance [1].
- The article mentions the rapid development of subfields within RAG, including Graph RAG, applications in visual understanding, and various knowledge-oriented methods [1].
Group 2: AI Agents
- AI Agents are identified as a hot direction in large models, covering topics such as single-agent and multi-agent systems, reinforcement learning, and efficient communication among agents [1].
- The integration of RAG with agents is also noted as a promising area for exploration [1].
Group 3: Multi-modal Models
- The article points out the many directions available in multi-modal models, including visual language models, pre-training datasets, and fine-tuning processes [2].
- Deployment, inference, and optimization of these models are also discussed as critical parts of the development process [2].
Group 4: Community and Learning
- The article encourages engagement with the "Big Model Heart Tech" community for further learning and collaboration in the field of large models [3].
- The community aims to build a significant platform for talent and academic information related to large models [3].
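The RAG component split mentioned above (retriever, augmenter, generator) can be sketched in a few lines of Python. This is a toy illustration, not any specific library's API: the retriever is a bag-of-words cosine ranker standing in for a real embedding index, and the generator is any `str -> str` function, here a stub that echoes the prompt in place of an actual LLM call. All function names and the sample knowledge base are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, knowledge_base: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Retriever: return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def augment(query: str, passages: List[str]) -> str:
    """Augmenter: splice the retrieved passages into the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def rag_answer(query: str, knowledge_base: List[str],
               generate: Callable[[str], str], k: int = 2) -> str:
    """Generator step: hand the augmented prompt to any text-generation function."""
    passages = [doc for _, doc in retrieve(query, knowledge_base, k)]
    return generate(augment(query, passages))


def echo_generator(prompt: str) -> str:
    """Stand-in for a real LLM call: returns the prompt unchanged."""
    return prompt


kb = [
    "NAVSIM reports the PDM score for planning.",
    "BEV perception projects camera features into a bird's-eye view.",
    "Chain-of-thought prompting asks the model to reason step by step.",
]
print(rag_answer("what is bev perception", kb, echo_generator, k=1))
```

Keeping the generator as a plain function argument makes the point of the decomposition visible: the knowledge base and retriever can be swapped or improved (for example, toward Graph RAG) without touching the generation step.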
4,000 members now: what exactly has this tech-obsessed "Whampoa Military Academy" of autonomous driving been doing?
自动驾驶之心· 2025-07-31 06:19
Core Viewpoint
- The article emphasizes the importance of creating an engaging learning environment in autonomous driving and AI, aiming to bridge the gap between industry and academia while providing valuable resources for students and professionals [1].
Group 1: Community and Resources
- The community has built a closed loop spanning industry, academia, job seeking, and Q&A exchanges, guided by the question of what kind of community is actually needed [1][2].
- The platform offers cutting-edge academic content, industry roundtables, open-source code solutions, and timely job information, streamlining the search for resources [2][3].
- A comprehensive technical roadmap with over 40 technical routes has been organized, covering interests ranging from consulting applications to the latest VLA benchmarks [2][14].
Group 2: Educational Content
- The community provides a series of original live courses and video tutorials on topics such as automatic labeling, data processing, and simulation engineering [4][10].
- Learning paths are available for beginners, along with advanced resources for those already doing research, ensuring a supportive environment at all levels [8][10].
- The community has compiled a wealth of open-source projects and datasets related to autonomous driving, giving quick access to essential materials [25][27].
Group 3: Job Opportunities and Networking
- The platform has established a job referral mechanism with multiple autonomous driving companies, allowing members to submit their resumes directly to the employers they want [4][11].
- Job postings and position updates are shared continuously, contributing to a complete ecosystem for autonomous driving professionals [11][14].
- Members can freely ask questions about career choices and research directions and receive guidance from industry experts [75].
Group 4: Technical Focus Areas
- The community covers a wide range of technical areas, including perception, simulation, planning, and control, with detailed learning routes for each [15][29].
- Specific topics such as 3D object detection, BEV perception, and online high-precision mapping are thoroughly organized, reflecting current industry trends and research hotspots [42][48].
- The platform also covers emerging technologies such as vision-language models (VLMs) and diffusion models, providing insights into their applications in autonomous driving [35][40].