自动驾驶之心
FSD is heading toward driverless operation! Musk: FSD has developed self-awareness...
自动驾驶之心· 2025-10-09 07:30
Click the card below to follow the "自动驾驶之心" official account. Tap here -> to get learning roadmaps for nearly 30 autonomous driving directions.

On the 7th, while 柱哥 was still stuck in traffic on the highway, Tesla released FSD V14.1 and began pushing it to US users. For now only vehicles with HW4.0 hardware can update; vehicles on hardware below 4.0 are excluded from this rollout. Taken as a whole, FSD is heading toward driverless operation!

A model with hundreds of billions of parameters has landed in production, for the first time putting Robotaxi and mass-production vehicles on a genuinely shared codebase and architecture: a first step toward large-model general-purpose autonomous driving. The update itself is fairly rich: more driving-style options, new parking options, and corner-case optimizations such as special vehicles (police cars, ambulances, school buses) and road debris. 柱哥 watched some real-car test drives, and many viewers remarked that FSD now seems to know what is happening in front of it. Musk put it more bluntly: Feel Alive.

Tesla FSD v14 full release notes:
- Added Arrival Options, letting you choose where the vehicle should stop: parking lot, street, private driveway, parking garage, or curbside.
- Added handling for pulling over or yielding to emergency vehicles (police cars, fire trucks, ambulances).
- Added navigation and routing to the vision-based neural network, handling road closures and detours in real time.
- Added additional speed profiles to ...
We are looking for partners in the autonomous driving field......
自动驾驶之心· 2025-10-09 04:00
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2]
- The main areas of expertise sought include 4D annotation, world models, large models/multi-modal large models, diffusion models, VLA, end-to-end systems, embodied interaction, joint prediction, SLAM, 3D object detection, closed-loop simulation 3DGS, and large-model deployment and quantized perception reasoning [3]
- Candidates are preferred from QS200 universities with a master's degree or higher, especially those with significant contributions to top conferences [4]

Group 2
- The compensation package includes shared resources in autonomous driving (job placement, PhD recommendations, study-abroad opportunities), substantial cash incentives, and collaboration opportunities on entrepreneurial projects [5]
- Interested parties are encouraged to make contact via WeChat for consultation, specifying "organization/company + autonomous driving cooperation inquiry" [6]
How are academia and industry studying end-to-end and VLA? Master end-to-end autonomous driving in three months!
自动驾驶之心· 2025-10-09 04:00
Core Viewpoint
- The article discusses the evolution and current state of end-to-end algorithms in autonomous driving, highlighting the emergence of various subfields, particularly those based on Vision-Language-Action (VLA) models, and the increasing interest in these technologies in both academia and industry [1][3]

Summary by Sections

End-to-End Algorithms
- End-to-end algorithms are central to the current mass production of autonomous driving technology and involve a rich technology stack. There are two main paradigms: single-stage and two-stage. The single-stage approach, exemplified by UniAD, models vehicle trajectories directly from sensor inputs, while the two-stage approach outputs trajectories based on perception results [1]

VLA and Related Technologies
- Development has progressed from modular production algorithms to end-to-end systems and now to VLA. Key technologies involved include BEV perception, Vision-Language Models (VLM), diffusion models, reinforcement learning, and world models. The article emphasizes that understanding these technologies is essential for grasping the cutting-edge directions in both academia and industry [3]

Courses Offered
- The article promotes two courses aimed at helping individuals quickly and efficiently learn end-to-end and VLA approaches in autonomous driving. The courses are designed for those new to large models and VLA, covering foundational theory and practical applications [3][10]

Course Content
- The "VLA and Large Model Practical Course" focuses on VLA, starting from VLM as an interpreter for autonomous driving, and covers modular and integrated VLA as well as mainstream reasoning-enhanced VLA. It includes detailed theoretical foundations and practical assignments to build VLA models and datasets from scratch [3][10]

Instructor Team
- The courses are led by experienced instructors from academia and industry, with backgrounds in multi-modal perception, autonomous driving VLA, and large-model frameworks. They have published numerous papers at top conferences and have substantial practical experience in the field [7][9][10]

Target Audience
- The courses are aimed at individuals with a foundational understanding of autonomous driving who are familiar with its basic modules and have knowledge of transformer models, reinforcement learning, and BEV perception. A background in probability theory, linear algebra, and programming in Python and PyTorch is also recommended [13]
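The single-stage vs. two-stage distinction drawn above can be made concrete with a toy sketch. Everything here is an illustrative placeholder (the function names and "models" are invented, not UniAD's actual API): the point is only the difference in interfaces, i.e. whether a trajectory is produced directly from sensor data or from intermediate perception outputs.

```python
import numpy as np

# Toy contrast of the two end-to-end paradigms; all names are hypothetical.

def perceive(sensor_frames: np.ndarray) -> dict:
    """Stand-in perception module: returns placeholder 'agents' and map features."""
    return {"agents": sensor_frames.mean(axis=0), "drivable": sensor_frames.max(axis=0)}

def plan_from_perception(perception: dict, horizon: int = 6) -> np.ndarray:
    """Two-stage paradigm: the planner consumes intermediate perception outputs."""
    anchor = perception["agents"].mean()  # stands in for learned scene features
    return np.stack([[t * 1.0, anchor] for t in range(horizon)])

def plan_end_to_end(sensor_frames: np.ndarray, horizon: int = 6) -> np.ndarray:
    """Single-stage paradigm (UniAD-style): one model maps raw sensors to waypoints."""
    feat = sensor_frames.reshape(-1).mean()  # stands in for a learned encoder
    return np.stack([[t * 1.0, feat] for t in range(horizon)])

frames = np.random.rand(4, 8, 8)  # 4 dummy camera frames
traj_2stage = plan_from_perception(perceive(frames))
traj_1stage = plan_end_to_end(frames)
print(traj_2stage.shape, traj_1stage.shape)  # both (6, 2): 6 waypoints of (x, y)
```

Either way the output interface is the same, a sequence of future waypoints; the paradigms differ in whether an explicit perception representation sits in the middle.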
Autonomous driving Ask Me Anything: Q&A roundup! The VLA vs. WA route debate?
自动驾驶之心· 2025-10-08 23:33
Core Insights
- The article discusses the current state and future prospects of autonomous driving technology, emphasizing the importance of AI and various modeling approaches in achieving higher levels of automation [4][6][9]

Group 1: Industry Development
- The autonomous driving industry is rapidly evolving, with significant advancements expected in the next few years, particularly in AI and related fields [4]
- Companies like Waymo and Tesla are leading the way in achieving Level 4 (L4) automation, while Level 5 (L5) may take at least five more years to realize [4][6]
- The integration of Vision-Language-Action (VLA) models is seen as key to enhancing decision-making capabilities in autonomous vehicles, addressing long-tail problems that pure end-to-end models may struggle with [6][9]

Group 2: Technical Approaches
- The article outlines different modeling approaches in autonomous driving, including end-to-end models and the emerging VLA paradigm, which combines language processing with visual data to improve reasoning and decision-making [5][9]
- The effectiveness of current autonomous driving systems is still limited, with many challenges remaining in achieving full compliance with traffic regulations and safety standards [10][14]
- The discussion highlights the importance of data and cloud computing capabilities in narrowing the performance gap between domestic companies and leaders like Tesla [14][15]

Group 3: Talent and Education
- There is a recognized talent gap in the autonomous driving sector, with a strong recommendation for students to pursue AI and computer science to prepare for future opportunities in the industry [4][6]
- The article suggests that practical experience at larger autonomous driving companies may provide better training and growth opportunities than smaller robotics firms [16][20]
YOLO26 is not the 26th generation but a "game-changer"! A disruptive end-to-end architecture reshapes real-time detection
自动驾驶之心· 2025-10-08 23:33
The following article comes from 集智书童, author 小书童.

Author | 小书童
Source | 集智书童

The YOLO26 model is still in development and has not yet been released. The performance figures shown here are a preview. Final downloads and the official release will follow soon; stay up to date via YOLO Vision 2025.

Overview

Ultralytics' YOLO26 is the latest evolution of the YOLO family of real-time object detectors, designed from the ground up for edge and low-power devices. It introduces a streamlined design that removes unnecessary complexity while integrating targeted innovations to deliver faster, lighter, and easier-to-deploy models.

YOLO26's architecture is driven by three core principles:
- Simplicity: YOLO26 is a natively end-to-end model that outputs predictions directly, without non-maximum suppression (NMS). Eliminating this post-processing step makes inference faster, lighter, and easier to deploy in real systems. This end-to-end approach was first pioneered by Ao Wang of Tsinghua University in YOLOv10 and is taken further in YOLO26.

Together, these innovations deliver a ...
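To see exactly what post-processing an NMS-free end-to-end detector eliminates, here is a minimal sketch of the classic greedy NMS algorithm that conventional YOLO heads run after inference (this is the textbook algorithm, not Ultralytics' implementation): it repeatedly keeps the highest-scoring box and discards any remaining box overlapping it beyond an IoU threshold.

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy NMS: keep the best-scoring boxes, drop heavily overlapping ones."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]  # prune overlaps
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # the two overlapping boxes collapse to one: [0, 2]
```

An end-to-end head like YOLOv10's (and, per the preview above, YOLO26's) is trained so its raw outputs are already one-to-one with objects, making this loop, and its latency and deployment complexity, unnecessary.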
自动驾驶之心's double-holiday promotion is about to end (discounts on courses, Knowledge Planet, and hardware)
自动驾驶之心· 2025-10-08 23:33
Core Insights
- The article emphasizes the importance of continuous learning and engagement in the field of autonomous driving technology, highlighting the educational resources and community interactions available to professionals and enthusiasts in the industry.

Group 1: Educational Offerings
- The platform offers a significant discount on courses, with an 80% off coupon and a 70% discount card available to users [3]
- New users can benefit from a 30% discount on renewals and a 50% discount on specific offerings [4]
- A comprehensive overview of core autonomous driving content is provided, including 40+ learning paths covering advanced topics [5]

Group 2: Community Engagement
- The platform facilitates direct interactions with industry leaders and academic experts, allowing face-to-face discussions of cutting-edge topics in autonomous driving [6]
- Key discussions include the competition between VLA and WA, future directions of autonomous driving, and the intricacies of world models [6]
- The community also features high-level courses on technical subjects such as trajectory prediction, camera calibration, and 3D point cloud detection [6]
Is imitation learning incapable of being truly end-to-end?
自动驾驶之心· 2025-10-08 23:33
Core Viewpoint
- The article emphasizes that in the autonomous driving industry, training methods are more critical than model architectures like VLA or world models, highlighting the limitations of imitation learning in achieving truly end-to-end autonomous driving [2][14]

Limitations of Imitation Learning
- Imitation learning assumes that expert data is optimal, but in driving there is no single perfect behavior, given the diverse styles and strategies of human drivers [3][4]
- The training data lacks consistency and optimality, leading models to learn vague and imprecise driving patterns rather than clear, logical strategies [3][4]
- Imitation learning fails to distinguish critical decision-making scenarios from ordinary ones, so models may make fatal errors at crucial moments [5][6]

Key Scene Identification
- The article discusses the importance of identifying key scenes in driving, where the precision of the model's output is critical, especially in complex scenarios [7][8]
- It introduces the concept of "advantage" from reinforcement learning, which helps define key states where optimal actions significantly outperform the alternatives [7]

Out-of-Distribution (OOD) Issues
- Open-loop imitation learning can accumulate errors, driving the model into states that differ from the training data distribution and degrading performance [8][10][12]
- The article illustrates that models trained purely by imitation may struggle in critical situations, such as timely lane changes, because they rely on suboptimal behaviors learned from human data [13]

Conclusion
- The core of technological development lies in identifying key routes and bottlenecks rather than merely following trends, suggesting the need for new methods beyond imitation learning to address its limitations [14]
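The "advantage" notion invoked above for identifying key states is standard reinforcement-learning machinery: A(s, a) = Q(s, a) − V(s). A sketch of how it separates key scenes from ordinary ones follows; the Q-values and threshold here are made-up toy numbers, and the article does not specify a concrete criterion, so the "top-two gap" rule is purely illustrative.

```python
import numpy as np

def advantages(q_values: np.ndarray) -> np.ndarray:
    """A(s, a) = Q(s, a) - V(s), using the mean Q over actions as V(s)."""
    return q_values - q_values.mean(axis=-1, keepdims=True)

def is_key_state(q_values: np.ndarray, gap_thresh: float = 1.0) -> bool:
    """Flag states where the best action's advantage towers over the runner-up:
    exactly the states where imitating an averaged, suboptimal policy is costly."""
    top2 = np.sort(advantages(q_values))[::-1][:2]
    return bool(top2[0] - top2[1] > gap_thresh)

# Toy Q-values for three actions (e.g. lane keep / move left / move right).
cruising = np.array([1.0, 1.1, 0.9])     # ordinary scene: any action is fine
merge_now = np.array([5.0, -3.0, -4.0])  # key scene: must change lanes before exit
print(is_key_state(cruising), is_key_state(merge_now))  # False True
```

This matches the article's point: in the `cruising` state all actions score similarly, so blurry imitation is harmless, while in `merge_now` the advantage gap is large and imprecise imitation becomes a fatal error.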
Less is More! Max-V1: a compact yet powerful vision-language model for autonomous driving (Fudan & CAS)
自动驾驶之心· 2025-10-08 09:04
>> Get cutting-edge autonomous driving information → the 自动驾驶之心 Knowledge Planet

Paper authors | Sheng Yang et al.
Editor | 自动驾驶之心

The large-model community has recently begun rethinking the conventional view of scaling laws. A Shanghai Jiao Tong team made this point for agent tasks with "LIMI: Less is More for Agency": more data does not necessarily mean stronger AI capability. That line of thinking now extends to autonomous driving. Do autonomous driving VLA/VLM models really need massive data, or should redundancy be stripped away to distill the truly critical information?

The work 自动驾驶之心 is sharing today is Max-V1, a new single-stage end-to-end autonomous driving framework proposed by a team from Fudan University and the Chinese Academy of Sciences. Max-V1 reconceptualizes autonomous driving as a generalized language task and formalizes trajectory planning as "next waypoint prediction."

Background and main contributions

Human driving is essentially a sequential decision-making process in which every action depends on a real-time understanding of the surrounding scene. This dynamic interplay between perception and action closely mirrors natural language generation, which likewise produces highly correlated output sequences. From this ...
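One way to picture "trajectory planning as next waypoint prediction" is to discretize continuous (x, y) waypoints into vocabulary tokens, so an autoregressive model can emit a trajectory one token at a time, exactly as a language model emits words. The uniform binning scheme below is invented for illustration; the summary above does not describe Max-V1's actual tokenization, which may differ entirely.

```python
# Hypothetical waypoint tokenizer: quantize (x, y) in [-span, span]^2 into a
# single vocabulary index, and invert back to the bin center. Illustrative only.

def waypoint_to_token(x: float, y: float, bins: int = 100, span: float = 50.0) -> int:
    """Map a waypoint to one of bins*bins vocabulary indices."""
    ix = min(bins - 1, max(0, int((x + span) / (2 * span) * bins)))
    iy = min(bins - 1, max(0, int((y + span) / (2 * span) * bins)))
    return ix * bins + iy

def token_to_waypoint(token: int, bins: int = 100, span: float = 50.0):
    """Invert the quantization, returning the center of the token's bin."""
    ix, iy = divmod(token, bins)
    step = 2 * span / bins
    return (-span + (ix + 0.5) * step, -span + (iy + 0.5) * step)

tok = waypoint_to_token(3.2, -1.7)
print(tok, token_to_waypoint(tok))  # round-trips to within half a bin (0.5 m here)
```

With such a vocabulary, "predict the next waypoint" becomes literally "predict the next token", which is what lets a framework treat driving as a generalized language task.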
From a Tsinghua teaching and research team! Build your own autonomous driving VLA model from scratch in two months
自动驾驶之心· 2025-10-08 09:04
Core Insights
- The focus of academia and industry is shifting towards VLA (Vision-Language-Action) models for enhancing autonomous driving capabilities, providing human-like reasoning in vehicle decision-making processes [1][4]
- The development of autonomous driving VLA is crucial for companies, with a strong emphasis on self-driven research and innovation in this area [4]

Summary by Sections

Introduction to Autonomous Driving VLA
- VLA is categorized into modular VLA, integrated VLA, and reasoning-enhanced VLA, each contributing to more reliable and safer autonomous driving [1]

Course Overview
- A comprehensive learning roadmap for autonomous driving VLA has been designed, covering everything from principles to practical applications [4]

Core Content of Autonomous Driving VLA
- Key topics include visual perception, large language models, action modeling, model deployment, and dataset creation, along with advanced algorithms like CoT, MoE, RAG, and reinforcement learning [6]

Course Collaboration
- The course is developed in collaboration with Tsinghua University's research team, featuring detailed explanations of cutting-edge algorithms and practical assignments [6]

Course Structure
- The course consists of six chapters, each focusing on a different aspect of VLA, from algorithm introduction to practical applications and project work [11][19]

Chapter Highlights
- Chapter 1 provides an overview of VLA algorithms and their historical development, along with benchmarks and evaluation metrics [12]
- Chapter 2 delves into the foundational algorithms of VLA, including the Vision, Language, and Action modules, and discusses the deployment of large models [13]
- Chapter 3 focuses on VLM as an interpreter in autonomous driving, analyzing classic and recent algorithms [14]
- Chapter 4 explores modular and integrated VLA, emphasizing the evolution of language models in planning and control [15]
- Chapter 5 discusses reasoning-enhanced VLA, introducing new modules for decision-making and action output [16]
- Chapter 6 involves a major project in which participants build and fine-tune their own models [19]

Learning Outcomes
- The course aims to advance understanding of VLA in both academic and industrial contexts, equipping participants with the skills to apply VLA concepts in real-world projects [21]

Course Schedule
- The course is set to begin on October 20, with a structured timeline for each chapter's release [22]

Prerequisites
- Participants are expected to have foundational knowledge of autonomous driving, large models, and relevant programming skills [23]
NeurIPS'25! AutoPrune: a plug-and-play adaptive pruning framework for large models
自动驾驶之心· 2025-10-07 07:46
Core Insights
- The article introduces AutoPrune, a training-free, complexity-adaptive pruning framework designed to alleviate the computational burden of Visual Language Models (VLMs) by quantifying task complexity through the mutual information between visual and textual tokens [3][18]

Background Review
- Visual language models are central to multimodal systems, supporting tasks like image description and visual question answering (VQA). The coupling of perception and control in frameworks like VLA for autonomous driving leads to significant memory and latency bottlenecks, because high-resolution images are converted into large numbers of visual tokens [4][18]
- Previous methods typically employed fixed layer-wise pruning strategies, which lack global budget constraints and require manual tuning, making them less adaptable to varying task complexities [4][11]

Key Contributions
- AutoPrune models visual token pruning as a constrained optimization problem under a global computational budget, optimizing three strategies: layer-wise token allocation, token selection, and token recovery [9][10]
- The complexity metric is derived from cross-modal attention, directly calculating mutual information to characterize sample difficulty and task complexity [10][13]
- The framework is plug-and-play, requires no training, and demonstrates superior performance across various datasets and pruning ratios compared with existing training-free methods [10][11]

Experimental Results
- In experiments with LLaVA-1.5-7B, retaining 64 tokens maintained 96.7% of the original accuracy while reducing FLOPs to 23.2%, indicating minimal loss under moderate pruning [14]
- LLaVA-NeXT-7B outperformed comparative methods across different token-retention budgets, retaining 94.9% of performance at a budget of 160 tokens [15]
- The results show that AutoPrune effectively supports real-time multimodal inference and embodied intelligence, revealing nuanced differences in attention distribution that align with observations from cognitive neuroscience [18]
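The core mechanism, scoring visual tokens by how much text tokens attend to them and keeping only a budgeted top-k, can be sketched in a few lines. This is a deliberately simplified stand-in, not AutoPrune itself: the actual framework additionally optimizes layer-wise budget allocation and token recovery, and derives its complexity metric from mutual information rather than a plain attention mean.

```python
import numpy as np

def prune_visual_tokens(visual: np.ndarray, cross_attn: np.ndarray, budget: int):
    """Keep the `budget` visual tokens that text tokens attend to most.

    visual:     (N, D) visual token embeddings
    cross_attn: (T, N) attention weights from T text tokens to N visual tokens
    """
    importance = cross_attn.mean(axis=0)                    # average over text queries
    keep = np.sort(np.argsort(importance)[::-1][:budget])   # top-k, original order
    return visual[keep], keep

rng = np.random.default_rng(0)
visual = rng.normal(size=(576, 64))   # e.g. a 24x24 patch grid from the vision encoder
attn = rng.random(size=(32, 576))     # 32 text tokens attending over 576 visual tokens
pruned, kept = prune_visual_tokens(visual, attn, budget=64)
print(pruned.shape, kept.shape)  # (64, 64) (64,)
```

Dropping 576 tokens to 64 before the language-model layers is what produces the FLOPs reductions reported above; the hard part, which AutoPrune addresses, is deciding how large the budget should be per sample and per layer.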