自动驾驶之心

Autonomous Driving Paper Roundup | GS-Occ3D, BEV-LLM, Cooperative Perception, Reinforcement Learning, and More
自动驾驶之心· 2025-07-30 03:01
Group 1
- The article surveys recent advances in autonomous driving technologies, highlighting several innovative frameworks and models [3][9][21][33][45]
- GS-Occ3D achieves state-of-the-art (SOTA) geometric accuracy with a 0.56 Chamfer distance (CD) on the Waymo dataset, outperforming LiDAR-based methods [3][5]
- BEV-LLM introduces a lightweight multimodal scene-description model that outperforms existing models by 5% in BLEU-4 score, showcasing the fusion of LiDAR and multi-view images [9][10]
- CoopTrack presents an end-to-end cooperative perception framework that sets new SOTA performance on the V2X-Seq dataset with 39.0% mAP and 32.8% AMOTA [21][22]
- The Diffusion-FS model achieves a 0.7767 IoU in free-space prediction (see the metric sketch after this summary), a significant improvement in multimodal drivable-space prediction [45][48]

Group 2
- GS-Occ3D contributes a scalable visual occupancy-label generation pipeline that removes the reliance on LiDAR annotations, improving training efficiency for downstream models [5][6]
- BEV-LLM uses BEVFusion to combine 360-degree panoramic images with LiDAR point clouds, improving the accuracy of scene descriptions [10][12]
- CoopTrack's instance-level end-to-end framework integrates cooperative tracking and perception, enhancing learning across agents [22][26]
- The ContourDiff model introduces a novel self-supervised method for generating free-space samples, reducing dependence on densely annotated data [48][49]
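The free-space IoU cited for Diffusion-FS is, in the standard formulation, intersection-over-union between a predicted and a ground-truth binary mask. The paper's exact evaluation protocol is not given here, so this is a minimal sketch of the conventional definition:

```python
import numpy as np

def freespace_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary free-space masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 1.0

# Example: a full mask vs. a diagonal mask overlaps on 4 of 16 cells -> 0.25
assert freespace_iou(np.ones((4, 4)), np.eye(4)) == 0.25
```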
Three Hours of Li Auto's Launch Event, and the Boldest Takeaway: VLA Is Hitting the Road?!
自动驾驶之心· 2025-07-30 03:01
Core Viewpoint
- The article covers the launch of the Li Auto i8, highlighting significant upgrades to its assisted-driving features and the introduction of the VLA (Vision-Language-Action) model, a milestone in end-to-end autonomous driving [2][4].

Summary by Sections

VLA Model Capabilities
- The VLA model improves three main capabilities: semantic understanding (multimodal input), reasoning (chains of thought), and alignment with human driving intuition. It centers on four core abilities: spatial understanding, reasoning, communication and memory, and behavior [4][5]. A purely schematic sketch of such a pipeline follows this summary.

Industry Outlook
- Demand for VLA/VLM model algorithm experts is projected to be high, with salaries of 40K to 70K for candidates with 3-5 years of experience and a master's degree; top technical talent, especially PhD graduates, can expect 90K to 120K [13].

Learning Challenges
- Newcomers to end-to-end autonomous driving face a complex technology stack and fragmented knowledge, so a structured learning path is needed to navigate the vast literature and its practical applications [16][17].

Course Introduction
- A new course, "End-to-End and VLA Autonomous Driving," has been developed to address these challenges. It aims to provide a quick entry into the field, build a framework for research skills, and combine theory with practice [17][18][19].

Course Structure
- The course spans the history and development of end-to-end autonomous driving, background on the relevant technologies, and detailed treatments of one-stage and two-stage end-to-end paradigms, with practical assignments to reinforce learning [23][24][25][26].

Expected Outcomes
- On completion, participants are expected to reach the level of an end-to-end autonomous driving algorithm engineer with one year of experience, with a solid grasp of the key technologies and the ability to apply them to real projects [33].
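For readers new to the VLA idea, here is a purely schematic sketch of the pipeline described above (multimodal input, language-model reasoning, action output). Every module size and name is invented for illustration and bears no relation to Li Auto's actual model:

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Schematic Vision-Language-Action pipeline: encode an image, embed a
    text instruction, fuse them in a transformer, decode a driving action."""
    def __init__(self, d_model: int = 512, vocab: int = 32000, n_actions: int = 3):
        super().__init__()
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, d_model))
        self.text_embed = nn.Embedding(vocab, d_model)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
            num_layers=2)
        # Hypothetical action parameterization: e.g. steer, throttle, brake
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        vis = self.vision_encoder(image).unsqueeze(1)        # (B, 1, d)
        txt = self.text_embed(token_ids)                     # (B, T, d)
        fused = self.backbone(torch.cat([vis, txt], dim=1))  # joint reasoning
        return self.action_head(fused[:, 0])                 # read action off the vision token

# Smoke test with toy shapes
model = ToyVLA()
out = model(torch.randn(2, 3, 128, 128), torch.randint(0, 32000, (2, 16)))
print(out.shape)  # torch.Size([2, 3])
```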
ICCV'25 Highlight | Zhejiang University's MaGS: A Unified 3D Representation for Dynamic Reconstruction and Physics Simulation!
自动驾驶之心· 2025-07-29 23:32
Figure 0: project homepage

How to achieve high-quality reconstruction and physical simulation of dynamic 3D objects from only a monocular video has long been a highly challenging problem in computer vision and graphics. Recently, researchers from Zhejiang University and other institutions proposed MaGS (Mesh-adsorbed Gaussian Splatting), a new unified framework that offers a fresh approach to this problem. At its core, MaGS builds an innovative "mesh-adsorbed Gaussian" hybrid representation that combines the rendering flexibility of 3D Gaussian Splatting (3DGS) with the structured nature of triangle meshes. With this representation, MaGS achieves state-of-the-art performance on both dynamic scene reconstruction and dynamic scene simulation.

Figure 1: MaGS overview

The work has been accepted to ICCV 2025 as a Highlight Paper. Arxiv: 2406.01593. Project Page: https://wcwac.github.io/MaGS-page/

In computer graphics and computer vision, reconstructing the 3D world from video (Reconstruction) and performing physical interaction and animation on it ( ...
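Based only on the description above, the key idea is binding Gaussians to mesh triangles so that deforming the mesh drags the splats along with it. A minimal numpy sketch of that binding; all function and variable names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def adsorb_gaussians(vertices, faces, bary, offsets):
    """Place Gaussian centers on a triangle mesh.
    vertices: (V, 3) mesh vertex positions (updated as the mesh deforms)
    faces:    (F, 3) int vertex indices; here assume one Gaussian per face
    bary:     (F, 3) fixed barycentric coordinates, rows summing to 1
    offsets:  (F,)   learnable signed distances along each face normal
    Because bary/offsets are fixed relative to the triangle, moving the
    mesh automatically moves the adsorbed Gaussians."""
    tri = vertices[faces]                                  # (F, 3, 3) corners
    centers = np.einsum("nk,nkd->nd", bary, tri)           # barycentric mix
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    n /= np.linalg.norm(n, axis=1, keepdims=True) + 1e-9   # unit face normals
    return centers + offsets[:, None] * n
```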
Hands-On | Trajectory Planning with Deep Reinforcement Learning (with Code Walkthrough)
自动驾驶之心· 2025-07-29 23:32
Core Viewpoint
- The article discusses the advances and applications of reinforcement learning (RL) in autonomous driving, highlighting its potential to improve decision-making in dynamic environments.

Group 1: Background and Concepts
- The concept of VLA (Vision-Language-Action) and its relation to embodied intelligence is introduced, emphasizing its similarity to end-to-end autonomous driving [3]
- Reinforcement learning has gained traction across industries following milestones such as AlphaZero in 2018 and ChatGPT in late 2022, showcasing its broad applicability [3]
- The article explains reinforcement learning from a computer-vision perspective, drawing parallels with established concepts in that field [3]

Group 2: Learning Methods
- Supervised learning in autonomous driving covers tasks like object detection, where a model is trained on labeled data to map inputs to outputs [5]
- Imitation learning trains models to mimic human behavior, much as children learn from adults [6]
- Reinforcement learning differs from imitation learning in that it optimizes actions from feedback gathered by interacting with the environment, which suits sequential decision-making tasks [7]

Group 3: Advanced Learning Techniques
- Inverse reinforcement learning derives reward functions from expert data, which is useful when rewards are hard to specify by hand [8]
- The Markov Decision Process (MDP) is the framework for modeling sequential decision tasks, relating states, actions, and rewards [9]
- Dynamic programming and Monte Carlo methods are presented as techniques for solving reinforcement learning problems and optimizing decision-making [11][12]

Group 4: Reinforcement Learning Algorithms
- Algorithms are categorized as on-policy or off-policy, which differ in training stability and data reuse [25][26]
- Key algorithms such as Q-learning, SARSA, and policy-gradient methods are outlined (a minimal Q-learning sketch follows this list) [27][29]
- Advanced algorithms such as TRPO and PPO are presented, focusing on stable training and constrained policy updates [57][58]

Group 5: Applications in Autonomous Driving
- Reward design is central in autonomous driving, with safety, comfort, and efficiency as the key factors [62]
- Closed-loop training systems are needed because vehicle actions influence the environment, requiring dynamic models of other vehicles [62]
- Integrating end-to-end learning with reinforcement learning is highlighted as a way to adapt to changing environments in real time [63]
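As a concrete instance of the off-policy, value-based methods named in Group 4, here is a minimal tabular Q-learning sketch. The state/action discretization and reward weights are illustrative assumptions in the safety/comfort/efficiency spirit of Group 5, not values from the article:

```python
import numpy as np

N_STATES, N_ACTIONS = 100, 5          # toy discretization of a driving task
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1    # learning rate, discount, exploration
Q = np.zeros((N_STATES, N_ACTIONS))

def reward(collision: bool, jerk: float, progress: float) -> float:
    # Safety dominates; comfort penalizes jerk; efficiency rewards progress.
    return -100.0 * float(collision) - 0.5 * abs(jerk) + 1.0 * progress

def act(s: int) -> int:
    # Epsilon-greedy behavior policy.
    if np.random.rand() < EPS:
        return int(np.random.randint(N_ACTIONS))
    return int(Q[s].argmax())

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
```

The `max` over next-state actions is what makes this off-policy; swapping it for the value of the action actually taken next yields on-policy SARSA, the distinction drawn in Group 4.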
The Autonomous Driving Agent Is Here! DriveAgent-R1: An Agent with Intelligent Thinking and Active Perception (Shanghai Qi Zhi Institute & Li Auto)
自动驾驶之心· 2025-07-29 23:32
Core Viewpoint
- DriveAgent-R1 represents a significant advance in autonomous driving, addressing long-horizon, high-level decision-making through a hybrid thinking framework and an active perception mechanism [2][31].

Group 1: Innovations and Challenges
- DriveAgent-R1 introduces two core innovations: a novel three-stage progressive reinforcement learning strategy and MP-GRPO (Mode Grouped Reinforcement Policy Optimization), which strengthens the agent's capabilities in each of its two reasoning modes (a generic group-relative advantage sketch follows this summary) [3][12]
- The current potential of Vision Language Models (VLMs) in autonomous driving is limited by short-sighted decision-making and passive perception, particularly in complex environments [2][4]

Group 2: Hybrid Thinking and Active Perception
- The hybrid thinking framework lets the agent adaptively switch between efficient text-based reasoning and in-depth tool-assisted reasoning depending on scene complexity [5][12]
- The active perception mechanism equips the agent with a powerful visual toolbox to actively explore the environment, improving decision-making transparency and reliability [5][12]

Group 3: Training Strategy and Performance
- A complete three-stage progressive training strategy covers dual-mode supervised fine-tuning, forced comparative-mode reinforcement learning, and adaptive mode-selection reinforcement learning [24][29]
- DriveAgent-R1 achieves state-of-the-art (SOTA) performance on challenging datasets, surpassing leading multimodal models such as Claude Sonnet 4 and Gemini 2.5 Flash [12][26]

Group 4: Experimental Results
- DriveAgent-R1 significantly outperforms baseline models: with visual tools, first-frame accuracy improves by 14.2% and sequence-average accuracy by 15.9% [26][27]
- Visual tools also enhance the decision-making of state-of-the-art VLMs, demonstrating the value of actively acquired visual information for driving intelligence [27]

Group 5: Active Perception and Visual Dependency
- Active perception is crucial for deep visual reliance: DriveAgent-R1's performance drops sharply when visual inputs are removed, confirming its decisions are genuinely driven by visual data [30][31]
- The training strategy turns potential distraction from tools into a performance amplifier, showing the importance of structured training for tool use [27][29]
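The summary does not spell out MP-GRPO's internals, but GRPO-style methods generally compute advantages relative to a group of sampled rollouts rather than a learned value network. A minimal sketch of that generic step; per-mode grouping is an assumption on our part, not something the article states:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, group_size) -- one row per prompt/scene,
    one column per sampled response. Each reward is normalized against
    its own group's mean and std, so no critic network is required."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

# If MP-GRPO groups rollouts by reasoning mode (text-only vs. tool-assisted,
# an assumption here), the groups would simply be formed per mode before
# calling this function.
adv = group_relative_advantages(torch.tensor([[1.0, 0.0, 0.5, 0.5]]))
print(adv)  # positive for above-average rollouts, negative for below-average
```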
Course + Software + Hardware! Your First Robot Car: the Black Warrior 001 Full-Stack Autonomous Driving Platform
自动驾驶之心· 2025-07-29 23:32
Core Viewpoint
- The article announces the launch of the "Black Warrior Series 001," a lightweight autonomous driving platform aimed at research and education, now on pre-sale at a discounted price of 36,999 yuan, including three free courses [1].

Group 1: Product Overview
- The Black Warrior 001 is developed by the Autonomous Driving Heart team and supports perception, localization, fusion, navigation, and planning on an Ackermann chassis [2]
- The platform allows secondary development and modification, with multiple mounting positions and interfaces for adding sensors such as cameras and millimeter-wave radars [3]

Group 2: Target Audience and Applications
- The product suits undergraduate coursework, graduate research and thesis work, job-seeking portfolio projects, laboratory teaching in universities, and training companies or vocational schools [5]

Group 3: Performance and Testing
- The product has been tested indoors, outdoors, and in basement scenarios, demonstrating its perception, localization, fusion, navigation, and planning capabilities [3]

Group 4: Hardware Specifications
- Key sensors include a Livox Mid-360 3D LiDAR, a 2D LiDAR, an Orbbec depth camera, and an NVIDIA Orin NX 16G main control chip [20]
- The vehicle weighs 30 kg, draws 50 W from a 24 V battery, runs for over 4 hours, and has a maximum speed of 2 m/s [22]

Group 5: Software and Functionality
- The software framework includes ROS, C++, and Python, supporting one-click startup and shipping with a development environment [24]
- Functionality covers 2D and 3D SLAM, point-cloud processing, vehicle navigation, and obstacle avoidance (a minimal ROS listener sketch follows this summary) [25]

Group 6: After-Sales and Support
- The company offers one year of after-sales support (excluding human damage), with free repairs for damage caused by operational errors or code modifications during the warranty period [47]
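Since the stack is ROS-based with Python support, a first exercise on such a platform is typically a node that listens to the LiDAR topics. A minimal rospy sketch; the topic names are guesses (check `rostopic list` on the actual vehicle), not the product's documented interface:

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import LaserScan, PointCloud2

def on_cloud(msg: PointCloud2) -> None:
    # Raw 3D point cloud from the Mid-360-class LiDAR.
    rospy.loginfo("3D LiDAR frame: %d bytes", len(msg.data))

def on_scan(msg: LaserScan) -> None:
    # Planar scan from the 2D LiDAR, as used by 2D SLAM.
    rospy.loginfo("2D scan: %d ranges", len(msg.ranges))

if __name__ == "__main__":
    rospy.init_node("blackwarrior_lidar_listener")
    rospy.Subscriber("/livox/lidar", PointCloud2, on_cloud)  # hypothetical topic
    rospy.Subscriber("/scan", LaserScan, on_scan)            # hypothetical topic
    rospy.spin()
```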
Updated Some Autonomous-Driving Job-Hunting Videos and Interview Write-Ups...
自动驾驶之心· 2025-07-29 07:53
I have put together some job-hunting video tutorials and posted them in the job-hunting planet...

They break things down by industry, role, and day-to-day work, to help you figure out what to choose and what suits you best.

For more, join our job-hunting planet: a community built specifically for job seekers in autonomous driving, robotics, and large models.

AutoRobo Knowledge Planet

This is a place for students heading into autonomous driving, embodied intelligence, and robotics to exchange job-hunting experience. It now has nearly 1,000 members, ranging from working professionals at companies such as 智元机器人 (AgiBot), 宇树科技 (Unitree), 地瓜机器人 (D-Robotics), 地平线 (Horizon Robotics), 理想汽车 (Li Auto), Huawei, Xiaomi EV, Momenta, and 元戎启行 (DeepRoute.ai), to students in the 2024 and 2025 autumn recruiting seasons, covering most areas of autonomous driving and embodied intelligence.

What is inside the planet? Building on our existing strengths, we have compiled interview questions, interview experience write-ups, industry research reports, salary-negotiation tips, company referral channels, and resume-review services.

Recruitment information

The planet shares algorithm, development, and product openings daily, most of them sent to us directly by the companies, covering campus hiring, experienced hiring, and internships.

One hundred interview questions

We just uploaded several guest-recorded job-hunting video courses, covering interviews at small and large companies, how to prepare for autumn campus recruiting and choose a company, and introductions to and analysis of roles in large models, auto-labeling, and end-to-end autonomous driving.

Inside AutoRobo we have also compiled autonomous driving ...
The Autonomous Driving Heart Technical Discussion Groups Are Here!
自动驾驶之心· 2025-07-29 07:53
Core Viewpoint - The article emphasizes the establishment of a leading communication platform for autonomous driving technology in China, focusing on industry, academic, and career development aspects [1]. Group 1 - The platform, named "Autonomous Driving Heart," aims to facilitate discussions and exchanges among professionals in various fields related to autonomous driving technology [1]. - The technical discussion group covers a wide range of topics including large models, end-to-end systems, VLA, BEV perception, multi-modal perception, occupancy, online mapping, 3DGS, multi-sensor fusion, transformers, point cloud processing, SLAM, depth estimation, trajectory prediction, high-precision maps, NeRF, planning control, model deployment, autonomous driving simulation testing, product management, hardware configuration, and AI job exchange [1]. - Interested individuals are encouraged to join the community by adding a WeChat assistant and providing their company/school, nickname, and research direction [1].
New from TUM! A Comprehensive Survey of Foundation Models for Autonomous Driving: LLMs, VLMs, MLLMs, Diffusion Models, and World Models, All in One Place
自动驾驶之心· 2025-07-29 00:52
Core Insights
- The article presents a comprehensive review of recent advances in autonomous driving, focusing on the application of foundation models (FMs) such as LLMs, VLMs, MLLMs, diffusion models, and world models to scene generation and analysis [2][20][29]
- It emphasizes the importance of simulating diverse and rare driving scenarios for safety and performance validation, highlighting the limitations of traditional scene-generation methods [2][8][9]
- It identifies open research challenges and future directions for enhancing the adaptability, robustness, and evaluation of foundation-model-driven approaches [29][30]

Group 1: Foundation Models in Autonomous Driving
- Foundation models are a new generation of pre-trained AI models capable of processing heterogeneous inputs, enabling the synthesis and interpretation of complex driving scenarios [2][9][10]
- Their emergence provides new opportunities to enhance the realism, diversity, and scalability of scenario testing [9][10]
- The review categorizes the applications of LLMs, VLMs, MLLMs, diffusion models, and world models in scene generation and analysis, providing a structured taxonomy [29]

Group 2: Scene Generation and Analysis
- Scene generation in autonomous driving spans several formats, including annotated sensor data, multi-camera video streams, and simulated urban environments [21]
- Existing literature on scene generation has limitations: many reviews focus on classical methods without adequately addressing the role of foundation models [23][24][25]
- Scene analysis involves systematic evaluation tasks such as risk assessment and anomaly detection, which are crucial for the safety and robustness of autonomous systems [25][28]

Group 3: Research Contributions and Future Directions
- The review provides a structured classification of existing methods, datasets, simulation platforms, and benchmark competitions for scene generation and analysis [29]
- It identifies key open challenges, including better integration of foundation models into scene generation and analysis tasks, and proposes future research directions to address them [29][30]
- It highlights the need for efficient prompting techniques and lightweight model architectures to reduce inference latency and resource consumption in real-world applications [36][37]
SFT of an Autonomous-Driving VLM Based on Qwen2.5-VL
自动驾驶之心· 2025-07-29 00:52
Core Insights
- The article walks through fine-tuning a large vision-language model for autonomous driving with the LLaMA Factory framework, using a small dataset of 400 images on a single RTX 3090 GPU with 24 GB of memory [1][2].

Group 1: LLaMA Factory Overview
- LLaMA Factory is an open-source, low-code framework for fine-tuning large models, popular in the open-source community with over 40,000 stars on GitHub [1]
- The framework integrates widely used fine-tuning techniques and is used here to train a model for visual-language tasks in autonomous driving scenarios [1]

Group 2: Qwen2.5-VL Model
- Qwen2.5-VL serves as the base model, with significant strengths in visual recognition, object localization, document parsing, and long-video understanding [2]
- It comes in three sizes; the flagship Qwen2.5-VL-72B performs comparably to advanced models such as GPT-4o and Claude 3.5 Sonnet, while the smaller versions excel in resource-constrained environments [2]

Group 3: CoVLA Dataset
- The CoVLA dataset, comprising 10,000 real driving scenes and over 80 hours of video, is used to train and evaluate vision-language-action models [3]
- It surpasses existing datasets in scale and annotation richness, providing a comprehensive platform for developing safer, more reliable autonomous driving systems [3]

Group 4: Model and Dataset Installation
- Instructions cover downloading and installing LLaMA Factory and the Qwen2.5-VL model, including commands for cloning the repository and installing dependencies [4][5][6]
- The CoVLA dataset can be downloaded from Hugging Face, with configuration options to speed up the download [8][9]

Group 5: Fine-tuning Process
- Fine-tuning uses the SwanLab tool for visual tracking of training, with commands provided for installation and setup [14]
- After configuring parameters and launching the fine-tuning task, training logs are displayed and the fine-tuned model is saved for later use [17][20]

Group 6: Model Testing and Evaluation
- After training, the fine-tuned model is tested through a web UI: users ask questions about autonomous driving risks and receive noticeably more relevant answers than from the original model (a minimal programmatic inference sketch follows this summary) [22]
- The original model, while informative, tends to give less relevant responses, highlighting the benefit of fine-tuning for this specific application [22]
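To sanity-check the fine-tuned checkpoint outside the web UI, one can use the standard Qwen2.5-VL inference API from transformers. A minimal sketch; the checkpoint directory and image path below are placeholders, so point them at your own LLaMA Factory export and a CoVLA frame:

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped with the Qwen2.5-VL examples

MODEL_DIR = "output/qwen2_5_vl_covla_sft"  # hypothetical path to your merged SFT export
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_DIR, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(MODEL_DIR)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "file:///path/to/driving_frame.jpg"},  # placeholder image
    {"type": "text", "text": "What risks should the ego vehicle watch for in this scene?"},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
answer = processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```

Comparing this output against the base `Qwen/Qwen2.5-VL-7B-Instruct` checkpoint on the same frame reproduces the fine-tuned-vs-original comparison described in Group 6.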