Autonomous Driving

Xiaoma Zhixing (Pony.ai) launches 24/7 autonomous driving testing: "sleepless mode" tackles the city's late-night travel problem
IPO早知道· 2025-07-25 13:15
Core Viewpoint
- The article highlights the launch of 24/7 autonomous driving testing by Xiaoma Zhixing in Beijing, Guangzhou, and Shenzhen, marking a significant innovation in autonomous driving policy in these cities [2][4].

Group 1: Autonomous Driving Testing
- Xiaoma Zhixing has expanded its testing hours from 7 AM–11 PM to a full 24 hours, addressing the "forgotten hours" of late-night transportation where traditional services are limited [2].
- The company has accumulated over 50 million kilometers of autonomous driving test mileage across various cities, demonstrating extensive experience and capability in diverse conditions [4].

Group 2: Technology and Safety
- The L4 autonomous driving system uses multi-sensor fusion, including high-performance 128-line LiDAR and 8-megapixel cameras, ensuring real-time environmental recognition even in challenging low-light conditions [4][5].
- Xiaoma Zhixing's self-developed sensor cleaning solution addresses perception-accuracy issues caused by extreme weather, enhancing driving safety [5].

Group 3: Urban Impact and Future Potential
- The deployment of Xiaoma Zhixing's seventh-generation autonomous vehicles is reshaping urban operational logic, aiming to unlock the commercial potential and social value of autonomous driving services [7].
- The company envisions its 24/7 autonomous vehicles as "guardians" of the city, providing reliable transportation options during late-night hours [7].
Traditional perception is falling out of favor, and VLA is gradually becoming the new star...
自动驾驶之心· 2025-07-25 08:17
End-to-end autonomous driving, the core algorithm behind today's mass-produced intelligent driving systems, can be divided into two broad technical directions: one-stage and two-stage end-to-end. A wave of work has emerged over the past two years: two-stage end-to-end methods represented by PLUTO explore how to realize ego-vehicle planning with a model; perception-based one-stage end-to-end methods represented by UniAD keep advancing; world-model-based one-stage end-to-end methods represented by OccWorld opened a new school of thought; diffusion-based one-stage end-to-end methods represented by DiffusionDrive ushered in a new era of multi-modal trajectories; and a subsequent series of VLM-based methods has evolved into the autonomous-driving VLA direction, opening the end-to-end era under large models.

Meanwhile, traditional work on BEV perception, lane detection, Occupancy, and the like now appears less often at top conferences. Recently, many students have been asking Feng Ge: can traditional perception and planning still produce publishable papers? It feels as though most of the work has already been done; will reviewers still score it highly? On traditional perception and planning tasks, industry is still actively optimizing these solutions, but academia has largely shifted toward large models and VLA, a field with many sub-areas still worth working on...

New fields, however, are often unfamiliar to beginners, and only a small number of researchers with strong research skills can produce work independently. If you genuinely need to choose a paper research direction, we recommend moving toward large models and VLA. If your foundations are weak, you can also take a look at what we have prepared for you ...
A summary of papers on closed-loop autonomous driving simulation based on 3DGS and diffusion
自动驾驶之心· 2025-07-24 09:42
Core Viewpoint
- The article discusses advancements in autonomous driving simulation technology, highlighting the integration of components such as scene rendering, data collection, and intelligent agents to create realistic driving environments [1][2][3].

Group 1: Simulation Components
- The first step creates a static environment using 3D Gaussian Splatting and diffusion models to build a realistic cityscape, capturing intricate detail [1].
- The second step focuses on data collection from panoramic views to extract dynamic assets such as vehicles and pedestrians, enhancing the realism of simulations [2].
- The third step emphasizes relighting techniques so that assets appear natural under various lighting conditions, simulating different times of day and weather scenarios [2].

Group 2: Intelligent Agents and Weather Systems
- The fourth step introduces intelligent agents that mimic real-world behaviors, allowing complex interactions within the simulation [3].
- The fifth step incorporates weather systems to enhance the atmospheric realism of the simulation, enabling scenarios such as rain or fog [4].

Group 3: Advanced Features
- The sixth step adds features that challenge autonomous vehicles with unexpected obstacles, simulating real-world driving complexities [4].
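To make the six-step workflow above concrete, here is a minimal Python sketch of how such a closed-loop simulation pipeline could be staged. All class and function names are hypothetical placeholders chosen for illustration; they are not the API of any paper summarized here.

```python
# Minimal sketch of a closed-loop simulation pipeline in the spirit of the
# 3DGS + Diffusion workflow summarized above. Names are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """A dynamic asset (vehicle/pedestrian) extracted from panoramic data."""
    name: str
    relit: bool = False          # set True after the relighting pass


@dataclass
class Scene:
    """State accumulated as the pipeline stages run."""
    static_background: str = ""   # stands in for a 3DGS reconstruction
    assets: List[Asset] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)
    weather: str = "clear"
    hazards: List[str] = field(default_factory=list)


def build_static_environment(scene: Scene) -> None:
    # Step 1: reconstruct the static cityscape (3DGS plus a diffusion prior).
    scene.static_background = "3dgs_city_block"


def extract_dynamic_assets(scene: Scene) -> None:
    # Step 2: pull vehicles and pedestrians out of panoramic captures.
    scene.assets = [Asset("vehicle_01"), Asset("pedestrian_07")]


def relight_assets(scene: Scene, time_of_day: str) -> None:
    # Step 3: re-render assets under the target illumination so they blend in.
    for asset in scene.assets:
        asset.relit = True
    print(f"relit {len(scene.assets)} assets for {time_of_day}")


def spawn_agents(scene: Scene) -> None:
    # Step 4: add behavior-driven agents that react to the ego vehicle.
    scene.agents = ["idm_car", "crossing_pedestrian"]


def set_weather(scene: Scene, condition: str) -> None:
    # Step 5: apply a weather model (rain, fog, ...) on top of the rendering.
    scene.weather = condition


def inject_hazards(scene: Scene) -> None:
    # Step 6: add rare obstacles to stress-test the driving stack.
    scene.hazards = ["fallen_tree", "lost_cargo"]


if __name__ == "__main__":
    scene = Scene()
    for stage in (build_static_environment, extract_dynamic_assets,
                  lambda s: relight_assets(s, "dusk"), spawn_agents,
                  lambda s: set_weather(s, "rain"), inject_hazards):
        stage(scene)
    print(scene)
```

Each stage here only mutates a shared Scene object; a real system would pass rendered images and sensor data between stages rather than strings.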
A 10,000-word deep dive into end-to-end autonomous driving
自动驾驶之心· 2025-07-23 09:56
Core Viewpoint
- The article reviews the current state of end-to-end autonomous driving algorithms, comparing them with traditional algorithms and highlighting their advantages and limitations [1][3][53].

Summary by Sections

Traditional vs. End-to-End Algorithms
- Traditional autonomous driving algorithms follow a pipeline of perception, prediction, and planning, where each module has distinct inputs and outputs [3].
- End-to-end algorithms take raw sensor data as input and directly output path points, simplifying the process and reducing error accumulation [3][5].
- Traditional algorithms are easier to debug and offer some interpretability, but they suffer from cumulative errors because the perception and prediction modules cannot be made fully accurate [3][5].

Limitations of End-to-End Algorithms
- End-to-end algorithms have limited ability to handle corner cases because they rely heavily on data-driven methods [7][8].
- The imitation learning used in these algorithms makes it difficult to learn optimal ground truth and to handle exceptional cases [53].
- Current end-to-end paradigms include imitation learning (behavior cloning and inverse reinforcement learning) and reinforcement learning, with evaluation methods categorized as open-loop or closed-loop [8].

Current Implementations
- The ST-P3 algorithm is highlighted as an early end-to-end work, using a framework that includes perception, prediction, and planning modules [10][11].
- ST-P3's innovations include a perception module with an ego-centric cumulative alignment technique and a prediction module with a dual-path prediction mechanism [11][13].
- The planning stage of ST-P3 refines predicted trajectories by incorporating traffic-light information [14][15].

Advanced Techniques
- UniAD employs a full-Transformer framework for end-to-end autonomous driving, integrating multiple tasks to enhance performance [23][25].
- The TrackFormer framework focuses on collaboratively updating track queries and detect queries to improve prediction accuracy [26].
- VAD (Vectorized Autonomous Driving) introduces vectorized representations for richer structural information and faster trajectory-planning computation [32][33].

Future Directions
- End-to-end algorithms still rely primarily on imitation-learning frameworks, whose inherent limitations need further exploration [53].
- Introducing additional constraints and multi-modal planning methods aims to address trajectory-prediction instability and improve model performance [49][52].
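Since the summary repeatedly points to imitation learning (behavior cloning) as the dominant end-to-end training paradigm, here is a minimal sketch of a behavior-cloning step for a trajectory planner. The network, feature dimension, and toy data are assumptions made for illustration; this is not ST-P3, UniAD, or VAD.

```python
# A minimal behavior-cloning sketch for trajectory planning: regress expert
# waypoints from a scene feature with an L2 loss. Illustration only.
import torch
import torch.nn as nn

STATE_DIM = 256        # e.g. a pooled BEV feature for the current scene
HORIZON = 6            # number of future waypoints to predict
WAYPOINT_DIM = 2       # (x, y) in the ego frame


class TrajectoryPlanner(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, HORIZON * WAYPOINT_DIM),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Output shape: (batch, HORIZON, 2) future waypoints.
        return self.net(state).view(-1, HORIZON, WAYPOINT_DIM)


def behavior_cloning_step(model, optimizer, states, expert_traj):
    # Behavior cloning: match the expert (ground-truth) trajectory.
    pred = model(states)
    loss = torch.mean((pred - expert_traj) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = TrajectoryPlanner()
    optim = torch.optim.Adam(model.parameters(), lr=1e-4)
    states = torch.randn(8, STATE_DIM)              # stand-in scene features
    expert = torch.randn(8, HORIZON, WAYPOINT_DIM)  # stand-in expert waypoints
    print("loss:", behavior_cloning_step(model, optim, states, expert))
```

The limitations discussed above (covariate shift, inability to outperform the demonstrations) follow directly from this objective: the model only ever sees expert states and is never penalized for compounding its own errors.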
80,000 clips! Tsinghua open-sources a VLA dataset for extreme autonomous driving scenarios, with a 35% safety improvement
自动驾驶之心· 2025-07-22 12:46
Core Viewpoint
- The article discusses the Impromptu VLA dataset, developed to address data scarcity in unstructured driving environments for autonomous driving systems, and highlights its potential to improve vision-language-action models in complex scenarios [4][29].

Dataset Overview
- The Impromptu VLA dataset consists of over 80,000 meticulously constructed video clips, extracted from more than 2 million original materials across eight diverse open-source datasets [5][29].
- The dataset focuses on four key unstructured challenges: boundary-ambiguous roads, temporary traffic-rule changes, unconventional dynamic obstacles, and complex road conditions [12][13].

Methodology
- Dataset construction involved a multi-step process of data collection, scene classification, and multi-task annotation generation, using advanced vision-language models (VLMs) for scene understanding [10][17].
- A rigorous manual verification process ensured high-quality annotations, with strong F1 scores across categories confirming the reliability of the VLM-based annotation process [18].

Experimental Validation
- Comprehensive experiments validated the dataset's effectiveness, showing significant gains on mainstream autonomous driving benchmarks: the average score in the closed-loop NeuroNCAP test improved from 1.77 to 2.15, and collision rates fell from 72.5% to 65.5% [6][21].
- In open-loop trajectory prediction, models trained with the Impromptu VLA dataset achieved L2 errors as low as 0.30 meters, competitive with leading methods that rely on larger proprietary datasets [24].

Conclusion
- The Impromptu VLA dataset is a critical resource for developing more robust and adaptive autonomous driving systems capable of handling complex real-world scenarios, enhancing perception, prediction, and planning in unstructured driving environments [29].
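For reference, the open-loop L2 error quoted above (e.g., 0.30 m) is normally the mean Euclidean distance between planned and ground-truth waypoints at fixed future timesteps. Below is a small sketch of that computation; the exact averaging convention (per-horizon averages, a 1s/2s/3s split, etc.) varies by benchmark and is left as an assumption here.

```python
# Generic open-loop L2 trajectory error on toy data. Illustration only.
import numpy as np


def l2_error(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Per-timestep L2 error.

    pred, gt: (N, T, 2) arrays of planned vs. ground-truth (x, y) waypoints
    for N samples over a horizon of T steps. Returns a (T,) array with the
    mean Euclidean distance at each future timestep.
    """
    assert pred.shape == gt.shape
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(16, 6, 2))                   # toy ground truth
    pred = gt + rng.normal(scale=0.1, size=gt.shape)   # toy predictions
    per_step = l2_error(pred, gt)
    print("L2 per step:", np.round(per_step, 3))
    print("average L2:", round(float(per_step.mean()), 3))
```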
Let's talk about closed-loop simulation for autonomous driving and 3DGS!
自动驾驶之心· 2025-07-22 12:46
Core Viewpoint
- The article discusses the development and implementation of the Street Gaussians algorithm, which efficiently models dynamic street scenes for autonomous driving simulation and addresses earlier limitations in training and rendering speed [2][3].

Group 1: Background and Challenges
- Previous methods suffered from slow training and rendering speeds, as well as inaccuracies in vehicle-pose tracking [3].
- The Street Gaussians algorithm represents dynamic urban street scenes as a combination of a point-based background and foreground objects, using optimized vehicle tracking poses [3][4].

Group 2: Technical Implementation
- The background model is a set of points in world coordinates, each assigned a 3D Gaussian describing geometric shape and color, with parameters including covariance matrices and position vectors [8].
- The object model for moving vehicles comprises a set of optimizable tracking poses and point clouds, with Gaussian attributes similar to the background model but defined in local coordinates [11].

Group 3: Innovations in Appearance Modeling
- The article introduces a 4D spherical-harmonic model that encodes temporal information into the appearance of moving vehicles, reducing storage cost compared with traditional methods [12].
- The 4D spherical-harmonic model is shown to significantly improve rendering results and reduce artifacts [16].

Group 4: Initialization Techniques
- Street Gaussians initializes from aggregated LiDAR point clouds, addressing the limitations of traditional SfM point clouds in urban environments [17].

Group 5: Course and Learning Opportunities
- The article promotes a specialized course on 3D Gaussian Splatting (3DGS), covering subfields and practical applications in autonomous driving, aimed at strengthening understanding and implementation skills [26][30].
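As background for the Gaussian parameters mentioned in Group 2, the sketch below shows the standard 3DGS parameterization in which each point carries a position, a rotation quaternion, and per-axis scales, and its covariance is assembled as Sigma = R S S^T R^T. This is a generic illustration of the representation, not code from the Street Gaussians implementation.

```python
# Build the covariance of one 3D Gaussian primitive from a rotation quaternion
# and per-axis scales, as in 3DGS-style scene representations. Illustration only.
import numpy as np


def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])


def gaussian_covariance(quat: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Covariance of one 3D Gaussian: Sigma = R S S^T R^T with S = diag(scales)."""
    R = quat_to_rotmat(quat)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T


if __name__ == "__main__":
    position = np.array([12.0, -3.5, 0.8])     # world-frame center of the Gaussian
    quat = np.array([0.92, 0.0, 0.0, 0.39])    # orientation (w, x, y, z)
    scales = np.array([0.4, 0.1, 0.1])         # anisotropic extent in meters
    cov = gaussian_covariance(quat, scales)
    print("center:", position)
    print("covariance:\n", np.round(cov, 4))
```

For foreground vehicles, the same primitives live in object-local coordinates and are transformed into the world frame by the (optimizable) tracking pose at each timestamp.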
A missed detection was reported in the driving stack, and auto-labeling took the blame...
自动驾驶之心· 2025-07-22 07:28
Core Viewpoint
- The article discusses the challenges and methodologies of automatically labeling training data for occupancy networks (OCC) in autonomous driving, emphasizing the need for high-quality data to improve model generalization and safety [2][10].

Group 1: OCC and Its Importance
- The occupancy network partitions space into small grids and predicts per-grid occupancy, handling irregular obstacles such as fallen trees and other background elements [3][4].
- Since Tesla announced OCC in 2022, it has become standard in vision-only autonomous driving solutions, creating high demand for training-data labeling [2][4].

Group 2: Challenges in Automated Labeling
The main challenges in 4D automated labeling include:
1. High temporal and spatial consistency requirements for tracking dynamic objects across frames [9].
2. Complexity in fusing multi-modal data from various sensors [9].
3. Difficulty generalizing to dynamic scenes because the behavior of traffic participants is unpredictable [9].
4. The tension between labeling efficiency and cost, since high precision requires manual verification [9].
5. High generalization requirements in production scenarios, necessitating data from diverse environments [9].

Group 3: Training Data Generation Process
The common process for generating OCC training ground truth involves:
1. Ensuring consistency between 2D and 3D object detection [8].
2. Comparing with edge models [8].
3. Manual labeling for quality control [8].

Group 4: Course Offerings
- The article promotes a course on 4D automated labeling that covers the entire process and core algorithms, aimed at learners interested in the autonomous driving data loop [10][26].
- The course includes practical exercises and addresses real-world challenges in the field, strengthening algorithmic skills [10][26].

Group 5: Course Structure
The course is organized into several chapters:
1. Basics of 4D automated labeling [11].
2. Dynamic obstacle labeling [13].
3. LiDAR and visual SLAM reconstruction [14].
4. Static element labeling based on reconstruction [16].
5. General obstacle (OCC) labeling [18].
6. End-to-end ground-truth labeling [19].
7. Data-loop topics, covering industry pain points and interview preparation [21].
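To illustrate what an OCC ground-truth label looks like in practice, the sketch below voxelizes an aggregated LiDAR point cloud into a binary occupancy grid around the ego vehicle. The grid ranges and voxel size are assumptions chosen for illustration, not a specific production configuration.

```python
# Voxelize an ego-frame LiDAR point cloud into a binary occupancy grid:
# any voxel containing at least one point is marked occupied. Illustration only.
import numpy as np

# Grid definition in the ego frame (meters): x forward, y left, z up.
X_RANGE = (-40.0, 40.0)
Y_RANGE = (-40.0, 40.0)
Z_RANGE = (-1.0, 5.4)
VOXEL = 0.4  # cubic voxel edge length


def occupancy_from_points(points: np.ndarray) -> np.ndarray:
    """points: (N, 3) LiDAR points in the ego frame -> bool occupancy grid."""
    nx = round((X_RANGE[1] - X_RANGE[0]) / VOXEL)
    ny = round((Y_RANGE[1] - Y_RANGE[0]) / VOXEL)
    nz = round((Z_RANGE[1] - Z_RANGE[0]) / VOXEL)
    grid = np.zeros((nx, ny, nz), dtype=bool)

    # Keep only points inside the grid, then bucket them into voxel indices.
    lo = np.array([X_RANGE[0], Y_RANGE[0], Z_RANGE[0]])
    hi = np.array([X_RANGE[1], Y_RANGE[1], Z_RANGE[1]])
    inside = np.all((points >= lo) & (points < hi), axis=1)
    idx = ((points[inside] - lo) / VOXEL).astype(int)
    idx = np.clip(idx, 0, [nx - 1, ny - 1, nz - 1])  # guard against float edge cases
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


if __name__ == "__main__":
    pts = np.random.uniform(low=[-50, -50, -2], high=[50, 50, 6], size=(10000, 3))
    occ = occupancy_from_points(pts)
    print("grid shape:", occ.shape, "occupied voxels:", int(occ.sum()))
```

A production pipeline would first aggregate multi-frame, multi-sensor data, separate dynamic objects via tracking, and run manual quality checks before a grid like this becomes usable ground truth, which is exactly where the challenges in Group 2 arise.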
WeRide Teams Up With Lenovo to Launch 100% Automotive-Grade HPC 3.0 Platform Powered by NVIDIA DRIVE AGX Thor Chips
Globenewswire· 2025-07-21 11:58
Core Viewpoint
- WeRide has launched the HPC 3.0 high-performance computing platform, marking a significant advancement in autonomous driving technology and enabling the world's first mass-produced Level 4 autonomous vehicle, the Robotaxi GXR, powered by NVIDIA's DRIVE AGX Thor chips [1][4][10].

Group 1: Product Development
- The HPC 3.0 platform, developed in collaboration with Lenovo, features dual NVIDIA DRIVE AGX Thor chips and delivers up to 2,000 TOPS of AI compute, making it the most powerful computing platform for Level 4 autonomy [2][4].
- The new platform reduces autonomous driving suite costs by 50% and cuts mass-production costs to a quarter of its predecessor, HPC 2.0 [4][6].
- HPC 3.0 consolidates key modules, lowering total cost of ownership (TCO) by 84% over its lifecycle compared with HPC 2.0 [4].

Group 2: Safety and Compliance
- HPC 3.0 is certified to AEC-Q100, ISO 26262, and IATF 16949 standards, with a failure rate below 50 FIT and a mean time between failures (MTBF) of 120,000 to 180,000 hours [5].
- The platform is designed for 10 years or 300,000 km of use, operates in extreme temperatures from -40°C to 85°C, and meets global VOC environmental standards [5].

Group 3: Strategic Partnerships
- The collaboration with Lenovo and NVIDIA is highlighted as a major breakthrough in computing power and cost efficiency, enhancing vehicle reliability and responsiveness while significantly reducing deployment costs [6][7].
- NVIDIA has been a strategic investor in WeRide since 2017, supporting the global commercialization of autonomous driving solutions [8][9].

Group 4: Market Position
- WeRide is recognized as the world's first publicly listed Robotaxi company, having operated Robotaxis on public roads for over 2,000 days and tested its technology in more than 30 cities across 10 countries [10][11].
- The company has received autonomous driving permits in five markets: China, the UAE, Singapore, France, and the US, positioning itself as a leader in the autonomous driving industry [11].
Autonomous driving paper express | World models, end-to-end, VLM/VLA, reinforcement learning, and more
自动驾驶之心· 2025-07-21 04:14
Core Insights
- The article reviews recent advances in autonomous driving research, particularly the Orbis model developed at the University of Freiburg, which significantly improves long-horizon prediction in driving world models [1][2].

Group 1: Orbis Model Contributions
- Orbis addresses shortcomings of contemporary driving world models in long-horizon generation, particularly for complex maneuvers such as turns, and introduces a trajectory-distribution-based evaluation metric to quantify these issues [2].
- It employs a hybrid discrete-continuous tokenizer that allows fair comparison between discrete and continuous prediction methods, demonstrating that continuous modeling (based on flow matching) outperforms discrete modeling (based on masked generation) for long-horizon prediction [2].
- The model achieves state-of-the-art (SOTA) performance with only 469 million parameters and 280 hours of monocular video data, excelling in complex driving scenarios such as turns and urban traffic [2].

Group 2: Experimental Results
- On the nuPlan dataset, Orbis achieved a Fréchet Video Distance (FVD) of 132.25 for 6-second rollouts, significantly lower than models such as Cosmos (291.80) and Vista (323.37), indicating superior long-horizon video prediction quality [6][7].
- In turn scenarios, Orbis also outperformed other models with an FVD of 231.88, versus 316.99 for Cosmos and 413.61 for Vista, showing its effectiveness in challenging driving conditions [6][7].

Group 3: LaViPlan Framework
- The LaViPlan framework, developed by ETRI, uses reinforcement learning with verifiable rewards to address misalignment between the visual, language, and action components of autonomous driving, achieving a 19.91% reduction in Average Displacement Error (ADE) on easy scenarios and 14.67% on hard scenarios of the ROADWork dataset [12][14].
- It emphasizes the shift from linguistic fidelity to functional accuracy in trajectory outputs, revealing a trade-off between semantic similarity and task-specific reasoning [14].

Group 4: World Model-Based Scene Generation
- The University of Macau introduced a world-model-driven scene generation framework that enhances dynamic graph convolution networks, achieving 83.2% Average Precision (AP) and a mean Time to Anticipate (mTTA) of 3.99 seconds on the DAD dataset, a significant improvement [23][24].
- The framework combines scene generation with adaptive temporal reasoning to create high-resolution driving scenarios, addressing data scarcity and modeling limitations [24].

Group 5: ReAL-AD Framework
- The ReAL-AD framework, proposed by Shanghai University of Science and Technology and the Chinese University of Hong Kong, integrates a three-layer human cognitive decision-making model into end-to-end autonomous driving, improving planning accuracy by 33% and reducing collision rates by 32% [33][34].
- It features three core modules that enhance situational awareness and structured reasoning, leading to significant improvements in trajectory-planning accuracy and safety [34].
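Since the Orbis summary hinges on continuous prediction via flow matching outperforming masked (discrete) generation, the following is a textbook flow-matching training step on toy tensors, included only to illustrate the objective. It is not the Orbis architecture or its tokenizer; the latent dimension and network are assumptions.

```python
# Generic (rectified) flow-matching objective on toy latents. Illustration only.
import torch
import torch.nn as nn


class VelocityNet(nn.Module):
    """Predicts the velocity field v_theta(x_t, t) for flattened latents."""
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.SiLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t], dim=-1))


def flow_matching_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    # Interpolate between noise x0 and data x1 at a random time t, then regress
    # the constant target velocity (x1 - x0).
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], 1)
    x_t = (1.0 - t) * x0 + t * x1
    target_v = x1 - x0
    pred_v = model(x_t, t)
    return torch.mean((pred_v - target_v) ** 2)


if __name__ == "__main__":
    latent_dim = 64                        # stand-in for flattened frame latents
    model = VelocityNet(latent_dim)
    optim = torch.optim.Adam(model.parameters(), lr=1e-4)
    batch = torch.randn(16, latent_dim)    # toy "data" latents
    loss = flow_matching_loss(model, batch)
    loss.backward()
    optim.step()
    print("flow matching loss:", loss.item())
```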
Waymo expands its driverless service area in Austin, Texas
news flash· 2025-07-18 10:18
Waymo, the driverless technology company under Alphabet, announced on July 17 local time that, effective the same day, its driverless service covers more areas of Austin, Texas. ...