Autonomous Driving
Traditional perception is falling out of favor, and VLA is quickly becoming the new star...
自动驾驶之心· 2025-07-25 08:17
Core Insights
- The article discusses the advancements in end-to-end autonomous driving algorithms, highlighting the emergence of various models and approaches in recent years, such as PLUTO, UniAD, OccWorld, and DiffusionDrive, which represent different technical directions in the field [1]
- It emphasizes the shift in academic focus towards large models and Vision-Language-Action (VLA) methodologies, suggesting that traditional perception and planning tasks are becoming less prominent in top conferences [1]
- The article encourages researchers to align their work with large models and VLA, indicating that there are still many subfields to explore despite the challenges for beginners [1]

Summary by Sections

Section 1: VLA Research Topics
- The article introduces VLA research topics aimed at helping students systematically grasp key theoretical knowledge and expand their understanding of the specified direction [6]
- It addresses the need for students to combine theoretical models with practical coding skills to develop new models and enhance their research capabilities [6]

Section 2: Enrollment Information
- The program has a limited enrollment capacity of 6 to 8 students per session [5]
- It targets students at various academic levels (bachelor's, master's, and doctoral) who are interested in enhancing their research skills in autonomous driving and AI [7]

Section 3: Course Outcomes
- Participants will analyze classic and cutting-edge papers, understand key algorithms, and learn about writing and submission methods for academic papers [8][10]
- The course follows a structured timeline of 12 weeks of online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period [10]

Section 4: Course Highlights
- The program features a "2+1" teaching model with experienced instructors providing comprehensive support throughout the learning process [13]
- It emphasizes high academic standards and aims to equip students with a rich set of outputs, including a paper draft and a project completion certificate [13]

Section 5: Technical Requirements
- Students are expected to have a foundational understanding of deep learning, basic programming skills in Python, and familiarity with PyTorch [11]
- Hardware requirements include access to high-performance machines, preferably with multiple GPUs [11]

Section 6: Service and Support
- The program includes dedicated supervisors to track student progress and provide assistance with academic and non-academic issues [17]
- The course will be conducted via Tencent Meeting and recorded for later access [18]
A Summary of Closed-Loop Autonomous Driving Simulation Papers Based on 3DGS and Diffusion
自动驾驶之心· 2025-07-24 09:42
Core Viewpoint
- The article discusses advancements in autonomous driving simulation technology, highlighting the integration of various components such as scene rendering, data collection, and intelligent agents to create realistic driving environments [1][2][3].

Group 1: Simulation Components
- The first step involves creating a static environment using 3D Gaussian Splatting and Diffusion Models to build a realistic cityscape, capturing intricate details [1].
- The second step focuses on data collection from panoramic views to extract dynamic assets like vehicles and pedestrians, enhancing the realism of simulations [2].
- The third step emphasizes relighting techniques to ensure that assets appear natural under various lighting conditions, simulating different times of day and weather scenarios [2].

Group 2: Intelligent Agents and Weather Systems
- The fourth step introduces intelligent agents that mimic real-world behaviors, allowing for complex interactions within the simulation [3].
- The fifth step incorporates weather systems to enhance the atmospheric realism of the simulation, enabling scenarios like rain or fog [4].

Group 3: Advanced Features
- The sixth step includes advanced features that challenge autonomous vehicles with unexpected obstacles, simulating real-world driving complexities [4].
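The six stages above form one closed-loop scenario description. A minimal, dependency-free Python sketch of how such a pipeline might be composed; every class, field, and asset name here is hypothetical and only illustrates the staging order, not any real simulator's API.

```python
from dataclasses import dataclass, field

@dataclass
class SimScenario:
    """Hypothetical container for one closed-loop simulation scenario."""
    static_scene: str = "unset"                      # step 1: 3DGS + diffusion background
    dynamic_assets: list = field(default_factory=list)
    lighting: str = "noon"
    agents: list = field(default_factory=list)
    weather: str = "clear"
    hazards: list = field(default_factory=list)

def build_scenario() -> SimScenario:
    s = SimScenario()
    s.static_scene = "city_block_3dgs"           # step 1: static environment
    s.dynamic_assets += ["car_07", "ped_12"]     # step 2: extracted dynamic assets
    s.lighting = "dusk"                          # step 3: relighting for time of day
    s.agents += ["cut_in_driver", "jaywalker"]   # step 4: behavior-driven agents
    s.weather = "light_rain"                     # step 5: weather system
    s.hazards += ["fallen_branch"]               # step 6: unexpected obstacles
    return s

if __name__ == "__main__":
    print(build_scenario().weather)  # light_rain
```

The point of the staged builder is that each stage only adds to the scenario, so stages can be toggled independently when ablating the simulator.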
A 10,000-Word Summary of End-to-End Autonomous Driving
自动驾驶之心· 2025-07-23 09:56
Core Viewpoint
- The article discusses the current development status of end-to-end autonomous driving algorithms, comparing them with traditional algorithms and highlighting their advantages and limitations [1][3][53].

Summary by Sections

Traditional vs. End-to-End Algorithms
- Traditional autonomous driving algorithms follow a pipeline of perception, prediction, and planning, where each module has distinct inputs and outputs [3].
- End-to-end algorithms take raw sensor data as input and directly output path points, simplifying the process and reducing error accumulation [3][5].
- Traditional algorithms are easier to debug and offer some level of interpretability, but they suffer from cumulative errors because complete accuracy cannot be guaranteed in the perception and prediction modules [3][5].

Limitations of End-to-End Algorithms
- End-to-end algorithms face challenges such as limited ability to handle corner cases, as they rely heavily on data-driven methods [7][8].
- The use of imitation learning in these algorithms can lead to difficulties in learning optimal ground truth and handling exceptional cases [53].
- Current end-to-end paradigms include imitation learning (behavior cloning and inverse reinforcement learning) and reinforcement learning, with evaluation methods categorized into open-loop and closed-loop [8].

Current Implementations
- The ST-P3 algorithm is highlighted as an early work on end-to-end autonomous driving, utilizing a framework that includes perception, prediction, and planning modules [10][11].
- Innovations in ST-P3 include a perception module that uses an egocentric aligned accumulation technique and a prediction module that employs a dual-path prediction mechanism [11][13].
- The planning phase of ST-P3 refines predicted trajectories by incorporating traffic light information [14][15].

Advanced Techniques
- The UniAD system employs a full Transformer framework for end-to-end autonomous driving, integrating multiple tasks to enhance performance [23][25].
- The TrackFormer framework focuses on the collaborative updating of track queries and detect queries to improve prediction accuracy [26].
- The VAD (Vectorized Autonomous Driving) method introduces vectorized representations for better structural information and faster computation in trajectory planning [32][33].

Future Directions
- The article suggests that end-to-end algorithms still primarily rely on imitation learning frameworks, whose inherent limitations need further exploration [53].
- The introduction of more constraints and multi-modal planning methods aims to address trajectory prediction instability and improve model performance [49][52].
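The imitation-learning paradigm the summary keeps returning to reduces, in its behavior-cloning form, to regressing predicted trajectory waypoints toward expert ground-truth waypoints. A framework-free sketch of that objective, under stated assumptions: real planners use deep networks over sensor data, while here a toy linear "planner" and finite-difference gradient descent stand in for them, and all names are illustrative rather than taken from any paper.

```python
def predict(params, state):
    # Toy "planner": waypoint_i = a * state + b * i (a straight-line motion model).
    a, b = params
    return [a * state + b * i for i in range(4)]

def bc_loss(params, state, expert):
    # Behavior-cloning loss: mean squared error against expert waypoints.
    pred = predict(params, state)
    return sum((p - e) ** 2 for p, e in zip(pred, expert)) / len(expert)

def train(state, expert, lr=0.1, steps=500):
    params = [0.0, 0.0]
    for _ in range(steps):
        base = bc_loss(params, state, expert)
        grads = []
        for j in range(len(params)):
            bumped = list(params)
            bumped[j] += 1e-5  # forward finite difference keeps this stdlib-only
            grads.append((bc_loss(bumped, state, expert) - base) / 1e-5)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

expert_traj = [1.0, 2.0, 3.0, 4.0]  # expert keeps a constant heading
params = train(state=1.0, expert=expert_traj)
print(round(bc_loss(params, 1.0, expert_traj), 4))  # → 0.0
```

The sketch also hints at the limitation the article raises: the model can only reproduce trajectories it has seen, so corner cases absent from the expert data are simply never learned.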
Let's Talk About Closed-Loop Simulation and 3DGS for Autonomous Driving!
自动驾驶之心· 2025-07-22 12:46
Core Viewpoint
- The article discusses the development and implementation of the Street Gaussians algorithm, which aims to efficiently model dynamic street scenes for autonomous driving simulation, addressing previous limitations in training and rendering speed [2][3].

Group 1: Background and Challenges
- Previous methods faced challenges such as slow training and rendering speeds, as well as inaccuracies in vehicle pose tracking [3].
- The Street Gaussians algorithm represents dynamic urban street scenes as a combination of point-based backgrounds and foreground objects, utilizing optimized vehicle tracking poses [3][4].

Group 2: Technical Implementation
- The background model is represented as a set of points in world coordinates, each assigned a 3D Gaussian to depict geometric shape and color, with parameters including covariance matrices and position vectors [8].
- The object model for moving vehicles includes a set of optimizable tracking poses and point clouds, with Gaussian attributes similar to the background model but defined in local coordinates [11].

Group 3: Innovations in Appearance Modeling
- The article introduces a 4D spherical harmonics model to encode temporal information into the appearance of moving vehicles, reducing storage costs compared to traditional methods [12].
- The effectiveness of the 4D spherical harmonics model is demonstrated, showing significant improvements in rendering results and reduced artifacts [16].

Group 4: Initialization Techniques
- Street Gaussians uses aggregated LiDAR point clouds for initialization, addressing the limitations of traditional SfM point clouds in urban environments [17].

Group 5: Course and Learning Opportunities
- The article promotes a specialized course on 3D Gaussian Splatting (3DGS), covering various subfields and practical applications in autonomous driving, aimed at enhancing understanding and implementation skills [26][30].
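The covariance matrices mentioned in Group 2 are, in 3DGS-style methods, typically factored as Sigma = R S S^T R^T (rotation R, diagonal scale S), so that gradient updates on scale and rotation always keep the covariance positive semi-definite. A dependency-free 2D sketch of that factorization; the function names are ours, not from the Street Gaussians code.

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def covariance(scale, theta):
    """Sigma = R S S^T R^T with S = diag(scale): an anisotropic Gaussian
    stretched along axes of length `scale`, then rotated by `theta`."""
    r = rotation(theta)
    s2 = [[scale[0] ** 2, 0.0], [0.0, scale[1] ** 2]]  # S S^T
    return matmul(matmul(r, s2), transpose(r))

sigma = covariance(scale=[2.0, 0.5], theta=0.0)
print(sigma)  # axis-aligned: [[4.0, 0.0], [0.0, 0.25]]
```

Rotating the same Gaussian by 90 degrees simply swaps the two variances, which is why optimizing (scale, rotation) is better behaved than optimizing raw covariance entries.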
80,000 Clips! Tsinghua Open-Sources a VLA Dataset for Extreme Autonomous Driving Scenarios, Improving Safety by 35%
自动驾驶之心· 2025-07-22 12:46
Core Viewpoint
- The article discusses the development of the Impromptu VLA dataset, which aims to address the data scarcity issue in unstructured driving environments for autonomous driving systems. It highlights the dataset's potential to enhance the performance of vision-language-action models in complex scenarios [4][29].

Dataset Overview
- The Impromptu VLA dataset consists of over 80,000 meticulously constructed video clips, extracted from more than 2 million original materials across eight diverse open-source datasets [5][29].
- The dataset focuses on four key unstructured challenges: boundary-ambiguous roads, temporary traffic rule changes, unconventional dynamic obstacles, and complex road conditions [12][13].

Methodology
- The dataset construction involved a multi-step process, including data collection, scene classification, and multi-task annotation generation, utilizing advanced vision-language models (VLMs) for scene understanding [10][17].
- A rigorous manual verification process was implemented to ensure high-quality annotations, with significant F1 scores achieved across categories, confirming the reliability of the VLM-based annotation process [18].

Experimental Validation
- The effectiveness of the Impromptu VLA dataset was validated through comprehensive experiments, showing significant performance improvements on mainstream autonomous driving benchmarks. For instance, the average score in the closed-loop NeuroNCAP test improved from 1.77 to 2.15, with collision rates reduced from 72.5% to 65.5% [6][21].
- In open-loop trajectory prediction evaluations, models trained with the Impromptu VLA dataset achieved L2 errors as low as 0.30 meters, demonstrating competitive performance compared to leading methods that rely on larger proprietary datasets [24].

Conclusion
- The Impromptu VLA dataset serves as a critical resource for developing more robust and adaptive autonomous driving systems capable of handling complex real-world scenarios. The research confirms the dataset's significant value in enhancing perception, prediction, and planning capabilities in unstructured driving environments [29].
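The L2 error behind the "0.30 meters" figure is the mean Euclidean distance between predicted and ground-truth future waypoints. Averaging conventions vary by benchmark (per-horizon vs. over all waypoints); this sketch uses a plain mean over all waypoints of short, made-up trajectories, so the numbers are illustrative only.

```python
import math

def l2_error(pred, gt):
    """Mean Euclidean distance between matched waypoints of two trajectories."""
    assert len(pred) == len(gt)
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)

# Hypothetical (x, y) waypoints at three future timesteps, in meters.
pred = [(0.0, 1.0), (0.1, 2.0), (0.2, 3.1)]
gt   = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
print(round(l2_error(pred, gt), 3))
```

A perfect prediction gives an L2 error of exactly zero, and the metric grows linearly with how far each waypoint drifts from the expert path.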
A Missed Detection Was Reported in Driving, and the Blame Fell on Auto-Labeling...
自动驾驶之心· 2025-07-22 07:28
Core Viewpoint
- The article discusses the challenges and methodologies in automating the labeling of training data for occupancy networks (OCC) in autonomous driving, emphasizing the need for high-quality data to improve model generalization and safety [2][10].

Group 1: OCC and Its Importance
- The occupancy network partitions space into small grids to predict occupancy, handling irregular obstacles like fallen trees and other background elements [3][4].
- Since Tesla's announcement of OCC in 2022, it has become a standard component in pure-vision autonomous driving solutions, leading to high demand for training data labeling [2][4].

Group 2: Challenges in Automated Labeling
- The main challenges in 4D automated labeling include:
  1. High temporal and spatial consistency requirements for tracking dynamic objects across frames [9].
  2. Complexity in fusing multi-modal data from various sensors [9].
  3. Difficulty in generalizing to dynamic scenes due to unpredictable behaviors of traffic participants [9].
  4. The tension between labeling efficiency and cost, as high precision requires manual verification [9].
  5. High generalization requirements in production scenarios, necessitating data from diverse environments [9].

Group 3: Training Data Generation Process
- The common process for generating OCC training ground truth involves:
  1. Ensuring consistency between 2D and 3D object detection [8].
  2. Comparing with edge models [8].
  3. Involving manual labeling for quality control [8].

Group 4: Course Offerings
- The article promotes a course on 4D automated labeling, covering the entire process and core algorithms, aimed at learners interested in the autonomous driving data loop [10][26].
- The course includes practical exercises and addresses real-world challenges in the field, enhancing algorithmic capabilities [10][26].

Group 5: Course Structure
- The course is structured into several chapters:
  1. Basics of 4D automated labeling [11].
  2. Dynamic obstacle labeling [13].
  3. Laser and visual SLAM reconstruction [14].
  4. Static element labeling based on reconstruction [16].
  5. General obstacle OCC labeling [18].
  6. End-to-end ground truth labeling [19].
  7. Data loop topics, addressing industry pain points and interview preparation [21].
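The core operation behind OCC ground truth, discretizing space into small grids and marking which cells contain something, can be sketched in a few lines: quantize each point of a (hypothetical) fused point cloud into a voxel index and record that voxel as occupied. Cell size, bounds, and the sample cloud are all illustrative assumptions; production pipelines add the temporal fusion and manual checks the article describes.

```python
def voxelize(points, cell=0.5, bounds=(-10.0, 10.0)):
    """Map 3D points to the set of occupied voxel indices.

    Points outside `bounds` (applied per axis) are ignored; each surviving
    point is quantized to an integer (ix, iy, iz) cell of size `cell` meters.
    """
    lo, hi = bounds
    occupied = set()
    for x, y, z in points:
        if all(lo <= v < hi for v in (x, y, z)):
            occupied.add((int((x - lo) // cell),
                          int((y - lo) // cell),
                          int((z - lo) // cell)))
    return occupied

# Two nearby points fall into the same 0.5 m cell; a distant one gets its own.
cloud = [(1.0, 1.0, 0.1), (1.1, 1.2, 0.3), (5.0, -2.0, 0.0)]
grid = voxelize(cloud)
print(len(grid))  # 2
```

This is also why OCC handles "irregular obstacles like fallen trees" well: the label is per cell, so no object-shaped bounding box is ever required.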
WeRide Teams Up With Lenovo to Launch 100% Automotive-Grade HPC 3.0 Platform Powered by NVIDIA DRIVE AGX Thor Chips
GlobeNewswire· 2025-07-21 11:58
Core Viewpoint
- WeRide has launched the HPC 3.0 high-performance computing platform, marking a significant advancement in autonomous driving technology and enabling the world's first mass-produced Level 4 autonomous vehicle, the Robotaxi GXR, powered by NVIDIA's DRIVE AGX Thor chips [1][4][10]

Group 1: Product Development
- The HPC 3.0 platform, developed in collaboration with Lenovo, features dual NVIDIA DRIVE AGX Thor chips and delivers up to 2,000 TOPS of AI compute, making it the most powerful computing platform for Level 4 autonomy [2][4]
- The new platform reduces autonomous driving suite costs by 50% and cuts mass production costs to a quarter of those of its predecessor, HPC 2.0 [4][6]
- HPC 3.0 consolidates key modules, lowering the total cost of ownership (TCO) by 84% over its lifecycle compared to HPC 2.0 [4]

Group 2: Safety and Compliance
- HPC 3.0 is certified to AEC-Q100, ISO 26262, and IATF 16949 standards, with a failure rate below 50 FIT and a mean time between failures (MTBF) of 120,000 to 180,000 hours [5]
- The platform is designed for 10 years or 300,000 km of use and can operate in extreme temperatures from -40°C to 85°C, meeting global VOC environmental standards [5]

Group 3: Strategic Partnerships
- The collaboration with Lenovo and NVIDIA is highlighted as a major breakthrough in computing power and cost efficiency, enhancing vehicle reliability and responsiveness while significantly reducing deployment costs [6][7]
- NVIDIA has been a strategic investor in WeRide since 2017, supporting the commercialization of autonomous driving solutions globally [8][9]

Group 4: Market Position
- WeRide is recognized as the world's first publicly listed Robotaxi company, having operated Robotaxis on public roads for over 2,000 days and tested its technology in over 30 cities across 10 countries [10][11]
- The company has received autonomous driving permits in five markets: China, the UAE, Singapore, France, and the US, positioning itself as a leader in the autonomous driving industry [11]
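For readers less familiar with reliability units: FIT (Failures In Time) counts failures per billion device-hours, so a component-level figure like "below 50 FIT" implies a very long component MTBF. The platform MTBF of 120,000 to 180,000 hours quoted above is a separate, system-level measurement over many components; this quick conversion only illustrates the unit itself.

```python
def fit_to_mtbf_hours(fit):
    """MTBF (hours) implied by a constant failure rate expressed in FIT,
    i.e. failures per 1e9 device-hours, for a single component."""
    return 1e9 / fit

# A single component at 50 FIT fails, on average, once per 20 million hours.
print(fit_to_mtbf_hours(50))  # 20000000.0
```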
Autonomous Driving Paper Digest | World Models, End-to-End, VLM/VLA, Reinforcement Learning, and More
自动驾驶之心· 2025-07-21 04:14
Core Insights
- The article discusses advancements in autonomous driving technology, particularly focusing on the Orbis model developed at the University of Freiburg, which significantly improves long-horizon prediction in driving world models [1][2].

Group 1: Orbis Model Contributions
- The Orbis model addresses shortcomings of contemporary driving world models in long-horizon generation, particularly in complex maneuvers like turns, and introduces a trajectory distribution-based evaluation metric to quantify these issues [2].
- It employs a hybrid discrete-continuous tokenizer that allows fair comparisons between discrete and continuous prediction methods, demonstrating that continuous modeling (based on flow matching) outperforms discrete modeling (based on masked generation) in long-horizon prediction [2].
- The model achieves state-of-the-art (SOTA) performance with only 469 million parameters and 280 hours of monocular video data, excelling in complex driving scenarios such as turns and urban traffic [2].

Group 2: Experimental Results
- The Orbis model achieved a Fréchet Video Distance (FVD) of 132.25 on the nuPlan dataset for 6-second rollouts, significantly lower than other models like Cosmos (291.80) and Vista (323.37), indicating superior video prediction quality [6][7].
- In turn scenarios, Orbis also outperformed other models, achieving an FVD of 231.88 compared to 316.99 for Cosmos and 413.61 for Vista, showcasing its effectiveness in challenging driving conditions [6][7].

Group 3: LaViPlan Framework
- The LaViPlan framework, developed by ETRI, uses reinforcement learning with verifiable rewards to address the misalignment between visual, language, and action components in autonomous driving, achieving a 19.91% reduction in Average Displacement Error (ADE) for easy scenarios and 14.67% for hard scenarios on the ROADWork dataset [12][14].
- It emphasizes the transition from linguistic fidelity to functional accuracy in trajectory outputs, revealing a trade-off between semantic similarity and task-specific reasoning [14].

Group 4: World Model-Based Scene Generation
- The University of Macau introduced a world model-driven scene generation framework that enhances dynamic graph convolution networks, achieving an 83.2% Average Precision (AP) and a 3.99-second mean Time to Anticipate (mTTA) on the DAD dataset, marking significant improvements [23][24].
- This framework combines scene generation with adaptive temporal reasoning to create high-resolution driving scenarios, addressing data scarcity and modeling limitations [24].

Group 5: ReAL-AD Framework
- The ReAL-AD framework, proposed by ShanghaiTech University and the Chinese University of Hong Kong, integrates a three-layer human cognitive decision-making model into end-to-end autonomous driving, improving planning accuracy by 33% and reducing collision rates by 32% [33][34].
- It features three core modules that enhance situational awareness and structured reasoning, leading to significant improvements in trajectory planning accuracy and safety [34].
Waymo Expands Its Driverless Service Area in Austin, Texas
news flash· 2025-07-18 10:18
Core Insights
- Waymo, a subsidiary of Alphabet, announced on July 17 that it has expanded its autonomous driving service to cover more areas in Austin, Texas [1]

Company Summary
- Waymo is enhancing its autonomous driving service coverage in Austin, Texas, indicating a strategic move to increase its operational footprint in the U.S. market [1]
WeRide Launches Southeast Asia’s First Fully Driverless Robobus Operations at Resorts World Sentosa, Singapore
GlobeNewswire· 2025-07-17 09:52
Core Insights
- WeRide has launched fully driverless Robobus operations at Resorts World Sentosa, Singapore, marking the first autonomous vehicle in Southeast Asia to operate without a safety officer on board [1][2][4]

Company Developments
- WeRide received approval from the Land Transport Authority of Singapore (LTA) after extensive testing and safety assessments, allowing the Robobus to offer fully autonomous rides to the public [2][9]
- The Robobus has been operational since June 2024, transporting tens of thousands of passengers and maintaining a zero-incident safety record [2][5]
- The Robobus operates on a fixed 12-minute loop connecting key points within Resorts World Sentosa, utilizing advanced LiDAR, cameras, and sensors for obstacle detection [5]

Industry Impact
- The launch is seen as a significant milestone for the future of mobility in Southeast Asia, with WeRide's vehicles expected to transform public transportation [4][6]
- Singapore's government plans to integrate autonomous vehicles into the national public transport network by the end of 2025, aligning with WeRide's operations [6][9]
- WeRide has established a dedicated R&D center in Singapore to advance autonomous vehicle innovation, supported by the Singapore Economic Development Board [6][12]

Future Collaborations
- WeRide aims to strengthen collaborations with LTA and various stakeholders to introduce more validated products and scalable business models across Singapore and Southeast Asia [11][12]