自动驾驶之心
Language may not be the "ultimate answer" for autonomous driving, but it is undoubtedly the most viable path today...
自动驾驶之心· 2025-11-29 02:06
Core Insights
- The article emphasizes that the current development of autonomous driving systems relies heavily on data-driven models, but the next breakthrough must focus on enhancing reasoning capabilities [2][4][7].

Group 1: Current State of Autonomous Driving
- The industry predominantly uses a classic data-flywheel model for production models, comprising deployment, effect verification, data mining, retraining, and redeployment [4].
- As data scales have grown to the tens of millions, the performance gains from merely increasing data size have diminished, bringing higher costs and more complex challenges [4][7].
- Companies such as Tesla, Li Auto, Xiaomi, and Xpeng have recognized this shift and are adapting their strategies accordingly [4].

Group 2: Insights from Other Fields
- The article draws parallels between autonomous driving and robotics: while autonomous driving has benefited from abundant data, robotics has faced data scarcity [7].
- Recent advances in robotics, such as large datasets and new algorithms, have emerged from this scarcity, potentially paving the way for more robust embodied-intelligence capabilities [7].

Group 3: Future Directions in Autonomous Driving
- The article identifies a critical flaw in current autonomous driving systems: the lack of deep reasoning capabilities [7].
- To transition to the next phase of autonomous driving (referred to as Autonomous Driving 3.0), four pillars are necessary: reasoning ability, common-sense cognition, long-term memory, and explanation and interaction [7][9].
- NVIDIA's upcoming Alpamayo-R1 model aims to integrate explicit causal reasoning with trajectory planning within a unified VLA architecture, highlighting a shift toward reasoning-driven approaches [7].

Group 4: Community and Learning Resources
- The article promotes a community platform for knowledge sharing in autonomous driving, with resources for beginners and advanced learners as well as opportunities to network with industry experts [13][28].
- The community has compiled extensive resources, including over 40 technical directions and nearly 60 autonomous-driving datasets, making information easier to access for newcomers and experienced professionals alike [28][50].
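The data-flywheel loop described in Group 1 can be sketched as a toy simulation. Every function, data structure, and number below is an invented placeholder to show the deploy → verify → mine → retrain → redeploy cycle and its diminishing returns, not any company's actual pipeline:

```python
import random

def deploy(model, miles=1000):
    """Hypothetical fleet run: each logged mile may contain a failure."""
    return [{"mile": m, "failure": random.random() < model["failure_rate"]}
            for m in range(miles)]

def mine_failures(logs):
    """Data-mining step: keep only the failure cases for retraining."""
    return [event for event in logs if event["failure"]]

def retrain(model, hard_cases):
    """Toy retraining: mined failures reduce the failure rate, with
    diminishing returns as the rate approaches a floor."""
    floor = 0.001
    improvement = 0.9 ** (len(hard_cases) ** 0.5)
    return {"failure_rate": floor + (model["failure_rate"] - floor) * improvement}

random.seed(0)
model = {"failure_rate": 0.05}
history = [model["failure_rate"]]
for _ in range(3):                  # deploy -> verify -> mine -> retrain -> redeploy
    logs = deploy(model)            # deployment
    hard = mine_failures(logs)      # effect verification + data mining
    model = retrain(model, hard)    # retraining; the next loop iteration redeploys
    history.append(model["failure_rate"])
```

Each pass lowers the failure rate by less than the previous one, which is the "diminishing returns at scale" dynamic the article describes.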
Horizon Robotics' RAD: An End-to-End Driving Policy via Large-Scale 3DGS-Based Reinforcement Learning
自动驾驶之心· 2025-11-29 02:06
Core Insights
- The article discusses a novel reinforcement learning (RL) approach to end-to-end (e2e) driving-policy development, using 3D Gaussian Splatting (3DGS) to build closed-loop training environments [1][2].
- The proposed method significantly reduces collision rates, achieving a threefold decrease compared with pure imitation learning (IL) [1].
- Limitations of the 3DGS environment include a lack of interaction, reliance on log replay, and inadequate rendering of non-rigid pedestrians and low-light scenarios [1].

Summary by Sections

Methodology
- The approach consists of three main phases: training a basic Bird's Eye View (BEV) perception model, freezing perception to train a planning head using IL, and generating a sensor-level environment with 3DGS for mixed RL-and-IL training [3][5][6].
- Training proceeds from pre-trained perception models, through IL on human expert data, to RL fine-tuning that enhances sensitivity to critical risk scenarios [10][12].

State and Action Space
- The state space includes encoders for BEV features, static map elements, traffic-participant information, and planning-related features [7].
- The action space is defined with discrete lateral and longitudinal movements, allowing a total of 61 actions in both dimensions [8].

Reward Function
- The reward function penalizes collisions and deviations from expert trajectories, with specific thresholds for dynamic and static collisions as well as positional and heading deviations [17][19].
- Auxiliary tasks are introduced to stabilize training and accelerate convergence, focusing on behaviors such as deceleration and acceleration [20][23].

Experimental Results
- The results indicate that the proposed method outperforms other IL-based algorithms, demonstrating the advantages of closed-loop training in dynamic environments [28][29].
- The optimal ratio of RL to IL data is found to be 4:1, contributing to improved performance metrics [28].

Conclusion
- The article emphasizes the practical engineering improvements achieved by integrating 3DGS into training environments, leading to better performance in autonomous driving applications [1][2].
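The reward shape described in the Reward Function section (threshold-gated penalties for dynamic/static collisions and for positional/heading deviation from the expert trajectory) can be sketched as follows. All thresholds and penalty weights here are illustrative guesses, not the values used in RAD:

```python
import math

def rad_style_reward(dyn_dist, static_dist, pos_err, heading_err,
                     d_dyn=0.5, d_static=0.3, d_pos=2.0,
                     d_head=math.radians(20)):
    """Illustrative RAD-style reward: sparse penalties fire only when a
    safety or imitation threshold is crossed; otherwise the reward is 0.
    Thresholds (meters / radians) and weights are made-up placeholders."""
    r = 0.0
    if dyn_dist < d_dyn:        # too close to a moving agent (dynamic collision)
        r -= 5.0
    if static_dist < d_static:  # too close to a static obstacle
        r -= 5.0
    if pos_err > d_pos:         # drifted too far from the expert trajectory
        r -= 1.0
    if heading_err > d_head:    # heading deviates too much from the expert
        r -= 1.0
    return r
```

A safe, on-trajectory step scores 0, while a step that both nears a moving agent and drifts off the expert path stacks the corresponding penalties, which is what makes the RL fine-tuning sensitive to critical risk scenarios.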
Li Auto Discloses Some New Technical Details
自动驾驶之心· 2025-11-28 00:49
Core Insights
- The article discusses the advancements and challenges faced by Li Auto in developing its autonomous driving technology, focusing on its end-to-end model and VLA (Vision-Language-Action) integration [2][5][9].

Group 1: Model Performance and Data Utilization
- End-to-end model improvement slows after a certain amount of training data; beyond 10 million clips, the model's MPI (Miles Per Intervention) only doubled over five months [5].
- To enhance model performance, Li Auto adjusted its training-data mix, increasing the quantity of generated data (including corner cases) and implementing manual rules for safety and compliance in special scenarios [5][9].

Group 2: VLA Integration and Decision-Making
- VLA is introduced to enhance the end-to-end model's decision-making, addressing illogical behavior, a lack of deep deliberation in decisions, and insufficient scenario-based preventive judgment [5][6].
- VLA incorporates spatial intelligence, linguistic intelligence, and an action policy, allowing the model to understand and communicate spatial information effectively and to generate smooth driving trajectories using diffusion models [6][9].

Group 3: Simulation and Testing Efficiency
- Li Auto upgraded its model evaluation by using a world model for closed-loop simulation and testing, significantly cutting testing costs from 18.4 to 0.53 per kilometer [9][11].
- The closed-loop training framework AD-R1 was introduced, enabling efficient data management and reinforcement learning, with high-value data flowing through a series of steps back to the cloud platform [11][12].

Group 4: Computational Power and Resources
- Li Auto's total computational power is 13 EFLOPS, with 3 EFLOPS dedicated to inference and 10 EFLOPS to training, running on 50,000 training and inference cards [13].
- Inference compute is especially important in the VLA era, as it is needed to generate simulation training environments [13].
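As a side note on the MPI metric cited above, here is a minimal sketch of how miles-per-intervention is computed from drive logs. The log format and the numbers are invented for illustration:

```python
def miles_per_intervention(segments):
    """MPI over a set of logged drives: total autonomous miles divided by
    the number of human takeovers. `segments` is a list of
    (miles_driven, interventions) pairs; the data below is invented."""
    total_miles = sum(miles for miles, _ in segments)
    total_interventions = sum(count for _, count in segments)
    if total_interventions == 0:
        return float("inf")  # no takeovers observed in this sample
    return total_miles / total_interventions

logs = [(120.0, 2), (80.0, 1), (200.0, 1)]  # hypothetical drive segments
mpi = miles_per_intervention(logs)          # 400 miles / 4 interventions
```

A doubling of MPI on the same route mix means the system drives twice as far, on average, before a human has to take over, which is why it is the headline metric for the slowdown described in Group 1.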
Recent Shifts in End-to-End and VLA Job Positions
自动驾驶之心· 2025-11-28 00:49
Core Insights
- The article discusses the challenges of recruiting talent in the autonomous driving sector, highlighting a shortage of experienced professionals for advanced roles [2].
- It emphasizes the importance of education and training in cutting-edge technologies related to end-to-end and VLA (Vision-Language-Action) autonomous driving [2].

Course Offerings
- A course titled "End-to-End and VLA Autonomous Driving" is being offered, covering the latest technologies in the field, including BEV perception, VLM, diffusion models, and reinforcement learning [2][12].
- The course is designed for individuals with foundational knowledge of autonomous driving and related technologies, and includes practical assignments building VLA models and datasets [12][16].

Instructor Profiles
- The course features instructors with strong academic backgrounds and practical experience in autonomous driving and large models, including researchers from top universities [8][11][14].
- Instructors have published numerous papers at prestigious conferences and have developed and deployed advanced algorithms in industry [8][11][14].

Target Audience
- The course is aimed at individuals with a basic understanding of autonomous driving modules who are familiar with transformer models, reinforcement learning, and BEV perception [16].
- Participants need access to a GPU, with an RTX 4090 or better recommended [15][16].
Next Saturday, an In-Depth Embodied-AI Livestream: The Question of Deploying VLA with RL!
自动驾驶之心· 2025-11-28 00:49
Group 1
- The article notes that live streaming and content distribution have moved to the "Embodied Intelligence Heart Knowledge Planet" platform [2].
- It highlights the high-quality roundtable discussions previously held on topics such as robot bodies, data, and simulation [2].
- The focus of the current session is the VLA algorithm and its deployment with reinforcement learning (RL) [3].

Group 2
- Key topics include the pain points of the VLA architecture and its models [6].
- Advances in whole-body motion control for robots and how they improve performance will be explored [6].
- The session will discuss how to deploy VLA with RL on real machines, including compute-board selection and lightweight design [6].

Group 3
- Featured guests include the Vice President of Algorithms at Digua Robotics, the Chief Researcher at Beijing Humanoid Robotics, and a partner at Yuanli Lingji [9][11][13].
- Also on the panel is a Tsinghua University PhD who will soon join the Tsinghua Institute for Advanced Study as an assistant professor [15].

Group 4
- In-depth content is available on the "Embodied Intelligence Heart" knowledge platform, including technical details, Q&A, and exclusive insights [18].
- Dexterous-hand design is emphasized as a key technology for closing the "hand-eye-brain" perception loop [18].
- The concept of the "Agent" and its significance in thinking, academia, and engineering is introduced [18].
- The Spec-VLA framework, designed for VLA-specific inference acceleration, is mentioned [18].
- The latest developments from CMU on cross-embodiment world models aiding few-shot robot learning are also highlighted [18].
自动驾驶之心 Officially Launches Enterprise Services and Consulting!
自动驾驶之心· 2025-11-28 00:49
Core Insights
- The article announces the launch of enterprise services by 自动驾驶之心, aimed at supporting businesses in the autonomous driving sector through consulting and training [1][2].

Group 1: Company Services
- The company has developed nearly 50 courses on autonomous driving and embodied technology, serving primarily the consumer market in its first two years [1].
- The newly introduced enterprise services include brand promotion, industry consulting, technical training, and team upgrades [4].
- The company brings nearly three years of industry consulting and training experience, a substantial expert talent pool, and a following of nearly 400,000 across platforms [1].

Group 2: Partnerships and Collaborations
- The company has established collaborations with multiple domestic universities, vocational colleges, Tier 1 suppliers, original equipment manufacturers (OEMs), and embodied-robotics companies [2].
- The goal is to reach more companies in need of upgrades and to promote advancements in the autonomous driving field [2].
An Intuitive Understanding of the Flow Matching Generative Algorithm
自动驾驶之心· 2025-11-28 00:49
Algorithm Overview
- Flow Matching is a generative model that aims to generate samples resembling a given target set without any input [3][4].
- The model learns a direction of movement from a source point to a target point, generating new samples by iteratively moving a position toward the target [14][17].

Training and Inference
- During training, the model samples points along the line connecting source and target, learning the average slope across many such connections [16][17].
- At inference, the model starts from a noise point and moves toward the target, gradually collapsing onto a specific state as it approaches [17][18].

Code Implementation
- The implementation generates random inputs, predicts the slope with a neural network, and minimizes the loss between predicted and target slopes [18][19].
- The code exposes hyperparameters for dimensions, sample sizes, and training epochs, demonstrating a straightforward implementation of the Flow Matching algorithm [19][25].

Advanced Applications
- The model can be adapted to generate samples based on prompts, allowing more controlled generation by segmenting the target distribution [24][29].
- A more complex example generates handwritten digits from the MNIST dataset, showcasing the model's versatility across data types [30][32].

Model Architecture
- The architecture uses a UNet backbone to predict the velocity field, enhancing performance through multi-scale feature fusion [32][34].
- Conditional inputs refine the generation process, ensuring the output aligns with specified conditions [34][35].

Training Process
- The training loop generates dynamic noise, computes the loss from the difference between predicted and actual images, and updates the model parameters accordingly [40][41].
- Generated samples are visualized periodically, providing insight into performance and output quality [40][41].
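The training/inference procedure described above (sample a point on the source-to-target line, regress the line's slope, then integrate from noise) can be condensed into a 1-D toy. Here a single scalar `v_hat` stands in for the neural network that would normally predict the velocity at `(x_t, t)`, and the target distribution is a point mass at 5; both simplifications are ours, not the article's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_data(n):
    """Toy target distribution: a point mass at x = 5."""
    return np.full(n, 5.0)

# The "model" is reduced to one scalar v_hat (a stand-in for the network
# that would predict v(x_t, t)); we fit it by gradient descent on MSE.
v_hat, lr = 0.0, 0.1
for _ in range(500):
    x0 = rng.standard_normal(32)         # source (noise) samples
    x1 = sample_data(32)                 # target samples
    t = rng.uniform(0.0, 1.0, 32)
    xt = (1 - t) * x0 + t * x1           # point on the connecting line
                                         # (unused by the constant model)
    v_target = x1 - x0                   # slope of the source->target line
    v_hat -= lr * 2 * np.mean(v_hat - v_target)  # gradient step on MSE

# Inference: start from noise and Euler-integrate dx/dt = v_hat in 10 steps.
x = rng.standard_normal()
for _ in range(10):
    x += v_hat / 10
```

After training, `v_hat` approaches the average slope E[x1 - x0] ≈ 5, so integrating from a noise point lands near the target, mirroring the "learn the average slope, then walk it" intuition in the article.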
Haomo.AI Suddenly Disbands on the Spot! The "Universe's No. 1" Officially Goes Offline
自动驾驶之心· 2025-11-27 00:04
Core Insights
- The article covers the sudden dissolution of the self-driving technology company Haomo.AI (毫末智行), which had faced significant operational challenges and staff turnover [2][3].

Company Overview
- Haomo.AI was established on November 29, 2019 as a subsidiary of Great Wall Motors, focusing on autonomous driving systems; its core team drew talent from Great Wall Motors and tech companies such as Baidu and Huawei [6].
- The company advanced rapidly early on, launching its first last-mile delivery vehicle, "Xiao Mo Tuo," in November 2020 and introducing the MANA data intelligence system in December 2021, which had accumulated over 620,000 hours of learning time by 2023 [6][8].

Recent Developments
- In 2023, Haomo.AI saw a mass exodus of key personnel, including the departure of its chairman and several vice presidents, raising concerns about operational stability [5].
- The company's official communications have been silent since June 2023, the last update being a holiday poster on October 1 [6].

Market Impact
- Following the closure announcement, users of Haomo.AI's products expressed concern and dissatisfaction about their product experience [11].
An Industry-Oriented Full-Stack 3DGS Learning Roadmap (Feed-Forward GS and More)
自动驾驶之心· 2025-11-27 00:04
Core Insights
- The article highlights the rapid technical iteration in 3DGS (3D Gaussian Splatting), from static reconstruction to dynamic and surface reconstruction, culminating in feed-forward 3DGS [1].
- A comprehensive 3DGS learning roadmap has been developed to help newcomers master both the theoretical and practical sides of the technology [1].

Course Overview
- The course is structured into six chapters, starting with foundational knowledge in computer graphics and progressing through 3DGS principles, algorithms, and applications in autonomous driving [5][6][7][8][9].
- It aims to provide a detailed understanding of 3DGS, including tools such as SuperSplat and frameworks such as Gsplat and DriveStudio [5][6][7].

Target Audience
- The course is designed for individuals with a background in computer graphics, visual reconstruction, and programming, specifically those familiar with Python and PyTorch [14].

Learning Outcomes
- Participants will gain a solid grasp of 3DGS theory, algorithm development, and industry applications, enabling them to engage in discussions about job demands and industry challenges [10][12].
Closed-Loop Training Is Finally Filled In! AD-R1: A New World-Model End-to-End Closed-Loop Reinforcement Learning Framework (University of Macau, Li Auto, et al.)
自动驾驶之心· 2025-11-27 00:04
Core Insights
- The article presents AD-R1, a framework that uses an Impartial World Model to address the "optimistic bias" found in traditional world models [2][3][57].
- The framework enables closed-loop reinforcement learning, letting autonomous vehicles learn from imagined failures and thereby improving safety and decision-making capabilities [9][57].

Group 1: Background and Challenges
- End-to-end autonomous driving has transformed the industry, but challenges remain, particularly long-tail failures caused by distribution shift [6].
- Traditional reinforcement learning methods rely on external simulators, which suffer from simulation-to-reality gaps and limited interactivity [6][9].
- The article emphasizes the need for a paradigm shift toward learning 3D/4D world models as high-fidelity generative simulators [6].

Group 2: Optimizing World Models
- AD-R1 introduces a new approach to mitigating the optimistic bias of world models, which often fail to predict negative outcomes [2][7].
- The Impartial World Model (IWM) is designed to accurately reflect the consequences of both safe and unsafe behavior, enhancing the reliability of predictions [3][10].
- A counterfactual synthesis pipeline generates a diverse training dataset that includes plausible collision and lane-deviation scenarios [3][10].

Group 3: Experimental Results
- The IWM significantly outperforms traditional models on risk-prediction tasks, demonstrating its ability to accurately foresee failures [47][48].
- Applying the AD-R1 framework yields notable safety and performance gains across baseline models, with absolute improvements of 1.7% and 1.1% in the PDMS planning metric [49].
- Ablation studies show that counterfactual synthesis and the model-level optimizations are critical to causal fidelity and overall performance [51][52].

Group 4: Future Directions
- Future research may focus on generating counterfactual failure samples from unlabeled data to reduce reliance on high-precision annotations [57].
- Extending the framework to more complex multi-agent interaction scenarios could further improve the robustness of autonomous driving systems in long-tail events [57].
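The counterfactual-synthesis idea in Group 2 (mixing synthesized failures into the world model's training data so it cannot learn an always-safe prior) can be sketched like this. The rollout format, the perturbation, and the mixing ratio are all invented for illustration and are not AD-R1's actual pipeline:

```python
import random

def synthesize_counterfactual(rollout):
    """Hypothetical counterfactual synthesis: perturb a logged safe rollout
    into a plausible failure and relabel it. The perturbation here is a
    toy lateral offset, not the paper's generation pipeline."""
    bad = dict(rollout)
    bad["lateral_offset"] = rollout["lateral_offset"] + 2.5  # push off-lane
    bad["outcome"] = "lane_departure"
    return bad

def build_training_set(safe_rollouts, failure_fraction=0.5, seed=0):
    """Mix real safe rollouts with synthesized failures so the world model
    observes both outcomes instead of an 'optimistic' always-safe prior."""
    random.seed(seed)
    n_fail = int(len(safe_rollouts) * failure_fraction)
    failures = [synthesize_counterfactual(r)
                for r in random.sample(safe_rollouts, n_fail)]
    return safe_rollouts + failures

safe = [{"lateral_offset": 0.0, "outcome": "safe"} for _ in range(10)]
dataset = build_training_set(safe)
```

A world model trained only on `safe` would assign near-zero probability to failures; the synthesized half forces it to model unsafe consequences too, which is the bias correction the IWM is built around.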