End-to-End Autonomous Driving
Three Major Technical Routes in Autonomous Driving: End-to-End, VLA, and World Models
自动驾驶之心· 2025-11-21 00:04
Overview
- The article discusses the ongoing technological competition in the autonomous driving industry, focusing on different approaches to solving corner cases and improving the safety and efficiency of driving systems [1][3].

Technological Approaches
- The debate centers on two main technological routes, VLA and VLM [1].
- Major companies like Waymo use VLM, in which AI handles environmental understanding and reasoning while traditional modules retain decision-making control for safety [1].
- Companies such as Tesla, Geely, and XPeng are exploring VLA, aiming for AI to learn all driving skills through extensive data training and to make decisions end to end [1].

Sensor and Algorithm Developments
- The article traces the evolution of perception technologies, with BEV (Bird's Eye View) perception becoming mainstream by 2022 and OCC (Occupancy) perception gaining traction in 2023 [3][5].
- BEV integrates data from multiple sensors into a unified spatial representation, which facilitates path planning and dynamic information fusion [8][14].
- OCC perception provides detailed occupancy data, describing the probability that a region of space is occupied over time, which improves dynamic interaction modeling [6][14].

Modular and End-to-End Systems
- Before the advent of multimodal large models and end-to-end autonomous driving technologies, perception and prediction tasks were typically handled by separate modules [5].
- The article outlines a phased approach to modularization, where perception, prediction, decision-making, and control are distinct yet interconnected [4][31].
- End-to-end systems aim to streamline the pipeline by mapping raw sensor inputs directly to actionable outputs, improving efficiency and reducing bottlenecks [20][25].

VLA and VLM Frameworks
- The VLA (Vision-Language-Action) and VLM (Vision-Language Model) frameworks are discussed, with VLA focusing on understanding complex scenes and making autonomous decisions based on visual and language inputs [32][39].
- The article emphasizes the importance of language models in improving the interpretability and safety of autonomous driving systems, allowing for better cross-scenario knowledge transfer and decision-making [57].

Future Directions
- The competition between VLA and WA (World Action) architectures is highlighted, with WA emphasizing direct visual-to-action mapping without language mediation [55][56].
- The article suggests that the future of autonomous driving will involve integrating world models that understand physical laws and temporal dynamics, addressing the limitations of current language models [34][54].
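The BEV and occupancy ideas summarized above can be illustrated with a small sketch. It is not taken from the article: the function names, the grid extents, and the use of a per-cell "fraction of frames with a hit" as a crude occupancy estimate are all assumptions chosen for illustration. Points from different sensors are assumed to be already expressed in a shared ego frame, so fusion reduces to pooling them into one grid.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in the ego frame, in metres (assumed convention)


def bev_hit_grid(points: Sequence[Point],
                 x_range: Tuple[float, float] = (-10.0, 10.0),
                 y_range: Tuple[float, float] = (-10.0, 10.0),
                 cell: float = 1.0) -> List[List[int]]:
    """Rasterize ego-frame points into a top-down grid of hit counts."""
    nx = math.ceil((x_range[1] - x_range[0]) / cell)
    ny = math.ceil((y_range[1] - y_range[0]) / cell)
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Points outside the grid extent are dropped.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        col = min(int((x - x_range[0]) / cell), nx - 1)
        row = min(int((y - y_range[0]) / cell), ny - 1)
        grid[row][col] += 1
    return grid


def occupancy_fraction(grids: Sequence[List[List[int]]]) -> List[List[float]]:
    """Per-cell fraction of frames in which the cell received at least one hit.

    This is a hypothetical stand-in for a learned occupancy probability.
    """
    n = len(grids)
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(1 for g in grids if g[r][c] > 0) / n for c in range(cols)]
            for r in range(rows)]


# Two hypothetical sensors observed in the same ego frame.
lidar_points = [(0.5, 0.5), (0.2, 0.7), (50.0, 0.0)]  # last point is out of range
radar_points = [(-9.5, -9.5)]
frame0 = bev_hit_grid(lidar_points + radar_points)
frame1 = bev_hit_grid([(0.5, 0.5)])
occupancy = occupancy_fraction([frame0, frame1])
```

Production systems learn BEV features and occupancy with neural networks rather than counting hits; the sketch only shows the shared-grid representation that makes downstream fusion and planning convenient.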
Some Takeaways from Talking with Autonomous Driving PhD Students at Hong Kong Universities...
自动驾驶之心· 2025-11-20 00:05
Core Viewpoint
- The article emphasizes the importance of building a comprehensive community for autonomous driving that provides resources, networking opportunities, and guidance for both newcomers and experienced professionals in the field [6][16][19].

Group 1: Community and Networking
- The "Autonomous Driving Heart Knowledge Planet" community aims to create a platform for technical exchange and collaboration among members from renowned universities and leading companies in the autonomous driving sector [16][19].
- The community has grown to over 4,000 members and aims to reach nearly 10,000 within two years, facilitating discussions on technology trends and industry developments [6][7].
- Members can freely ask questions about career choices and research directions and receive insights from industry experts [89][92].

Group 2: Learning Resources
- The community offers a variety of learning materials, including video tutorials, technical roadmaps, and Q&A sessions, covering over 40 technical directions in autonomous driving [9][11][16].
- Specific learning paths are provided for newcomers, including foundational courses and advanced topics in areas such as end-to-end driving, multi-sensor fusion, and 3D object detection [11][17][36].
- The community has compiled a comprehensive list of open-source projects and datasets relevant to autonomous driving, aiding members in their research and development efforts [32][34][36].

Group 3: Career Development
- The community facilitates job referrals and connections with various autonomous driving companies, improving members' employment opportunities [11][19].
- Regular discussions with industry leaders are organized to explore career paths, job openings, and the latest trends in the autonomous driving field [8][19][92].
- Members are encouraged to engage in research collaborations and internships, particularly those pursuing advanced degrees in related fields [3][6][16].
Beyond Imitation Learning, How Can End-to-End Trajectories Be Optimized? A Leaderboard-Topping Work from QCraft (轻舟)...
自动驾驶之心· 2025-11-10 03:36
Core Insights
- The article discusses the development of CATG, a new trajectory generation framework based on flow matching, which addresses limitations in existing end-to-end autonomous driving systems [1][4][22].
- CATG achieved a score of 51.31 in the NAVSIM V2 challenge, demonstrating its effectiveness in trajectory planning and robustness against out-of-distribution data [4][22].

Background Review
- End-to-end multimodal planning has become a key method in autonomous driving, significantly improving robustness and adaptability compared to single trajectory prediction methods [3].
- Current multimodal methods often rely on imitation learning, leading to a lack of behavioral diversity due to insufficient strategy diversity in real trajectories [3][6].
- Various alternative strategies have been proposed to capture a broader distribution of reasonable trajectories, but many still struggle with integrating safety constraints directly into the generation process [3][6].

Proposed Framework
- CATG completely abandons imitation learning and supports the flexible injection of explicit constraints during the generation process [4][22].
- The framework integrates feasibility and safety constraints into the generation process through a progressive mechanism, utilizing prior perception anchor points [7][22].
- CATG allows for controllable trade-offs between aggressive and conservative driving styles by using environmental reward signals as conditional inputs [7][13].

Experimental Results
- CATG was extensively evaluated in the NAVSIM V2 challenge, showcasing superior planning accuracy and robust generalization capabilities [4][14].
- The model's training involved two phases: the first focused on training the flow matching process, and the second on fine-tuning the energy matching process [18][22].
- The results indicated high compliance with various metrics, including 100% drivable area compliance and 98.21% no-at-fault collisions in stage one [19].

Limitations
- The computational cost of generating trajectories through 100-step sampling remains high, and accelerating the sampling process may compromise trajectory quality [21].

Conclusion
- The article concludes that CATG represents a significant advancement in end-to-end planning for autonomous driving, effectively incorporating flexible conditional signals and explicit constraints during trajectory generation [22].
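To make the sampling-cost limitation concrete, here is a minimal sketch of how a flow-matching sampler typically draws a sample: it integrates a velocity field from t = 0 (noise) to t = 1 with fixed Euler steps, calling the model once per step. This is not CATG's implementation. The velocity field below is a toy closed-form stand-in for a trained network (it transports every point toward a single hypothetical target), and all names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector,
                 num_steps: int = 100) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)  # one model evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Tuple[Callable[[Vector, float], Vector], Dict[str, int]]:
    """Toy velocity field (an assumption, not a learned model).

    For straight-line paths x_t = (1 - t) * x0 + t * target, the velocity at
    (x, t) is (target - x) / (1 - t). A call counter is attached to show cost.
    """
    calls = {"n": 0}

    def velocity(x: Vector, t: float) -> Vector:
        calls["n"] += 1
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]

    return velocity, calls


# Flattened (x, y) waypoints of a hypothetical target trajectory.
target_trajectory = [1.0, 0.0, 2.0, 0.1, 3.0, 0.3]
velocity, calls = make_toy_velocity(target_trajectory)
noise = [0.4, -0.2, 0.1, 0.9, -0.7, 0.3]  # fixed "noise" sample for reproducibility
sample = euler_sample(velocity, noise, num_steps=100)
```

With a neural velocity field, each of the 100 steps is a full forward pass, which is why the step count dominates inference cost. Reducing `num_steps` lowers cost but increases the integration error for a field that is not as simple as this toy one.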
Two of the "Three Top Chinese AI Conferences" Have Now Covered Li Auto's Recent AI Progress
理想TOP2· 2025-11-09 14:59
Core Insights
- The article discusses the rising prominence of Li Auto in the autonomous driving sector, particularly its recent advancements presented at the ICCV 2025 conference, where it introduced a new paradigm for autonomous driving that integrates world models with reinforcement learning [1][2][4].

Group 1: Company Developments
- Li Auto's research and development in autonomous driving began in 2021, evolving from initial BEV solutions to more advanced systems [5].
- The company has invested heavily in AI, allocating nearly half of its R&D budget to this area, which indicates a strong commitment to integrating AI into its vehicle technology [2].
- Li Auto's presentation at ICCV 2025 highlighted its approach of using synthetic data to cover rare scenarios, which led to a notable improvement in miles per intervention (MPI), the distance driven between human takeovers [2][4].

Group 2: Industry Reception
- The reception of Li Auto's advancements has been overwhelmingly positive, with many industry observers praising its research and development efforts and positioning it as a model for Chinese automotive companies [2][4].
- Articles from major Chinese AI platforms like Quantum Bit and Machine Heart have drawn significant attention, with one article receiving over 39,000 reads, reflecting the growing interest in Li Auto's developments [1][2].

Group 3: Competitive Landscape
- Li Auto is recognized as a leading player in the Chinese autonomous driving space, with a notable presence in discussions surrounding AI and autonomous vehicle technology [22].
- The company aims to differentiate itself not just as an automotive manufacturer but as a competitive AI entity, aligning its goals with broader AI advancements and the five stages of AI development as defined by OpenAI [18][19].
Horizon's ResAD: Residual Learning Brings Autonomous Driving Decisions Closer to Human Logic
自动驾驶之心· 2025-11-07 16:04
Core Insights
- The article discusses the limitations of traditional modular approaches in autonomous driving and introduces the ResAD framework, which aims to improve efficiency and safety by using an end-to-end model that focuses on learning necessary adjustments from a baseline trajectory [2][50].

Group 1: Framework Overview
- The ResAD framework proposes a shift from directly predicting future trajectories to learning the necessary adjustments from a physical baseline trajectory, termed the "inertial reference line" [2][50].
- The model focuses on understanding the reasons for trajectory adjustments, such as obstacles and traffic rules, rather than memorizing data correlations [50].

Group 2: Methodology
- The ResAD framework incorporates a "normalized residual trajectory modeling" approach, which simplifies the learning problem by defining trajectory predictions as adjustments to a reference line [11][50].
- The framework employs a "point-wise residual normalization" technique to balance the optimization weights of near and far trajectory points, ensuring that critical adjustments are not overlooked [20][50].

Group 3: Testing and Results
- Real-world testing demonstrated the effectiveness of the ResAD framework, showcasing its ability to handle complex driving scenarios and respond intelligently to dynamic obstacles [6].
- In benchmark evaluations, ResAD achieved state-of-the-art performance on NAVSIM v1 and v2, with a PDMS score of 88.6 and an EPDMS score of 85.5, indicating high safety and efficiency in route completion [38][39].

Group 4: Comparative Analysis
- ResAD outperformed existing models like DiffusionDrive in various metrics, including lane adherence and route completion efficiency, highlighting its superior trajectory generation capabilities [41][39].
- The article emphasizes the importance of the unique trajectory modeling strategy in ResAD, which allows for the generation of contextually relevant and diverse trajectories without relying on a static trajectory library [10][41].
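The residual formulation described above can be sketched in a few lines. The digest does not give ResAD's exact equations, so this sketch makes two labeled assumptions: the inertial reference is a constant-velocity extrapolation of the current ego state, and point-wise normalization divides each waypoint's residual by a per-waypoint scale estimated from data. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def inertial_reference(pos: Point, vel: Point, horizon: int, dt: float) -> List[Point]:
    """Constant-velocity extrapolation of the ego state (assumed form of the reference)."""
    return [(pos[0] + vel[0] * dt * k, pos[1] + vel[1] * dt * k)
            for k in range(1, horizon + 1)]


def residual(traj: Sequence[Point], ref: Sequence[Point]) -> List[Point]:
    """Adjustment the model must learn: target trajectory minus the reference."""
    return [(tx - rx, ty - ry) for (tx, ty), (rx, ry) in zip(traj, ref)]


def pointwise_scales(residuals: Sequence[Sequence[Point]], eps: float = 1e-6) -> List[float]:
    """One scale per waypoint index: mean absolute residual over the dataset.

    Far waypoints deviate more from the reference, so they get larger scales.
    """
    horizon = len(residuals[0])
    scales = []
    for k in range(horizon):
        mags = [abs(r[k][0]) + abs(r[k][1]) for r in residuals]
        scales.append(sum(mags) / len(mags) + eps)
    return scales


def normalize(res: Sequence[Point], scales: Sequence[float]) -> List[Point]:
    """Divide each waypoint's residual by its scale so near and far points are comparable."""
    return [(dx / s, dy / s) for (dx, dy), s in zip(res, scales)]


def denormalize(norm: Sequence[Point], ref: Sequence[Point],
                scales: Sequence[float]) -> List[Point]:
    """Map a normalized residual (e.g. a model output) back to a trajectory."""
    return [(rx + dx * s, ry + dy * s) for (dx, dy), (rx, ry), s in zip(norm, ref, scales)]


# Ego at the origin moving at 2 m/s along x; the target drifts left over 1.5 s.
ref = inertial_reference((0.0, 0.0), (2.0, 0.0), horizon=3, dt=0.5)
target = [(1.0, 0.1), (2.0, 0.4), (3.0, 0.9)]
res = residual(target, ref)
scales = pointwise_scales([res])
normalized = normalize(res, scales)
```

In this toy case the raw lateral residual grows from 0.1 m to 0.9 m along the horizon, while the normalized residual is close to 1.0 at every waypoint. Without such normalization, a plain regression loss would be dominated by the far waypoints.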
Traditional Planning and Control Jobs Are Getting Harder to Find...
自动驾驶之心· 2025-10-30 00:04
Core Viewpoint
- The article emphasizes the evolving landscape of autonomous driving, highlighting the shift from traditional planning and control methods to end-to-end approaches, which are increasingly favored in the industry [2][29].

Summary by Sections

Course Offerings
- The company has designed a specialized course on end-to-end planning and control in autonomous driving, aimed at addressing real-world challenges and improving employability [6][12].
- The course will cover essential algorithms and frameworks used in the industry, focusing on practical applications and the integration of traditional and modern methods [6][21].

Course Structure
- The course consists of six chapters, each focusing on a different aspect of planning and control, including foundational algorithms, decision-making frameworks, and handling uncertainty in the environment [20][24][29].
- The course will also include interview preparation, resume improvement, and mock interviews to support participants in securing job offers [31][10].

Target Audience
- The course is designed for individuals with a background in vehicle engineering, automation, computer science, and related fields, particularly those seeking to transition into autonomous driving roles [37][39].
- Participants are expected to have a basic understanding of programming and the relevant mathematical concepts to fully benefit from the course [43].

Instructor Expertise
- The course will be led by an experienced instructor with a strong background in autonomous driving algorithms and practical implementation [34][10].

Additional Benefits
- Participants will have access to supplementary resources, including code and development environments [13][15].
- The course aims to provide a comprehensive understanding of the industry, equipping participants with the skills needed to tackle complex problems in autonomous driving [6][13].
Horizon's HSD Is Indeed Worth Watching
自动驾驶之心· 2025-10-29 03:30
Core Insights
- The article discusses advancements in autonomous driving technology, focusing on the performance of Horizon's HSD system compared to Li Auto's VLA system and highlighting the strengths and weaknesses of both [5][6].

Group 1: Technology Comparison
- Horizon's HSD architecture uses visual information to output trajectories, with LiDAR positioning serving as a safety redundancy, while the VLA system is criticized for its high computational and bandwidth requirements [5].
- During a test drive of the Horizon HSD engineering vehicle, the experience was reported to be significantly better than the current production version of Li Auto's VLA, particularly in terms of comfort and smoothness in traffic [6].
- Feedback from the Horizon team indicated that the HSD system performs well in controlled environments but has limitations in extreme weather and complex scenarios, suggesting a need for further development [7].

Group 2: Community and Collaboration
- The article mentions the establishment of nearly a hundred technical discussion groups covering various aspects of autonomous driving, with a community of around 4,000 members and over 300 companies and research institutions involved [8].
- The collaboration between Horizon and vehicle manufacturers is emphasized, with a focus on integrating user interface elements that respect manufacturer preferences, which can affect the overall driving experience [7].

Group 3: Future Outlook
- The article suggests that while the HSD system shows promise, it is still in development and may not yet reach full autonomous driving capabilities, estimating it to be around 60% of the level of Li Auto's V13 system [7].
Sharing the Winning Solution for the ICCV 2025 "End-to-End Autonomous Driving" Challenge!
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article highlights the victory of Inspur's AI team in the Autonomous Grand Challenge 2025, where they achieved a score of 53.06 in the end-to-end autonomous driving track using their framework "SimpleVSF" [2][7][13].
- The framework integrates bird's-eye view perception trajectory prediction with a vision-language multimodal model, strengthening decision-making in complex traffic scenarios [2][5][8].

Summary by Sections

Competition Overview
- The ICCV 2025 Autonomous Driving Challenge is a significant international event focusing on autonomous driving and embodied intelligence, featuring three main tracks [4].
- The end-to-end driving challenge evaluates trajectory prediction and behavior planning using a data-driven simulation framework, emphasizing safety and efficiency across nine key metrics [4].

Technical Challenges
- End-to-end autonomous driving aims to reduce the errors and information loss of traditional modular approaches, yet struggles with decision-making in complex real-world scenarios [5].
- Current methods can identify basic elements but fail to understand higher-level semantics and situational context, leading to suboptimal decisions [5].

Innovations in SimpleVSF Framework
- The SimpleVSF framework bridges the gap between traditional trajectory planning and semantic understanding through a vision-language model (VLM) [7][8].
- The VLM-enhanced scoring mechanism improves decision quality and scene adaptability, resulting in a 2% performance increase for single models and up to 6% in fusion decision-making [8][11].

Decision-Making Mechanism
- The dual fusion decision mechanism combines quantitative and qualitative assessments, selecting trajectories based on both numerical and semantic criteria [10][11].
- The framework employs advanced models for generating diverse candidate trajectories and extracting robust environmental features, improving overall system performance [13].

Achievements and Future Directions
- The SimpleVSF framework's success in the challenge sets a new benchmark for end-to-end autonomous driving technology, supporting further advancements in the field [13].
- Inspur's AI team aims to leverage their algorithmic and computational strengths to drive innovation in autonomous driving technology [13].
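The idea of combining quantitative and qualitative assessments when selecting a trajectory can be sketched as a weighted score fusion. The digest does not describe SimpleVSF's actual scoring formula, so everything here is an assumption for illustration: the `Candidate` fields, the linear blend, the `vlm_weight` parameter, and the example scores.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    metric_score: float  # hypothetical numeric score in [0, 1], e.g. from a rule-based scorer
    vlm_score: float     # hypothetical semantic score in [0, 1], e.g. from a VLM judgment


def fused_score(c: Candidate, vlm_weight: float) -> float:
    """Linear blend of the numeric and semantic scores (an assumed fusion rule)."""
    return (1.0 - vlm_weight) * c.metric_score + vlm_weight * c.vlm_score


def select_trajectory(candidates: Sequence[Candidate], vlm_weight: float = 0.3) -> Candidate:
    """Pick the candidate with the highest fused score."""
    return max(candidates, key=lambda c: fused_score(c, vlm_weight))


candidates = [
    Candidate("keep_lane_fast", metric_score=0.80, vlm_score=0.20),
    Candidate("slow_for_pedestrian", metric_score=0.75, vlm_score=0.90),
]
numeric_only = select_trajectory(candidates, vlm_weight=0.0)
with_semantics = select_trajectory(candidates, vlm_weight=0.3)
```

With `vlm_weight=0.0` the numerically best candidate (`keep_lane_fast`) wins; with `vlm_weight=0.3` the semantic score shifts the choice to `slow_for_pedestrian`. This is the kind of effect a semantic scorer is meant to have when numeric metrics alone miss scene context.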
Some Advice for Newcomers to the Autonomous Driving Industry
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article emphasizes the establishment of a comprehensive community called "Autonomous Driving Heart Knowledge Planet," aimed at bridging the gap between academia and industry in the field of autonomous driving [1][3][14].

Group 1: Community Development
- The community has grown to over 4,000 members and aims to reach nearly 10,000 within two years, providing a platform for technical sharing and communication among beginners and advanced learners [3][14].
- The community offers various resources, including videos, articles, learning paths, Q&A sessions, and job exchange opportunities, making it a one-stop hub for autonomous driving enthusiasts [1][3][5].

Group 2: Learning Resources
- The community has compiled over 40 technical learning paths, covering topics such as end-to-end learning, multimodal large models, and data annotation practices, significantly reducing the time needed for research [5][14].
- Members can access a variety of video tutorials and courses tailored for beginners, covering essential topics in autonomous driving technology [9][15].

Group 3: Industry Engagement
- The community collaborates with numerous industry leaders and academic experts to discuss trends, technological advancements, and production challenges in autonomous driving [6][10][14].
- A job referral mechanism within the community connects members with leading companies in the autonomous driving sector [10][12].

Group 4: Technical Focus Areas
- The community has organized resources on various technical areas, including 3D object detection, multi-sensor fusion, and high-precision mapping, which are crucial for the development of autonomous driving technologies [27][29][31].
- Specific focus is given to emerging technologies such as vision-language models (VLMs) and world models, with detailed summaries and resources available for members [37][39][45].
Tesla's World Simulator Debuts at ICCV; VP Personally Explains the End-to-End Autonomous Driving Technical Route
36Kr · 2025-10-27 08:11
Core Insights
- Tesla has unveiled a world simulator for generating realistic driving scenarios. It was presented by Ashok Elluswamy at the ICCV conference, where he emphasized that the future of intelligent driving lies in end-to-end AI [1][5][24].

Group 1: World Simulator Features
- The world simulator can create new, challenging scenarios for autonomous driving tasks, such as vehicles suddenly changing lanes or the AI navigating around pedestrians and obstacles [2].
- The generated scenario videos serve two purposes: training autonomous driving models and providing a gaming experience for human users [2][4].

Group 2: End-to-End AI Approach
- Elluswamy highlighted that end-to-end AI is the future of autonomous driving, using data from various sensors to generate control commands for vehicles [5][8].
- The end-to-end approach is contrasted with modular systems, which are easier to develop initially but lack the optimization and scalability of end-to-end systems [8][10].

Group 3: Challenges and Solutions
- One major challenge for end-to-end autonomous driving is evaluation, which the world simulator addresses by using a vast dataset to synthesize future states based on current conditions [11].
- The complexity of real-world data, such as high frame rates and multiple sensor inputs, leads to a "curse of dimensionality," which Tesla mitigates by collecting extensive driving data to improve model generalization [13][15].

Group 4: Industry Perspectives
- The industry is divided between two main approaches to end-to-end autonomous driving, VLA (Vision-Language-Action) and world models, with various companies adopting different strategies [24].
- Tesla's choice of the end-to-end approach has drawn attention due to its historical success in the autonomous driving space, raising questions about the future direction of the technology [24].