Workflow
端到端自动驾驶
icon
Search documents
闭环端到端暴涨20%!华科&小米打造开源框架ORION
自动驾驶之心· 2025-08-30 16:03
Core Viewpoint - The article discusses the advancements in end-to-end (E2E) autonomous driving technology, particularly focusing on the introduction of the ORION framework, which integrates vision-language models (VLM) for improved decision-making in complex environments [3][30]. Summary by Sections Introduction - Recent progress in E2E autonomous driving technology faces challenges in complex closed-loop interactions due to limited causal reasoning capabilities [3][12]. - VLMs offer new hope for E2E autonomous driving but there remains a significant gap between VLM's semantic reasoning space and the numerical action space required for driving [3][17]. ORION Framework - ORION is proposed as an end-to-end autonomous driving framework that utilizes visual-language instructions for trajectory generation [3][18]. - The framework incorporates QT-Former for aggregating long-term historical context, VLM for scene understanding and reasoning, and a generative model to align reasoning and action spaces [3][16][18]. Performance Evaluation - ORION achieved a driving score of 77.74 and a success rate of 54.62% on the challenging Bench2Drive dataset, outperforming previous state-of-the-art (SOTA) methods by 14.28 points and 19.61% in success rate [5][24]. - The framework demonstrated superior performance in specific driving scenarios such as overtaking (71.11%), emergency braking (78.33%), and traffic sign recognition (69.15%) [26]. Key Contributions - The article highlights several key contributions of ORION: 1. QT-Former enhances the model's understanding of historical scenes by effectively aggregating long-term visual context [20]. 2. VLM enables multi-dimensional analysis of driving scenes, integrating user instructions and historical information for action reasoning [21]. 3. The generative model aligns the reasoning space of VLM with the action space for trajectory prediction, ensuring reasonable driving decisions in complex scenarios [22]. Conclusion - ORION provides a novel solution for E2E autonomous driving by achieving semantic and action space alignment, integrating long-term context aggregation, and jointly optimizing visual understanding and path planning tasks [30].
用QA问答详解端到端落地:[UniAD/PARA-Drive/SpareDrive/VADv2]
自动驾驶之心· 2025-08-29 16:03
Core Viewpoint - The article discusses various end-to-end models in autonomous driving, focusing on their architectures and functionalities, particularly the UniAD framework and its modular components for perception, prediction, and planning [4][13]. Group 1: End-to-End Models - End-to-end models are categorized into two types: completely black-box models like OneNet, which optimize the planner directly, and modular end-to-end models that reduce error accumulation through interactions between perception, prediction, and planning modules [3]. - The UniAD framework consists of four main parts: multi-view camera input, backbone for BEV feature extraction, perception for scene-level understanding, and prediction for multi-mode trajectory forecasting [4]. Group 2: Specific Model Architectures - TrackFormer utilizes three types of queries: detection, tracking, and ego queries, with a dynamic length for the tracking query set based on object disappearance [6]. - MotionFormer operates similarly to RNN structures, processing sequential blocks to predict future states based on previous outputs, focusing on agent-level knowledge [9]. - MapFormer employs Panoptic Segformer for environment segmentation, distinguishing between countable instances and uncountable elements [10]. Group 3: Advanced Techniques - PARA-Drive modifies the UniAD framework by adjusting the connections between perception, prediction, and planning modules, allowing for parallel training and improved inference speed [13]. - Symmetric sparse perception is divided into two parallel parts for agent detection and map perception, utilizing a DETR paradigm for both tasks [20]. - The planning transformer integrates various tokens to output action probabilities, selecting the most probable action based on human trajectory data [23]. Group 4: Community and Learning Resources - The article highlights the establishment of numerous technical discussion groups related to autonomous driving, covering over 30 learning paths and involving nearly 300 companies and research institutions [27][28].
死磕技术的自动驾驶全栈学习社区,近40+方向技术路线~
自动驾驶之心· 2025-08-27 01:26
Core Viewpoint - The article emphasizes the establishment of a comprehensive community for autonomous driving enthusiasts, aiming to connect learners and professionals in the field, providing resources, networking opportunities, and industry insights. Group 1: Community and Resources - The "Autonomous Driving Heart Knowledge Planet" has over 4,000 members and aims to grow to nearly 10,000 in two years, serving as a hub for communication and technical sharing [1][12] - The community offers a variety of resources including video content, articles, learning paths, Q&A, and job exchange opportunities [1][2] - Nearly 40 technical routes have been organized within the community, catering to various interests such as industry applications and the latest benchmarks [2][5] Group 2: Learning and Development - The community provides structured learning paths for beginners, including full-stack courses suitable for those with no prior experience [7][9] - Members can access detailed information on end-to-end autonomous driving, multi-modal models, and various data sets for training and fine-tuning [3][26] - Regular discussions with industry leaders are held to explore trends, technological directions, and production challenges in autonomous driving [4][58] Group 3: Job Opportunities and Networking - The community has established internal referral mechanisms with multiple autonomous driving companies, facilitating job placements for members [9][11] - Members are encouraged to engage in discussions about career choices and research directions, receiving guidance from experienced professionals [55][60] - The platform aims to connect members with job openings and industry opportunities, enhancing their career prospects in the autonomous driving sector [1][62]
某新势力智驾团队最后一位留守高管已于近日离职
自动驾驶之心· 2025-08-23 16:03
Core Viewpoint - The departure of key personnel from a leading new force car company's intelligent driving team may significantly impact its research and development progress, team stability, and sales momentum in the second half of the year [1][2][3]. Group 1: Company Developments - The intelligent driving team of the new force car company has experienced significant turnover, with a reported attrition rate exceeding 50% in some teams this year [1]. - The company has initiated a full-scale non-compete agreement to retain talent, even requiring recent graduates to sign such agreements [1]. - The departure of the R&D head, who was a core member of the team, raises concerns about the company's ability to achieve its ambitious goals for 2024 [2]. Group 2: Industry Trends - The movement of core intelligent driving talent across the industry may present new opportunities for technological advancements [3]. - The intelligent driving landscape is evolving, with a trend towards convergence in technology routes driven by competitive pricing strategies [3]. - The departure of key figures from various intelligent driving teams, including those from Xiaopeng and NIO, indicates a broader industry shift and a new cycle of updates within the intelligent driving teams [3]. Group 3: Strategic Implications - The company is expected to launch a new paradigm of intelligent driving, which could significantly influence the sales of new models [2]. - The loss of three high-level executives responsible for critical aspects of intelligent driving may disrupt the company's overall R&D timeline and stability [2].
面向量产VLA!FastDriveVLA:即插即用剪枝模块,推理加速近4倍
自动驾驶之心· 2025-08-23 16:03
Core Viewpoint - The article discusses the development of FastDriveVLA, a novel visual token pruning framework designed for autonomous driving, achieving a 50% compression rate while maintaining 97.3% performance [3][13][43]. Group 1: End-to-End Autonomous Driving - Recent advancements in end-to-end autonomous driving research have led to the adoption of visual-language-action (VLA) models, which outperform traditional modular approaches in complex scene understanding and decision-making [3][10]. - The VLA model integrates perception, action generation, and planning into a single framework, reducing information loss between modules [3][4]. Group 2: Visual Token Pruning Techniques - Existing VLM/VLA models face high computational costs due to the encoding of images into numerous visual tokens, prompting research into visual token pruning methods [4][11]. - Two primary approaches for visual token pruning are attention mechanism-based methods and similarity-based methods, both of which have limitations in driving tasks [4][14]. - FastDriveVLA introduces a reconstruction-based visual token pruning framework that focuses on retaining tokens related to foreground areas critical for driving decisions [5][13]. Group 3: FastDriveVLA Framework - FastDriveVLA employs a plug-and-play pruner called ReconPruner, trained using a pixel reconstruction task to emphasize foreground information [6][17]. - The framework includes an adversarial foreground-background reconstruction strategy to enhance the model's ability to distinguish between foreground and background tokens [20][21]. - A large-scale dataset, nuScenes-FG, was constructed to support the training of ReconPruner, containing 241,000 image-mask pairs for effective foreground segmentation [6][12][13]. Group 4: Experimental Results - FastDriveVLA achieved state-of-the-art results on the nuScenes closed-loop planning benchmark, demonstrating its effectiveness and practicality [13][28]. - The framework was evaluated under various pruning ratios (25%, 50%, 75%), consistently outperforming existing methods in key metrics such as L2 error and collision rates [30][34]. - The efficiency analysis showed that FastDriveVLA significantly reduces FLOPs and CUDA latency compared to other methods, enhancing real-time deployment capabilities [36][40]. Group 5: Contributions and Implications - The introduction of FastDriveVLA provides a new paradigm for efficient inference in VLA models, offering insights into task-specific token pruning strategies [43]. - The research highlights the importance of focusing on foreground information in autonomous driving tasks, which can lead to improved performance and reduced computational costs [5][43].
又帮到了一位同学拿到了自动驾驶算法岗......
自动驾驶之心· 2025-08-23 14:44
Core Viewpoint - The article emphasizes the importance of continuous learning and adaptation in the field of autonomous driving, particularly in light of industry shifts towards intelligent models and large models, while also highlighting the value of community support for knowledge sharing and job opportunities [1][2]. Group 1: Community and Learning Resources - The "Autonomous Driving Heart Knowledge Planet" is a comprehensive community platform that integrates video, text, learning paths, Q&A, and job exchange, aiming to grow from over 4,000 to nearly 10,000 members in two years [1][2]. - The community provides practical solutions for various topics such as entry points for end-to-end models, learning paths for multimodal large models, and engineering practices for data closed-loop 4D annotation [2][3]. - Members have access to over 40 technical routes, including industry applications, VLA benchmarks, and learning entry routes, significantly reducing search time for relevant information [2][3]. Group 2: Job Opportunities and Networking - The community has established internal referral mechanisms with multiple autonomous driving companies, facilitating job applications and resume submissions directly to desired companies [7]. - Regular job sharing and updates on available positions are provided, creating a complete ecosystem for autonomous driving professionals [15][30]. Group 3: Technical Learning and Development - The community offers a well-structured technical stack and roadmap for beginners, covering essential areas such as mathematics, computer vision, deep learning, and programming [11][32]. - Various learning routes are available for advanced topics, including end-to-end autonomous driving, 3DGS principles, and multimodal large models, catering to both newcomers and experienced professionals [16][34][40]. - The platform also hosts live sessions with industry leaders, providing insights into cutting-edge research and practical applications in autonomous driving [58][66].
VLA方向的论文还不知怎么下手?有的同学已经CCF-A了......
自动驾驶之心· 2025-08-22 12:00
Core Insights - The article discusses the advancements of the Li Auto VLA driver model, highlighting its improved capabilities in understanding semantics, reasoning, and trajectory planning, which are crucial for autonomous driving [1][3][5] Group 1: VLA Model Capabilities - The VLA model demonstrates enhanced semantic understanding through multimodal input, improved reasoning via thinking chains, and a closer approximation to human driving intuition through trajectory planning [1] - Four core abilities of the VLA model are showcased: spatial understanding, reasoning ability, communication and memory capability, and behavioral ability [1][3] Group 2: Research and Development Trends - The VLA model has evolved from VLM+E2E, integrating various cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5] - While traditional perception and planning tasks are still being optimized in the industry, the academic community is increasingly shifting focus towards large models and VLA, indicating a wealth of subfields still open for exploration [5] Group 3: VLA Research Guidance Program - A VLA research paper guidance program has been initiated, receiving positive feedback, aimed at helping participants systematically grasp key theoretical knowledge and develop their own research ideas [6] - The program includes a structured curriculum over 14 weeks, covering topics from traditional end-to-end autonomous driving to writing methodologies for research papers [9][11][30] Group 4: Course Structure and Requirements - The course is designed for a maximum of 8 participants per session, targeting individuals with a background in VLA and autonomous driving at various academic levels [12][15] - Participants are expected to have a foundational understanding of deep learning, Python programming, and familiarity with PyTorch, with specific hardware requirements suggested for optimal performance [21][22] Group 5: Expected Outcomes - Participants will gain insights into classic and cutting-edge research papers, coding skills, and methodologies for writing and submitting research papers, culminating in the production of a draft paper [20][34] - The program aims to enhance participants' understanding of algorithms, their advantages and disadvantages, and to stimulate their research ideas through structured guidance [20][34]
端到端全新范式!复旦VeteranAD:"感知即规划"刷新开闭环SOTA,超越DiffusionDrive~
自动驾驶之心· 2025-08-21 23:34
Core Insights - The article introduces a novel "perception-in-plan" paradigm for end-to-end autonomous driving, implemented in the VeteranAD framework, which integrates perception directly into the planning process, enhancing the effectiveness of planning optimization [5][39]. - VeteranAD demonstrates superior performance on challenging benchmarks, NAVSIM and Bench2Drive, showcasing the benefits of tightly coupling perception and planning for improved accuracy and safety in autonomous driving [12][39]. Summary by Sections Introduction - The article discusses significant advancements in end-to-end autonomous driving, emphasizing the need to unify multiple tasks within a single framework to prevent information loss across stages [2][3]. Proposed Framework - VeteranAD framework is designed to embed perception into planning, allowing the perception module to operate more effectively in alignment with planning needs [5][6]. - The framework consists of two core modules: Planning-Aware Holistic Perception and Localized Autoregressive Trajectory Planning, which work together to enhance the performance of end-to-end planning tasks [12][39]. Core Modules - **Planning-Aware Holistic Perception**: This module interacts across three dimensions—image features, BEV features, and surrounding traffic features—to achieve a comprehensive understanding of traffic elements [6]. - **Localized Autoregressive Trajectory Planning**: This module generates future trajectories in an autoregressive manner, progressively refining the planned trajectory based on perception results [6][16]. Experimental Results - VeteranAD achieved a PDM Score of 90.2 on the NAVSIM navtest dataset, outperforming previous learning methods and demonstrating its effectiveness in end-to-end planning [21]. - In open-loop evaluations, VeteranAD recorded an average L2 error of 0.60, surpassing all baseline methods, while maintaining competitive performance in closed-loop evaluations [25][33]. Ablation Studies - Ablation studies indicate that the use of guiding points from anchored trajectories is crucial for accurate planning, as removing these points significantly degrades performance [26]. - The combination of both core modules results in enhanced performance, highlighting their complementary nature [26]. Conclusion - The article concludes that the "perception-in-plan" design significantly improves end-to-end planning accuracy and safety, paving the way for future research in more efficient and reliable autonomous driving systems [39].
没有高效的技术和行业信息渠道,很多时间浪费了。。。
自动驾驶之心· 2025-08-21 23:34
Core Insights - The article emphasizes the importance of efficient information collection channels for individuals seeking to transition into the autonomous driving industry, highlighting the establishment of a comprehensive community that integrates academic content, industry discussions, open-source solutions, and job opportunities [1][3]. Group 1: Community and Resources - The community serves as a platform for cultivating future leaders in the autonomous driving field, providing a space for academic and engineering discussions [3]. - The community has grown to over 4,000 members, offering a blend of video content, articles, learning paths, Q&A, and job exchange [1][3]. - A variety of resources are available, including a complete entry-level technical stack and roadmap for newcomers, as well as valuable industry frameworks and project proposals for those already engaged in research [9][11]. Group 2: Learning and Development - The community has compiled extensive resources, including over 40 open-source projects and nearly 60 datasets related to autonomous driving, along with mainstream simulation platforms and various technical learning paths [16]. - Specific learning routes are available for different aspects of autonomous driving, such as perception, simulation, and planning control, catering to both beginners and advanced practitioners [17][16]. - The community also offers a series of video tutorials covering topics like sensor calibration, SLAM, decision-making, and trajectory prediction [5]. Group 3: Job Opportunities and Networking - The community has established internal referral mechanisms with multiple autonomous driving companies, facilitating job placements for members [5]. - Continuous job sharing and position updates are provided, creating a complete ecosystem for autonomous driving professionals [13]. - Members can freely ask questions regarding career choices and research directions, receiving guidance from experienced peers [82].
公司通知团队缩减,懂端到端的留下来了。。。
自动驾驶之心· 2025-08-19 23:32
Core Viewpoint - The article discusses the rapid evolution and challenges in the field of end-to-end autonomous driving technology, emphasizing the need for a comprehensive understanding of various algorithms and models to succeed in this competitive industry [2][4][6]. Group 1: Industry Trends - The shift from modular approaches to end-to-end systems in autonomous driving aims to eliminate cumulative errors between modules, marking a significant technological leap [2]. - The emergence of various algorithms and models, such as UniAD and BEV perception, indicates a growing focus on integrating multiple tasks into a unified framework [4][9]. - The demand for knowledge in multi-modal large models, reinforcement learning, and diffusion models is increasing, reflecting the industry's need for versatile skill sets [5][20]. Group 2: Learning Challenges - New entrants face difficulties due to the fragmented nature of knowledge and the overwhelming volume of research papers in the field, often leading to early abandonment of learning [5][6]. - The lack of high-quality documentation and practical guidance further complicates the transition from theory to practice in end-to-end autonomous driving research [5][6]. Group 3: Course Offerings - A new course titled "End-to-End and VLA Autonomous Driving" has been developed to address the learning challenges, focusing on practical applications and theoretical foundations [6][24]. - The course is structured to provide a comprehensive understanding of end-to-end algorithms, including their historical development and current trends [11][12]. - Practical components, such as real-world projects and assignments, are included to ensure that participants can apply their knowledge effectively [8][21]. Group 4: Course Content Overview - The course covers various topics, including the introduction to end-to-end algorithms, background knowledge on relevant technologies, and detailed explorations of both one-stage and two-stage end-to-end methods [11][12][13]. - Specific chapters focus on advanced topics like world models and diffusion models, which are crucial for understanding the latest advancements in autonomous driving [15][17][20]. - The final project involves practical applications of reinforcement learning from human feedback (RLHF), allowing participants to gain hands-on experience [21].