End-to-End Autonomous Driving - filings, earnings calls, financial reports, news - Reportify

End-to-End Autonomous Driving

A VLA Solution for Mass Production! FastDriveVLA: A Plug-and-Play Pruning Module with Nearly 4x Faster Inference (Peking University & XPeng)

自动驾驶之心· 2025-08-04 23:33

Core Viewpoint - The article presents FastDriveVLA, a novel visual token pruning framework for autonomous driving that retains 97.3% of the original performance at a 50% compression rate [2][3][43].

Group 1: End-to-End Autonomous Driving
- Recent end-to-end autonomous driving research completes perception through planning in a single model, reducing information loss between modules [3].
- Vision-Language-Action (VLA) models enhance decision-making in complex scenarios, making them increasingly popular in autonomous driving systems [3][10].

Group 2: Visual Token Pruning
- Existing VLM/VLA models encode images into large numbers of visual tokens, resulting in high computational costs. Current research explores two main directions for visual token pruning: attention-based methods and similarity-based methods [4][14].
- FastDriveVLA proposes a reconstruction-based visual token pruning framework that retains tokens related to foreground information, significantly reducing computational costs while maintaining performance [5][13].

Group 3: FastDriveVLA Framework
- FastDriveVLA includes a plug-and-play pruner called ReconPruner, trained with a pixel reconstruction task so that it focuses on foreground areas and assigns higher significance scores to key tokens [6][17].
- Training uses nuScenes-FG, a large-scale dataset of 241,000 image-mask pairs, which improves the model's ability to distinguish foreground from background [6][12].

Group 4: Experimental Results
- FastDriveVLA achieved state-of-the-art results on the nuScenes open-loop planning benchmark, demonstrating its effectiveness and practicality [13][34].
- It outperforms existing methods, improving L2 error and collision rate at various pruning ratios [30][34].

Group 5: Efficiency Analysis
- FastDriveVLA reduces FLOPs by approximately 7.5x and lowers prefill and decode latencies, improving inference efficiency for real-time deployment [36][40].
- The lightweight design of ReconPruner yields lower CUDA latency than several comparable methods, making it suitable for practical applications [36][40].
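The score-based pruning idea in the summary above can be illustrated with a minimal sketch. It assumes each visual token already has a significance score and simply keeps the top-scoring fraction. The function name `prune_tokens`, the token labels, and the score values are hypothetical; how ReconPruner actually computes scores and selects tokens is not described here.

```python
def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    Illustrative only: the scores are assumed to be given; this is not
    the article's pruner.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    keep = max(1, round(len(tokens) * keep_ratio))
    # Rank token indices by score (highest first), then restore input order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:keep])
    return [tokens[i] for i in kept_indices]


# Hypothetical example: six visual tokens with made-up significance scores.
tokens = ["sky", "car", "road", "pedestrian", "tree", "sign"]
scores = [0.05, 0.90, 0.40, 0.95, 0.10, 0.70]
kept = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept)  # ['car', 'pedestrian', 'sign']
```

Restoring the original order after ranking keeps the surviving tokens in the sequence positions the downstream model expects, which is one plausible design choice rather than a confirmed detail.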

Vision-Language-Action (VLA) Models

Vision-Language Models (VLM)

End-to-End Autonomous Driving

Performance Jumps 30%! CUHK's ReAL-AD: An End-to-End Algorithm with Human-Like Reasoning (ICCV'25)

自动驾驶之心· 2025-08-03 23:32

Core Viewpoint - The article presents ReAL-AD, a framework that integrates human-like reasoning into end-to-end autonomous driving, improving decision-making through a structured approach that mirrors human cognition [3][43].

Group 1: Framework Overview
- ReAL-AD is a reasoning-enhanced learning framework based on a three-layer human cognitive model: driving strategy, decision-making, and operation [3][5].
- It incorporates a vision-language model (VLM) to improve environmental perception and structured reasoning, allowing for more nuanced decision-making [3][5].

Group 2: Components of ReAL-AD
The framework consists of three main components:
1. **Strategic Reasoning Injector**: Uses the VLM to generate insights for complex traffic situations, forming high-level driving strategies [5][11].
2. **Tactical Reasoning Integrator**: Converts strategic intentions into executable tactical choices, bridging the gap between strategy and operational decisions [5][14].
3. **Hierarchical Trajectory Decoder**: Mimics human decision-making by establishing rough motion patterns before refining them into detailed trajectories [5][20].

Group 3: Performance Evaluation
- In open-loop evaluations, ReAL-AD improved on baseline methods by more than 30% in L2 error and collision rate [36].
- It achieved the lowest average L2 error (0.48 meters) and collision rate (0.15%) on the nuScenes dataset, indicating that it learns driving capabilities more efficiently [36].
- In closed-loop evaluations, adding the ReAL-AD framework significantly improved driving scores and successful route completions compared with baseline models [37].

Group 4: Experimental Setup
- The evaluation used the nuScenes dataset, which includes 1,000 scenes sampled at 2Hz, and the Bench2Drive dataset, which covers 44 scenarios and 23 weather conditions [34].
- Evaluation metrics included L2 error, collision rate, driving score, and success rate [35][39].

Group 5: Ablation Studies
- Removing the Strategic Reasoning Injector led to a 12% increase in average L2 error and a 19% increase in collision rate, highlighting its importance in guiding decision-making [40].
- The Tactical Reasoning Integrator reduced average L2 error by 0.14 meters and collision rate by 0.05%, showing the value of tactical commands in planning [41].
- Replacing the Hierarchical Trajectory Decoder with a multi-layer perceptron increased both L2 error and collision rate, underscoring the need for hierarchical decoding in trajectory prediction [41].
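For readers unfamiliar with the open-loop metrics quoted above, the following is a minimal sketch of how an average L2 error and a collision rate can be computed. The helper names, waypoints, and collision flags are hypothetical, and real benchmark protocols differ in how they average over time horizons and samples, so this is not the article's evaluation code.

```python
import math


def average_l2_error(predicted, ground_truth):
    """Mean Euclidean distance (meters) between matched 2D waypoints."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty waypoint lists of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def collision_rate(collision_flags):
    """Percentage of evaluated samples in which the plan collides."""
    if not collision_flags:
        raise ValueError("need at least one sample")
    return 100.0 * sum(collision_flags) / len(collision_flags)


# Made-up waypoints (x, y) in meters and made-up per-sample collision flags.
predicted = [(0.0, 5.0), (0.6, 10.8)]
ground_truth = [(0.0, 5.0), (0.0, 10.0)]
l2 = average_l2_error(predicted, ground_truth)
rate = collision_rate([False, False, False, True])
print(f"average L2 error: {l2:.2f} m, collision rate: {rate:.1f}%")
```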

Vision-Language Models

End-to-End Autonomous Driving

Countdown to Launch! China's First Project-Level Tutorial on End-to-End Autonomous Driving Is Here

自动驾驶之心· 2025-08-02 06:00

Core Viewpoint - End-to-end (E2E) autonomous driving is currently the core algorithm for mass production in intelligent driving. Advances in VLM/VLA systems have driven high demand for related positions, with salaries reaching up to 1 million annually [2][11].

Group 1: Industry Trends
- The concept of E2E has evolved significantly and several technical schools of thought have emerged, yet many people still struggle to understand how it works and how single-stage and two-stage approaches differ [2][4].
- VLA (Vision-Language-Action) is seen as a new frontier in autonomous driving, with companies actively developing next-generation mass-production solutions [21][22].

Group 2: Educational Initiatives
- A new course titled "End-to-End and VLA Autonomous Driving" has been launched to address the challenges newcomers face, focusing on practical applications and theoretical foundations [14][27].
- The course aims to provide a comprehensive understanding of E2E autonomous driving, covering various models and methodologies, including diffusion models and reinforcement learning [6][19][21].

Group 3: Job Market Insights
- The job market for VLA/VLM algorithm experts is robust: positions requiring 3-5 years of experience pay 40K to 70K monthly [11][12].
- Roles such as VLA model quantization and deployment engineer and multi-modal VLA model expert are particularly sought after, reflecting the industry's shift toward advanced algorithmic solutions [11][12].

End-to-End Autonomous Driving

End-to-End and VLA Autonomous Driving Small-Group Course

Three Hours of Li Auto's Launch Event, and the Biggest Takeaway: VLA Is Hitting the Road?!

自动驾驶之心· 2025-07-30 03:01

Core Viewpoint - The article discusses the launch of the Li Auto i8, highlighting its significant upgrades in assisted driving and the introduction of the VLA (Vision-Language-Action) model, a milestone in the development of end-to-end autonomous driving technology [2][4].

Summary by Sections

VLA Model Capabilities - The VLA model brings three main improvements: better semantic understanding (multimodal input), stronger reasoning (chain of thought), and closer alignment with human driving intuition. It focuses on four core abilities: spatial understanding, reasoning, communication and memory, and behavior [4][5].

Industry Outlook - Demand for VLA/VLM model algorithm experts is projected to be high, with salaries ranging from 40K to 70K for those with 3-5 years of experience and a master's degree. Top technical talent, especially PhD graduates, can expect salaries of 90K to 120K [13].

Learning Challenges - Newcomers to end-to-end autonomous driving face a complex technology stack and fragmented knowledge sources. The article emphasizes the need for a structured learning path through the large volume of literature and practical applications [16][17].

Course Introduction - A new course titled "End-to-End and VLA Autonomous Driving" has been developed to address these challenges. It aims to provide a quick entry into the field, build a framework for research, and combine theory with practice [17][18][19].

Course Structure - The course consists of several chapters covering the history and development of end-to-end autonomous driving, background knowledge on relevant technologies, and detailed discussions of paradigms such as one-stage and two-stage end-to-end methods. It also includes practical assignments to reinforce learning [23][24][25][26].

Expected Outcomes - Upon completion, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, with a solid understanding of key technologies and the ability to apply what they have learned to real-world projects [33].

End-to-End Autonomous Driving

VLA (Vision-Language-Action Model)

From End-to-End to VLA: Mass-Production Autonomous Driving Is Heading in This Direction...

自动驾驶之心· 2025-07-26 13:30

Core Viewpoint - End-to-end (E2E) autonomous driving is currently the core algorithm for mass production in the intelligent driving sector, with advances in VLM (Vision-Language Model) and VLA (Vision-Language-Action) systems driving the industry forward [2][3].

Group 1: Industry Trends
- The E2E approach has become a competitive focus for domestic new energy vehicle manufacturers, and the emergence of VLA has triggered a new wave of iteration in production solutions [2].
- Salaries for VLM/VLA positions are reported to reach up to one million annually, with monthly salaries around 70K [2].
- Rapid technical progress has made earlier solutions inadequate, so practitioners need a comprehensive understanding of fields such as multimodal large models, BEV perception, reinforcement learning, and diffusion models [3][4].

Group 2: Educational Initiatives
- A new course titled "End-to-End and VLA Autonomous Driving" has been developed to address the challenges learners face in this complex field, focusing on practical applications and theoretical foundations [4][5][6].
- The course provides a structured learning path that helps students build a research framework and strengthen their research skills by categorizing papers and extracting their innovations [5].
- Practical components complete the loop from theory to application, addressing the gap between academic knowledge and real-world implementation [6].

Group 3: Course Structure
- The course is divided into several chapters covering the history and evolution of E2E algorithms, background knowledge on relevant technologies, and detailed explorations of both one-stage and two-stage E2E methods [9][10][11].
- Key areas of focus include the various E2E paradigms, the significance of world models, and the application of diffusion models to trajectory prediction [11][12].
- The final chapter includes a major project on RLHF (Reinforcement Learning from Human Feedback) fine-tuning, allowing students to apply their knowledge in practice [13].

Group 4: Target Audience and Outcomes
- The course is designed for people with a foundational understanding of autonomous driving and related technologies, and aims to bring them to a level comparable to that of an E2E autonomous driving algorithm engineer with about one year of experience [20].
- Participants will gain a comprehensive understanding of E2E frameworks, including one-stage and two-stage methods, world models, and diffusion models, as well as deeper insight into key technologies such as BEV perception and multimodal large models [20].

End-to-End Autonomous Driving

End-to-End and VLA Autonomous Driving Small-Group Course

A 10,000-Character Long-Form Summary of End-to-End Autonomous Driving

自动驾驶之心· 2025-07-23 09:56

Core Viewpoint - The article reviews the current state of end-to-end autonomous driving algorithms, comparing them with traditional algorithms and highlighting their advantages and limitations [1][3][53].

Summary by Sections

Traditional vs. End-to-End Algorithms
- Traditional autonomous driving algorithms follow a pipeline of perception, prediction, and planning, where each module has distinct inputs and outputs [3].
- End-to-end algorithms take raw sensor data as input and directly output waypoints, simplifying the process and reducing error accumulation [3][5].
- Traditional algorithms are easier to debug and offer some interpretability, but they suffer from cumulative error because the perception and prediction modules can never be completely accurate [3][5].

Limitations of End-to-End Algorithms
- End-to-end algorithms have limited ability to handle corner cases because they rely heavily on data-driven methods [7][8].
- Their use of imitation learning can make it difficult to learn an optimal ground truth and to handle exceptional cases [53].
- Current end-to-end paradigms include imitation learning (behavior cloning and inverse reinforcement learning) and reinforcement learning; evaluation methods are categorized as open-loop or closed-loop [8].

Current Implementations
- ST-P3 is highlighted as an early work on end-to-end autonomous driving, with a framework that includes perception, prediction, and planning modules [10][11].
- Its innovations include a perception module that uses an egocentric aligned accumulation technique and a prediction module that uses a dual-pathway prediction mechanism [11][13].
- The planning phase of ST-P3 refines predicted trajectories by incorporating traffic light information [14][15].

Advanced Techniques
- UniAD uses a fully Transformer-based framework for end-to-end autonomous driving, integrating multiple tasks to enhance performance [23][25].
- TrackFormer jointly updates track queries and detection queries to improve prediction accuracy [26].
- VAD (Vectorized Autonomous Driving) introduces vectorized representations that capture structural information better and speed up computation in trajectory planning [32][33].

Future Directions
- End-to-end algorithms still rely primarily on imitation learning frameworks, whose inherent limitations need further exploration [53].
- Additional constraints and multi-modal planning methods are being introduced to address unstable trajectory prediction and improve model performance [49][52].

End-to-End Autonomous Driving

Multi-Modal Planning

Autonomous Driving

Latest from Tongji University! GEMINUS: End-to-End MoE Sets a New Closed-Loop SOTA, with Performance Up Nearly 8%

自动驾驶之心· 2025-07-22 12:46

Core Viewpoint - The article presents GEMINUS, a novel end-to-end autonomous driving framework built on a dual-aware mixture-of-experts (MoE) architecture, which achieves state-of-the-art driving score and success rate using monocular vision input [1][2][49].

Summary by Sections

Introduction - GEMINUS addresses the limitations of traditional single-modal planning methods by combining a global expert with a group of scene-adaptive experts, along with a dual-aware router that improves adaptability and robustness across diverse driving scenarios [1][6].

Background - The article traces the evolution of end-to-end autonomous driving systems, highlighting the shift from modular approaches to unified models that map sensor inputs directly to control signals, which reduces engineering workload and makes use of rich sensor information [4][8].

MoE Architecture - MoE architectures have shown promise in handling complex data distributions. They provide fine-grained scene adaptability and specialized behavior generation, which helps mitigate the mode-averaging problem common in existing models [5][11].

GEMINUS Framework - GEMINUS consists of a global expert trained on the full dataset for robust performance and scene-adaptive experts trained on scene-specific subsets for adaptability. The dual-aware router dynamically activates the appropriate expert based on scene features and routing uncertainty [6][18].

Experimental Results - GEMINUS outperformed existing methods on the Bench2Drive closed-loop benchmark, improving driving score by 7.67% and success rate by 22.06% over the original single-expert baseline [2][36][49].

Ablation Studies - The scene-aware routing mechanism significantly improves performance, while uncertainty-aware routing and the global expert further improve robustness and stability in ambiguous scenarios [40][41].

Conclusion - GEMINUS is a significant advance in end-to-end autonomous driving, achieving state-of-the-art performance with monocular vision input and showing the value of tailored MoE frameworks for the complexities of real-world driving [49][50].
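The routing behavior described in the summary above (pick a scene-adaptive expert when the router is confident, fall back to the global expert when it is not) can be sketched as follows. This is only an illustration under stated assumptions: normalized entropy as the uncertainty measure, the 0.8 threshold, and the expert names are all hypothetical choices, not details taken from the article.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Entropy scaled to [0, 1]; 1 means the router is maximally unsure."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def route(logits, expert_names, uncertainty_threshold=0.8):
    """Return the scene expert to activate, or 'global' when uncertain.

    The threshold and the entropy criterion are assumptions made for
    this sketch.
    """
    probs = softmax(logits)
    if normalized_entropy(probs) > uncertainty_threshold:
        return "global"
    best = max(range(len(probs)), key=probs.__getitem__)
    return expert_names[best]


# Hypothetical scene-adaptive experts and made-up router logits.
names = ["merging", "overtaking", "emergency_brake"]
print(route([5.0, 0.0, 0.0], names))  # merging
print(route([0.0, 0.0, 0.0], names))  # global
```

Falling back to a single generalist when the routing distribution is flat is one simple way to keep behavior stable in ambiguous scenes, which matches the motivation given in the summary.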

End-to-End Autonomous Driving

Mixture-of-Experts Models

Worth Noting: How 10 Industry Insiders View VLA

理想TOP2· 2025-07-21 14:36

Core Viewpoints
- Cutting-edge autonomous driving technologies are not yet fully mature for mass production, and significant challenges remain [1][27][31].
- Emerging technologies such as VLA/VLM, diffusion models, closed-loop simulation, and reinforcement learning are seen as potential key directions for future exploration [6][7][28].
- Whether to deepen expertise in autonomous driving or move to embodied intelligence depends on individual circumstances and market dynamics [19][34].

Group 1: Current Technology Maturity
- The BEV (Bird's Eye View) perception model is mature enough for mass production, while other models such as E2E (End-to-End) are still in the experimental phase [16][31].
- There is a consensus that existing models struggle with corner cases, particularly in complex driving scenarios: basic functionality is in place, but advanced capabilities are still lacking [16][24][31].
- The industry is shifting toward larger models and advanced techniques to improve scene understanding and decision-making in autonomous vehicles [26][28].

Group 2: Emerging Technologies
- VLA/VLM is viewed as a promising direction for the next generation of autonomous driving, with the potential to improve reasoning and safety [2][28].
- Reinforcement learning is recognized as having significant potential, particularly when combined with effective simulation environments [6][32].
- Diffusion models are being explored for their ability to generate multi-modal trajectories, which could help in uncertain driving conditions [7][26].

Group 3: Future Directions
- Future advances in autonomous driving are expected to focus on enhancing safety, improving passenger experience, and achieving comprehensive scene coverage [20][28].
- Integrating closed-loop simulation with data-driven approaches is essential for refining autonomous driving systems and ensuring their reliability [20][30].
- The industry is moving toward a data-driven model in which the efficiency of data collection, cleaning, labeling, training, and validation will determine competitive advantage [20][22].

Group 4: Career Choices
- The decision to specialize in autonomous driving or shift to embodied intelligence should take into account personal interests, market trends, and the maturity of each field [19][34].
- Autonomous driving is perceived as offering more immediate opportunities for impactful work than the still-developing field of embodied intelligence [19][34].

End-to-End Autonomous Driving

70K? Is End-to-End VLA Really This Hot Right Now!?

自动驾驶之心· 2025-07-21 11:18

Core Viewpoint - End-to-end (E2E) autonomous driving is currently the core algorithm for mass production in intelligent driving. Advances in VLA (Vision-Language-Action) and VLM (Vision-Language Model) systems have led to high demand for related positions in the industry [2][4].

Summary by Sections

Section 1: Background Knowledge
- The course aims to provide a comprehensive understanding of end-to-end autonomous driving, including its historical development and the transition from modular to end-to-end approaches [21].
- Key technology stacks such as VLA, diffusion models, and reinforcement learning are essential for understanding the current landscape of autonomous driving technology [22].

Section 2: Job Market Insights
- VLA/VLM algorithm positions are well paid: those with 3-5 years of experience earn 40K to 70K monthly, and top talent can earn up to 1 million annually [10].
- Demand for VLA-related roles is increasing, indicating an industry shift toward advanced model architectures [9].

Section 3: Course Structure
- The course is structured into five chapters, from the basic concepts of end-to-end algorithms to advanced applications of VLA and reinforcement learning [19][30].
- Practical components bridge the gap between theory and application so that participants can implement what they learn in real-world scenarios [18].

Section 4: Technical Innovations
- Various approaches within end-to-end frameworks are explored, including two-stage and one-stage methods, with notable models such as PLUTO and UniAD leading the way [4][23].
- Diffusion models have reshaped trajectory prediction, allowing better adaptability in uncertain driving environments [24].

Section 5: Learning Outcomes
- Participants are expected to reach a level of proficiency equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, mastering key technologies and frameworks [32].
- The course emphasizes understanding BEV perception, multimodal models, and reinforcement learning in order to stay competitive in the evolving job market [32].

End-to-End Autonomous Driving

End-to-End Autonomous Driving Algorithms

"End-to-End and VLA Autonomous Driving Small-Group Course"

Still Unsure of Your Research Direction? Others Are Already Racing Ahead on VLA......

自动驾驶之心· 2025-07-21 05:18

Core Viewpoint - The article emphasizes the shift in academic research from traditional perception and planning tasks in autonomous driving to Vision-Language-Action (VLA) models, which open new opportunities for innovation and research [1][2].

Group 1: VLA Research Topics
- VLA models aim to create an end-to-end autonomous driving system that maps raw sensor inputs directly to driving control commands, moving away from traditional modular architectures [2].
- The evolution of autonomous driving technology can be divided into three phases: traditional modular architecture, vision-only end-to-end systems, and the emergence of VLA models [2][3].
- VLA models improve interpretability and reliability by letting the system explain its decisions in natural language, which increases human trust [3].

Group 2: Course Objectives and Structure
- The course aims to help participants systematically master key theoretical knowledge of VLA and develop practical skills in model design and implementation [6][7].
- It combines online group research, paper guidance, and maintenance periods to ensure comprehensive understanding and application [6][8].
- Participants will gain insight into classic and cutting-edge papers, coding practices, and effective strategies for writing and submitting academic papers [6][12].

Group 3: Enrollment and Requirements
- The course is limited to 6-8 participants per session and targets people with a foundational understanding of deep learning and autonomous driving algorithms [5][9].
- Basic requirements include familiarity with Python and PyTorch, as well as access to high-performance computing resources [13][14].
- The course emphasizes academic integrity and provides a structured environment for learning and research [14][19].

Group 4: Course Highlights
- The program uses a "2+1" teaching model in which experienced instructors provide comprehensive support throughout the learning process [14].
- It is designed to maintain high academic standards and lead to concrete project outcomes, including a draft paper and a project completion certificate [14][20].
- The course also includes a feedback mechanism that tunes the learning experience to individual progress [14].

Vision-Language-Action (VLA)

End-to-End Autonomous Driving

Large Language Models (LLM)

Large Multimodal Models (LMM)
