Toward production-ready VLA! FastDriveVLA: a plug-and-play pruning module with nearly 4x inference speedup
自动驾驶之心· 2025-08-23 16:03
Core Viewpoint
- The article presents FastDriveVLA, a novel visual token pruning framework for autonomous driving that achieves a 50% compression rate while maintaining 97.3% of performance [3][13][43].

Group 1: End-to-End Autonomous Driving
- Recent advances in end-to-end autonomous driving research have led to the adoption of vision-language-action (VLA) models, which outperform traditional modular approaches in complex scene understanding and decision-making [3][10].
- The VLA model integrates perception, action generation, and planning into a single framework, reducing information loss between modules [3][4].

Group 2: Visual Token Pruning Techniques
- Existing VLM/VLA models incur high computational costs because images are encoded into large numbers of visual tokens, prompting research into visual token pruning methods [4][11].
- The two primary approaches, attention-based and similarity-based pruning, both have limitations in driving tasks [4][14].
- FastDriveVLA introduces a reconstruction-based visual token pruning framework that retains the tokens covering foreground regions critical for driving decisions [5][13].

Group 3: FastDriveVLA Framework
- FastDriveVLA employs a plug-and-play pruner called ReconPruner, trained with a pixel reconstruction task that emphasizes foreground information [6][17].
- An adversarial foreground-background reconstruction strategy sharpens the model's ability to distinguish foreground tokens from background tokens [20][21].
- A large-scale dataset, nuScenes-FG, containing 241,000 image-mask pairs, was constructed to train ReconPruner for effective foreground segmentation [6][12][13].

Group 4: Experimental Results
- FastDriveVLA achieved state-of-the-art results on the nuScenes closed-loop planning benchmark, demonstrating its effectiveness and practicality [13][28].
- Evaluated at pruning ratios of 25%, 50%, and 75%, it consistently outperformed existing methods on key metrics such as L2 error and collision rate [30][34].
- Efficiency analysis showed that FastDriveVLA significantly reduces FLOPs and CUDA latency compared with other methods, improving real-time deployability [36][40].

Group 5: Contributions and Implications
- FastDriveVLA offers a new paradigm for efficient inference in VLA models, along with insights into task-specific token pruning strategies [43].
- The research highlights that focusing on foreground information in autonomous driving tasks can improve performance while reducing computational cost [5][43].
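The digest above describes pruning as keeping a subset of visual tokens ranked by a learned importance signal. As a rough, generic illustration of score-based top-k token pruning (not the authors' ReconPruner; the function name, the shapes, and the source of `scores` are all assumptions for the sketch):

```python
import numpy as np

def prune_visual_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the top-k visual tokens ranked by an importance score.

    tokens: (N, D) array of visual token embeddings
    scores: (N,) per-token importance, e.g. from a learned pruner head
    """
    n_keep = max(1, int(round(len(scores) * keep_ratio)))
    top = np.argsort(scores)[::-1][:n_keep]   # indices of the k highest scores
    return tokens[np.sort(top)]               # keep tokens in original spatial order

tokens = np.arange(12, dtype=float).reshape(4, 3)   # 4 tokens, dim 3
scores = np.array([0.1, 0.9, 0.2, 0.8])             # tokens 1 and 3 score highest
kept = prune_visual_tokens(tokens, scores)          # keeps rows 1 and 3
```

At a 50% keep ratio this halves the number of tokens entering the language model, which is where the reported FLOPs and latency savings would come from.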
Helped yet another student land an autonomous driving algorithm role...
自动驾驶之心· 2025-08-23 14:44
Core Viewpoint
- The article emphasizes the importance of continuous learning and adaptation in autonomous driving, particularly amid the industry shift toward intelligent models and large models, and highlights the value of community support for knowledge sharing and job opportunities [1][2].

Group 1: Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" is a comprehensive community platform integrating video, text, learning paths, Q&A, and job exchange, aiming to grow from over 4,000 to nearly 10,000 members in two years [1][2].
- The community provides practical solutions for topics such as entry points for end-to-end models, learning paths for multimodal large models, and engineering practice for data closed-loop 4D annotation [2][3].
- Members have access to over 40 technical routes, including industry applications, VLA benchmarks, and beginner learning routes, significantly reducing the time spent searching for relevant information [2][3].

Group 2: Job Opportunities and Networking
- The community has established internal referral mechanisms with multiple autonomous driving companies, so applications and resumes can go directly to desired companies [7].
- Regular job sharing and updates on open positions create a complete ecosystem for autonomous driving professionals [15][30].

Group 3: Technical Learning and Development
- The community offers a well-structured technical stack and roadmap for beginners, covering mathematics, computer vision, deep learning, and programming [11][32].
- Learning routes for advanced topics, including end-to-end autonomous driving, 3DGS principles, and multimodal large models, cater to both newcomers and experienced professionals [16][34][40].
- The platform also hosts live sessions with industry leaders, offering insights into cutting-edge research and practical applications in autonomous driving [58][66].
Still don't know how to start a VLA paper? Some students already have CCF-A publications...
自动驾驶之心· 2025-08-22 12:00
Core Insights
- The article discusses advances in the Li Auto VLA driver model, highlighting its improved semantic understanding, reasoning, and trajectory planning, all crucial for autonomous driving [1][3][5].

Group 1: VLA Model Capabilities
- The VLA model demonstrates enhanced semantic understanding through multimodal input, improved reasoning via chains of thought, and trajectory planning that more closely approximates human driving intuition [1].
- Four core abilities are showcased: spatial understanding, reasoning, communication and memory, and behavior [1][3].

Group 2: Research and Development Trends
- The VLA model evolved from VLM+E2E, integrating cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5].
- While industry continues to optimize traditional perception and planning tasks, academia is increasingly shifting toward large models and VLA, leaving many subfields open for exploration [5].

Group 3: VLA Research Guidance Program
- A VLA research paper guidance program has launched to positive feedback, aimed at helping participants systematically grasp key theoretical knowledge and develop their own research ideas [6].
- The structured 14-week curriculum runs from traditional end-to-end autonomous driving through to methodologies for writing research papers [9][11][30].

Group 4: Course Structure and Requirements
- Each session is capped at 8 participants and targets individuals at various academic levels with a background in VLA and autonomous driving [12][15].
- Participants are expected to have a foundation in deep learning and Python, familiarity with PyTorch, and hardware meeting the suggested requirements for optimal performance [21][22].

Group 5: Expected Outcomes
- Participants will study classic and cutting-edge research papers, build coding skills, and learn methodologies for writing and submitting research papers, culminating in a draft paper [20][34].
- The program aims to deepen participants' understanding of algorithms and their trade-offs, and to stimulate research ideas through structured guidance [20][34].
A brand-new end-to-end paradigm! Fudan's VeteranAD: "perception-in-plan" refreshes open- and closed-loop SOTA, surpassing DiffusionDrive
自动驾驶之心· 2025-08-21 23:34
Core Insights
- The article introduces a novel "perception-in-plan" paradigm for end-to-end autonomous driving, implemented in the VeteranAD framework, which integrates perception directly into the planning process to make planning optimization more effective [5][39].
- VeteranAD delivers superior performance on the challenging NAVSIM and Bench2Drive benchmarks, showing the benefit of tightly coupling perception and planning for accuracy and safety in autonomous driving [12][39].

Summary by Sections

Introduction
- The article reviews significant advances in end-to-end autonomous driving, emphasizing the need to unify multiple tasks within a single framework to prevent information loss across stages [2][3].

Proposed Framework
- The VeteranAD framework embeds perception into planning, allowing the perception module to operate more effectively in alignment with planning needs [5][6].
- The framework consists of two core modules, Planning-Aware Holistic Perception and Localized Autoregressive Trajectory Planning, which together enhance end-to-end planning performance [12][39].

Core Modules
- **Planning-Aware Holistic Perception**: interacts across three dimensions (image features, BEV features, and surrounding traffic features) to achieve a comprehensive understanding of traffic elements [6].
- **Localized Autoregressive Trajectory Planning**: generates future trajectories autoregressively, progressively refining the planned trajectory based on perception results [6][16].

Experimental Results
- VeteranAD achieved a PDM Score of 90.2 on the NAVSIM navtest dataset, outperforming previous learning-based methods and demonstrating its effectiveness in end-to-end planning [21].
- In open-loop evaluation it recorded an average L2 error of 0.60, surpassing all baseline methods, while remaining competitive in closed-loop evaluation [25][33].

Ablation Studies
- Removing the guiding points derived from anchored trajectories significantly degrades performance, showing these points are crucial for accurate planning [26].
- Combining both core modules yields the best results, highlighting their complementary nature [26].

Conclusion
- The "perception-in-plan" design significantly improves end-to-end planning accuracy and safety, paving the way for future research on more efficient and reliable autonomous driving systems [39].
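The "localized autoregressive" planner is described above only at a high level. As a toy sketch of the autoregressive idea, where each waypoint is predicted conditioned on the trajectory generated so far (the step function standing in for VeteranAD's perception-conditioned decoder is an assumption):

```python
import numpy as np

def autoregressive_plan(start, step_fn, horizon=6):
    """Roll out a trajectory one waypoint at a time: each new waypoint
    is produced from the partial trajectory generated so far."""
    traj = [np.asarray(start, dtype=float)]
    for _ in range(horizon):
        # in the real system, step_fn would also query perception features
        # around the current position to refine the next waypoint
        traj.append(traj[-1] + step_fn(traj))
    return np.stack(traj)

# toy step function: constant 1 m forward motion per step
plan = autoregressive_plan([0.0, 0.0], lambda partial: np.array([1.0, 0.0]))
```

The contrast with one-shot planners is that each of the `horizon` steps can react to what was planned before it, which is the property the ablation on guiding points exercises.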
Without efficient channels for technical and industry information, a lot of time gets wasted...
自动驾驶之心· 2025-08-21 23:34
Core Insights
- The article emphasizes the importance of efficient information collection channels for people transitioning into the autonomous driving industry, highlighting a comprehensive community that integrates academic content, industry discussions, open-source solutions, and job opportunities [1][3].

Group 1: Community and Resources
- The community serves as a platform for cultivating future leaders in autonomous driving, providing a space for academic and engineering discussions [3].
- It has grown to over 4,000 members and offers a blend of video content, articles, learning paths, Q&A, and job exchange [1][3].
- Resources include a complete entry-level technical stack and roadmap for newcomers, as well as industry frameworks and project proposals for those already engaged in research [9][11].

Group 2: Learning and Development
- The community has compiled over 40 open-source projects and nearly 60 autonomous driving datasets, along with mainstream simulation platforms and various technical learning paths [16].
- Dedicated learning routes cover different aspects of autonomous driving, such as perception, simulation, and planning and control, for both beginners and advanced practitioners [16][17].
- A series of video tutorials covers topics including sensor calibration, SLAM, decision-making, and trajectory prediction [5].

Group 3: Job Opportunities and Networking
- Internal referral mechanisms with multiple autonomous driving companies facilitate job placements for members [5].
- Continuous job sharing and position updates create a complete ecosystem for autonomous driving professionals [13].
- Members can freely ask questions about career choices and research directions and receive guidance from experienced peers [82].
The company announced team cuts; those who understand end-to-end got to stay...
自动驾驶之心· 2025-08-19 23:32
Core Viewpoint
- The article discusses the rapid evolution of, and challenges in, end-to-end autonomous driving technology, emphasizing that success in this competitive industry requires a comprehensive understanding of its algorithms and models [2][4][6].

Group 1: Industry Trends
- The shift from modular pipelines to end-to-end systems aims to eliminate cumulative errors between modules, marking a significant technological leap [2].
- The emergence of algorithms and models such as UniAD and BEV perception reflects a growing focus on integrating multiple tasks into a unified framework [4][9].
- Demand for knowledge of multimodal large models, reinforcement learning, and diffusion models is rising, reflecting the industry's need for versatile skill sets [5][20].

Group 2: Learning Challenges
- Newcomers struggle with fragmented knowledge and an overwhelming volume of research papers, which often leads to abandoning the subject early [5][6].
- A lack of high-quality documentation and practical guidance further complicates the transition from theory to practice in end-to-end autonomous driving research [5][6].

Group 3: Course Offerings
- A new course, "End-to-End and VLA Autonomous Driving," was developed to address these learning challenges, focusing on practical applications and theoretical foundations [6][24].
- The course provides a comprehensive treatment of end-to-end algorithms, including their historical development and current trends [11][12].
- Practical components, such as real-world projects and assignments, ensure participants can apply their knowledge effectively [8][21].

Group 4: Course Content Overview
- Topics include an introduction to end-to-end algorithms, background on the relevant technologies, and detailed coverage of both one-stage and two-stage end-to-end methods [11][12][13].
- Dedicated chapters cover advanced topics such as world models and diffusion models, which are crucial for understanding the latest advances in autonomous driving [15][17][20].
- The final project applies reinforcement learning from human feedback (RLHF), giving participants hands-on experience [21].
The starting point of end-to-end VLA: a look at large language models and CLIP
自动驾驶之心· 2025-08-19 07:20
Core Viewpoint
- The article discusses the development and significance of end-to-end (E2E) algorithms in autonomous driving, emphasizing how advanced technologies such as large language models (LLMs), diffusion models, and reinforcement learning (RL) enhance the capabilities of autonomous systems [21][31].

Summary by Sections

Section 1: Overview of End-to-End Autonomous Driving
- The first chapter surveys the evolution of end-to-end algorithms, explaining the transition from modular pipelines to end-to-end solutions and the advantages and challenges of each paradigm [40].

Section 2: Background Knowledge
- The second chapter covers the technical stack behind end-to-end systems, detailing LLMs, diffusion models, and reinforcement learning, all crucial for understanding the future job market in this field [41][42].

Section 3: Two-Stage End-to-End Systems
- The third chapter examines two-stage end-to-end systems, exploring their emergence, advantages, and disadvantages, and reviews notable works such as PLUTO and CarPlanner [42][43].

Section 4: One-Stage End-to-End and VLA
- The fourth chapter highlights one-stage end-to-end systems, covering subfields from perception-based methods to the latest advances in VLA (vision-language-action) models, which are pivotal for the ultimate goals of autonomous driving [44][50].

Section 5: Practical Application and RLHF Fine-Tuning
- The fifth chapter centers on a major project on RLHF (reinforcement learning from human feedback) fine-tuning, offering practical insight into building pre-training and reinforcement learning modules applicable to VLA-related algorithms [52].

Course Structure and Learning Outcomes
- The course aims to equip participants with a solid understanding of end-to-end autonomous driving technologies, covering essential frameworks and methodologies and preparing them for roles in the industry [56][57].
Comprehensively surpassing DiffusionDrive: GMF-Drive, the world's first Mamba end-to-end SOTA solution
理想TOP2· 2025-08-18 12:43
Core Insights
- The article discusses advances in end-to-end autonomous driving, emphasizing the importance of multi-modal fusion architectures and introducing GMF-Drive, a new framework that improves on existing methods [3][4][44].

Group 1: End-to-End Autonomous Driving
- End-to-end autonomous driving has gained wide acceptance because it maps raw sensor inputs directly to driving actions, reducing reliance on intermediate representations and the information loss they cause [3].
- Recent models such as DiffusionDrive and GoalFlow demonstrate strong capabilities in generating diverse, high-quality driving trajectories [3].

Group 2: Multi-Modal Fusion Challenges
- A key bottleneck in current systems is integrating heterogeneous inputs from different sensors; existing methods often rely on simple feature concatenation rather than structured information integration [4][6].
- Current multi-modal fusion architectures such as TransFuser show limited gains over single-modal architectures, indicating the need for more sophisticated integration methods [6].

Group 3: GMF-Drive Overview
- GMF-Drive, developed by teams from the University of Science and Technology of China and China University of Mining and Technology, comprises three modules aimed at better multi-modal fusion for autonomous driving [7].
- The framework combines gated Mamba fusion with a spatially aware BEV representation, addressing the limitations of traditional transformer-based methods [7][44].

Group 4: Innovations in Data Representation
- A 14-dimensional pillar representation retains critical 3D geometric features, enhancing the model's perception capabilities [16][19].
- The representation captures local surface geometry and height variation, allowing the model to differentiate objects with similar point densities but different structures [19].

Group 5: GM-Fusion Module
- The GM-Fusion module integrates multi-modal features through gated channel attention, BEV-SSM, and hierarchical deformable cross-attention, achieving linear complexity while retaining long-range dependency modeling [19][20].
- Its design enables effective spatial dependency modeling and improved feature alignment between camera and LiDAR data [19][40].

Group 6: Experimental Results
- GMF-Drive achieved a PDMS score of 88.9 on the NAVSIM benchmark, outperforming the previous best model, DiffusionDrive, by 0.8 points and demonstrating the effectiveness of the GM-Fusion architecture [29][30].
- The framework also improved key sub-metrics such as driving-area compliance and vehicle progression rate, indicating enhanced safety and efficiency [30][31].

Group 7: Conclusion
- GMF-Drive advances autonomous driving frameworks by effectively combining geometric representations with spatially aware fusion techniques, setting new performance benchmarks [44].
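The digest describes GM-Fusion as gating between camera and LiDAR features. As a minimal sketch of the general idea of a per-channel sigmoid gate over two modalities (the weight shapes, names, and gating form are assumptions for illustration, not the paper's exact module):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(cam, lidar, w, b):
    """Per-channel gated fusion: a learned gate in (0, 1) decides, channel
    by channel, how much of each modality's feature survives.

    cam, lidar: (D,) modality features; w: (2D, D) gate weights; b: (D,) bias
    """
    gate = sigmoid(np.concatenate([cam, lidar]) @ w + b)
    return gate * cam + (1.0 - gate) * lidar  # convex combination per channel

rng = np.random.default_rng(0)
D = 4
fused = gated_fusion(np.ones(D), np.zeros(D),
                     rng.normal(size=(2 * D, D)), np.zeros(D))
```

Because the gate is a convex combination per channel, each fused value stays between the two modality values, letting the network suppress an unreliable sensor channel without discarding the other.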
"Black sheep" Jueying: how does it pave the AI road for carmakers?
Group 1
- The core viewpoint is that SenseTime's automotive division, Jueying, has the potential to succeed after addressing key challenges in the automotive industry, with plans to expand partnerships with car manufacturers by 2025 [1]
- SenseTime has invested seven years in developing AI technology, aiming to validate its value in the automotive sector [1]
- Jueying plans to develop advanced end-to-end solutions on NVIDIA's Thor platform, a strategic move toward higher-level AI applications in vehicles [1]

Group 2
- Jueying CEO Wang Xiaogang was among the first to identify opportunities in the end-to-end field, having collaborated with Honda on an L4 autonomous driving project in 2017 [2]
- That project was delayed by computational limitations and limited industry awareness [2]
- Following the release of Tesla's FSD V12, Jueying is accelerating its efforts to catch up, planning to showcase its UniAD end-to-end deployment at the 2024 Beijing Auto Show [2]
- A jointly developed end-to-end autonomous driving system with Dongfeng Motor is set to be realized by the end of this year [2]
Bulls and bears battle over Robotaxi: Cathie Wood builds a position while institutions diverge
Di Yi Cai Jing· 2025-08-15 03:45
Bullish and bearish voices intertwine, pushing autonomous driving technology toward maturity.

Since the start of this year, Robotaxi (autonomous taxi) services have drawn wide attention from global capital markets, but skepticism has arrived on cue.

Recently, Cathie Wood's ARK funds spent roughly US$12.9 million buying shares of Pony.ai (NASDAQ: PONY), the first time her flagship funds have held a Chinese autonomous driving stock. Wood, dubbed the "female Buffett" on Wall Street, is known for favoring high-growth, high-risk, long-term holdings.

WeRide (NASDAQ: WRD), another leading Chinese Robotaxi company, reported 836.7% year-over-year growth in its Robotaxi business in the second quarter; as early as May this year it disclosed that Uber had committed an additional US$100 million investment.

A reporter who recently tried Baidu's Apollo Go (萝卜快跑) Robotaxi in Guangzhou encountered peak-hour waits of up to an hour with no car accepting the order. Asked how many vehicles operate near the pickup point, Apollo Go customer service replied that the number of serviceable vehicles in a city is not fixed and is adjusted dynamically based on multiple factors. According to nearby residents and merchants, evening rush-hour waits for Apollo Go exceed 40 minutes.

Undeniably, at this stage Robotaxi dispatch and wait times are both longer than those of human-driven ride-hailing, a problem the industry still needs to solve. Han Xu said that when an autonomous driving company expands into a new city, ...