We've Set Up an Autonomous Driving VLA Technical Discussion Group (Data/Models/Deployment, etc.)
自动驾驶之心· 2025-08-05 11:22
Interested readers are welcome to add the assistant on WeChat to join the group: AIDriver005, with the note "nickname + VLA group". The 自动驾驶之心 VLA technical discussion group has been established, and everyone is welcome to join and discuss VLA-related topics, including VLA dataset creation, single-stage VLA, hierarchical VLA, end-to-end approaches based on large models, VLM+DP-based approaches, mass-production deployment, job hunting, and more.
Autonomous Driving Paper Digest | Diffusion Models, Trajectory Prediction, TopoLiDM, VLA, and More
自动驾驶之心· 2025-08-05 03:09
Core Insights
- The article discusses advancements in trajectory prediction using a generative active learning framework called GALTraj, which applies controllable diffusion models to address long-tail issues in data [1][2].

Group 1: GALTraj Framework
- GALTraj is the first framework to apply generative active learning to trajectory prediction tasks, enhancing long-tail learning without modifying the model structure [2].
- The framework employs a tail-aware generation method that differentiates the diffusion guidance for tail, head, and related agents, producing realistic and diverse scenarios while preserving tail characteristics [2][3].

Group 2: Experimental Results
- In experiments on the WOMD and Argoverse2 datasets, GALTraj significantly improved long-tail sample prediction, reducing the long-tail metric FPR₅ by 47.6% (from 0.42 to 0.22) and the overall prediction error minFDE₆ by 14.7% (from 0.654 to 0.558) [1][6].
- The results indicate that GALTraj outperforms traditional methods across various metrics, showcasing its effectiveness in improving prediction accuracy for rare scenarios [7][8].

Group 3: TopoLiDM Framework
- The article also highlights the TopoLiDM framework, developed by Shanghai Jiao Tong University and the University of Twente, which integrates topology-aware diffusion models for high-fidelity LiDAR point cloud generation [13][15].
- TopoLiDM achieved a 22.6% reduction in Fréchet Range Image Distance (FRID) and a 9.2% reduction in Minimum Matching Distance (MMD) on the KITTI-360 dataset while maintaining a real-time generation speed of 1.68 samples per second [13][15].

Group 4: FastDriveVLA Framework
- FastDriveVLA, developed by Peking University and Xiaopeng Motors, introduces a reconstruction-based visual token pruning framework that maintains 99.1% trajectory accuracy at a 50% pruning rate and reduces collision rates by 2.7% [21][22].
- The framework employs a novel adversarial foreground-background reconstruction strategy to improve the identification of valuable tokens, achieving state-of-the-art performance on the nuScenes open-loop planning benchmark [27][28].

Group 5: PLA Framework
- The article presents a unified Perception-Language-Action (PLA) framework proposed by TUM, which integrates multi-sensor fusion and GPT-4.1-enhanced vision-language-action reasoning for adaptive autonomous driving [34][35].
- The framework demonstrated a mean absolute error (MAE) of 0.39 m/s in speed prediction and an average displacement error (ADE) of 1.013 m in trajectory tracking in urban intersection scenarios [42].
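The tail-aware guidance idea above, scaling the diffusion guidance differently for tail, head, and related agents, can be sketched as follows. This is a minimal illustration with assumed role labels and guidance scales, not GALTraj's actual implementation:

```python
import numpy as np

# Hypothetical guidance scales per agent role; GALTraj's actual
# values and conditioning mechanism are more involved.
GUIDANCE_SCALE = {"tail": 2.0, "head": 0.5, "related": 1.0}

def guided_denoise_step(x, noise_pred, cond_grad, roles, step_size=0.1):
    """One denoising step where the conditioning gradient is scaled
    per agent according to its role (tail / head / related).

    x:          (num_agents, horizon, 2) noisy trajectories
    noise_pred: (num_agents, horizon, 2) model's noise estimate
    cond_grad:  (num_agents, horizon, 2) guidance gradient
    roles:      one role string per agent
    """
    scales = np.array([GUIDANCE_SCALE[r] for r in roles])[:, None, None]
    # Stronger guidance on tail agents preserves rare-scenario behaviour;
    # weaker guidance on head agents keeps the generated context realistic.
    return x - step_size * (noise_pred - scales * cond_grad)

x = np.zeros((3, 8, 2))
noise = np.full((3, 8, 2), 0.1)
grad = np.full((3, 8, 2), 0.2)
out = guided_denoise_step(x, noise, grad, ["tail", "head", "related"])
```

The per-role scale is the only moving part here: the tail agent receives a larger corrective update than the head agent from the same gradient.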
My Advisor Asked Me to Hand-Build an Autonomous Driving Mini-Car; After Seeing This, I Lost All Motivation to DIY...
自动驾驶之心· 2025-08-05 03:09
Core Points
- The article introduces the "Black Warrior 001," a lightweight platform for autonomous driving research and education that supports perception, localization, fusion, navigation, and planning [1][2].
- The product launched three months ago at an original price of 36,999 yuan and currently includes three free courses with purchase [1].
- The platform is designed for a range of educational levels, including undergraduate and graduate students as well as training institutions [2].

Product Features
- The Black Warrior 001 supports secondary development and modification, with numerous mounting positions and interfaces for adding sensors such as cameras and millimeter-wave radars [2].
- It has been tested in indoor, outdoor, and parking scenarios, demonstrating its perception, localization, fusion, and navigation capabilities [4][6][8][10].

Hardware Specifications
- Key components include a Mid-360 3D LiDAR, a 2D LiDAR, an Orbbec depth camera, and an Nvidia Orin NX 16G main control chip [10][12].
- The vehicle weighs 30 kg, has a battery power of 50 W, operates at 24 V, and has a maximum speed of 2 m/s [12].

Software and Functionality
- The software stack includes ROS, C++, and Python, with one-click startup and a preconfigured development environment [14].
- Supported functionality includes 2D and 3D SLAM, point cloud processing, vehicle navigation, and obstacle avoidance [15].

After-Sales Support
- The company offers one year of after-sales support for non-human damage, with free repairs during the warranty period for issues caused by operational errors or code modifications [37].
A Production-Oriented VLA Solution! FastDriveVLA: A Plug-and-Play Pruning Module with Nearly 4x Inference Speedup (Peking University & Xiaopeng)
自动驾驶之心· 2025-08-04 23:33
Core Viewpoint
- The article discusses FastDriveVLA, a novel framework for visual token pruning in autonomous driving that achieves a 50% compression rate while maintaining 97.3% performance [2][3][43].

Group 1: End-to-End Autonomous Driving
- Recent advancements have led to the adoption of end-to-end methods that go from perception to planning in a single model, reducing information loss between modules [3].
- The introduction of Vision-Language-Action (VLA) models enhances decision-making in complex scenarios, making them increasingly popular in autonomous driving systems [3][10].

Group 2: Visual Token Pruning
- Existing VLM/VLA models encode images into large numbers of visual tokens, resulting in high computational costs. Current research explores two main directions for visual token pruning: attention-based methods and similarity-based methods [4][14].
- FastDriveVLA proposes a reconstruction-based visual token pruning framework that focuses on retaining tokens related to foreground information, significantly reducing computational cost while maintaining performance [5][13].

Group 3: FastDriveVLA Framework
- FastDriveVLA includes a plug-and-play pruner called ReconPruner, trained with a pixel reconstruction task to focus on foreground areas and assign higher significance scores to key tokens [6][17].
- The framework uses a large-scale dataset, nuScenes-FG, containing 241,000 image-mask pairs for training, improving the model's ability to distinguish foreground from background [6][12].

Group 4: Experimental Results
- FastDriveVLA achieved state-of-the-art results on the nuScenes open-loop planning benchmark, demonstrating its effectiveness and practicality [13][34].
- The framework outperforms existing methods, with improvements in L2 error and collision rate at various pruning ratios [30][34].

Group 5: Efficiency Analysis
- FastDriveVLA reduces FLOPs by approximately 7.5x and lowers prefill and decode latencies, improving inference efficiency for real-time deployment [36][40].
- The lightweight design of ReconPruner yields lower CUDA latency than several comparable methods, making it suitable for practical applications [36][40].
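The retain-top-k idea behind this style of token pruning can be sketched in a few lines. The significance scores here are a stand-in; in FastDriveVLA they come from the trained ReconPruner head, which is not reproduced in this sketch:

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the top-scoring fraction of visual tokens, preserving order.

    tokens: (num_tokens, dim) token embeddings
    scores: (num_tokens,) significance scores, e.g. from a pruner head
            trained with a foreground-reconstruction objective
    """
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    top = np.argpartition(scores, -k)[-k:]   # indices of the k highest scores
    top.sort()                               # restore original (spatial) order
    return tokens[top]

tokens = np.random.randn(8, 4)
scores = np.arange(8.0)                      # later tokens score higher here
kept = prune_tokens(tokens, scores, keep_ratio=0.5)
```

Because pruning happens before the language-model prefill, every dropped token saves attention and MLP compute in all subsequent layers, which is where the FLOPs reduction reported above comes from.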
The Second Half of Autonomous Driving: The Dilemma of Generalizing Ten-Million-Scale Automatic Annotation for Mass Production...
自动驾驶之心· 2025-08-04 23:33
Core Viewpoint
- The article emphasizes the necessity of large-scale 4D automatic annotation in the second half of intelligent driving, highlighting the increasing demand for higher-level driving capabilities and the limitations of manual annotation methods [2][3].

Group 1: Importance of 4D Automatic Annotation
- The shift toward higher-level intelligent driving capabilities necessitates millions of 4D automatic annotations to meet production demands [2].
- Manual annotation efficiency is insufficient for the growing demands on data quality and quantity, making 4D automatic annotation essential [2][3].
- The complexity of current annotation requirements, including the need for time-synchronized sensor data, underscores the importance of automated solutions [3].

Group 2: Challenges in Automatic Annotation
- High requirements for spatiotemporal consistency complicate the tracking of dynamic targets across frames, leading to potential annotation errors [4].
- Integrating multimodal data from different sensors presents challenges in data synchronization and semantic unification [5].
- The unpredictability of dynamic scenes and environmental factors increases the difficulty of generalizing annotation models [5].

Group 3: Course Offerings and Learning Opportunities
- The article promotes a course on 4D automatic annotation designed to ease entry-level challenges and support advanced learning [5][6].
- The course covers a comprehensive curriculum, including dynamic obstacle detection, SLAM reconstruction, and end-to-end annotation pipelines [6][7][10].
- It aims to equip learners with practical skills in 4D automatic annotation algorithms and their applications in real-world scenarios [22][25].
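The time-synchronized sensor data requirement mentioned above is commonly handled by nearest-timestamp matching across sensor streams. A minimal sketch, assuming a hypothetical 50 ms tolerance:

```python
import bisect

def match_timestamps(lidar_ts, camera_ts, tol=0.05):
    """Pair each LiDAR timestamp with the nearest camera timestamp.

    Returns (lidar_t, camera_t) pairs within `tol` seconds; LiDAR frames
    with no sufficiently close camera frame are dropped.
    camera_ts must be sorted ascending.
    """
    pairs = []
    for t in lidar_ts:
        i = bisect.bisect_left(camera_ts, t)
        # Only the neighbours around the insertion point can be nearest.
        candidates = camera_ts[max(0, i - 1):i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(c - t))
        if abs(best - t) <= tol:
            pairs.append((t, best))
    return pairs

pairs = match_timestamps([0.00, 0.10, 0.20], [0.01, 0.12, 0.50])
```

Real pipelines typically interpolate ego poses between matched frames as well, but the dropped-frame behaviour above already illustrates why loose synchronization silently shrinks usable training data.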
Zhejiang University's MambaMap: Online Vectorized HD Map Construction Based on State Space Models
自动驾驶之心· 2025-08-04 23:33
Core Insights
- The article introduces MambaMap, a novel framework for online vectorized HD map construction based on state space models, which is crucial for autonomous driving because it provides precise road information for downstream tasks [4][5].

Summary by Sections

Key Contributions
- The MambaMap framework efficiently integrates long-range temporal information for online vectorized HD map construction using state space models [5].
- An effective gating mechanism is introduced into the state space for efficient information selection and integration at both the BEV-feature and instance-query levels, along with various scanning strategies to exploit spatiotemporal dependencies [5].
- Extensive experiments on the nuScenes and Argoverse2 datasets demonstrate that MambaMap outperforms state-of-the-art methods across various settings [5].

Experimental Results
- On the nuScenes dataset, MambaMap achieved a mean average precision (mAP) of 40.1, outperforming methods such as StreamMapNet and SQD-MapNet [12].
- On the Argoverse2 dataset, MambaMap also showed superior performance with an mAP of 61.0, indicating its robustness and generalization capability [12].
- The article presents detailed performance metrics across different methods and datasets, highlighting MambaMap's advantages in various scenarios [11][12].

Methodology
- MambaMap uses a dynamic memory mechanism and a gated state space model to efficiently fuse BEV features and instance-level features over multiple time steps, capturing long-range dependencies with minimal computational overhead [18].
- Multi-directional and spatiotemporal scanning strategies enhance feature extraction and temporal consistency [18].

Future Directions
- Future work aims to extend MambaMap to other BEV perception tasks, such as 3D object detection and motion prediction, broadening its applicability in robotics [18].
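The gating mechanism described above can be illustrated with a toy gate that blends temporal memory with the current BEV feature. The sigmoid-gate form, the shapes, and the zero-initialized weights are assumptions for illustration, not MambaMap's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(memory, current, w_gate, b_gate):
    """Fuse temporal memory with the current BEV feature via a learned
    gate: g = sigmoid(W [memory; current] + b), then
    fused = g * memory + (1 - g) * current.

    memory, current: (dim,) feature vectors
    w_gate: (dim, 2 * dim) gate weights; b_gate: (dim,) bias
    """
    g = sigmoid(w_gate @ np.concatenate([memory, current]) + b_gate)
    return g * memory + (1.0 - g) * current

dim = 4
rng = np.random.default_rng(0)
memory = rng.standard_normal(dim)
current = rng.standard_normal(dim)
# With zero weights the gate is 0.5 everywhere, i.e. an even blend;
# training would move the gate toward memory or current per channel.
fused = gated_fuse(memory, current, np.zeros((dim, 2 * dim)), np.zeros(dim))
```

The appeal of a learned gate over naive averaging is exactly this per-channel selectivity: stable map elements can lean on memory while newly observed ones lean on the current frame.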
An Autonomous Driving Job-Hunting Group (Campus & Experienced Hires) Has Been Established!
自动驾驶之心· 2025-08-04 23:33
Core Viewpoint
- The article emphasizes the convergence of autonomous driving technology, highlighting the shift from many diverse approaches to a more unified model, which implies higher technical barriers in the industry [1].

Group 1
- The industry is moving toward unified solutions such as one-model, VLM, and VLA approaches, suggesting a reduced need for large numbers of algorithm engineers [1].
- The article encourages building a large community to support industry professionals, facilitating growth and collaboration among peers [1].
- A new job-focused community is being launched to discuss industry trends, company developments, product research, and job opportunities [1].
Compete This Summer! Registration for the PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge Is Closing Soon
自动驾驶之心· 2025-08-04 07:31
Group 1
- The competition aims to advance research in spatial intelligence and embodied intelligence, critical technologies for autonomous driving, smart cities, and robotics [5][7].
- The integration of reinforcement learning and computer vision is highlighted as a driving force for breakthroughs in the field [5][7].

Group 2
- The competition is organized by a team of experts from institutions including the University of Science and Technology Beijing and Tsinghua University, with sponsorship from Beijing Jiuzhang Yunjing Technology Co., Ltd. [9][10].
- Participants can register as individuals or as teams of up to five members and must submit their registration by August 10 [11][12].

Group 3
- The competition consists of two tracks, Spatial Intelligence and Embodied Intelligence, each with specific tasks and evaluation criteria [20][23].
- In the Spatial Intelligence track, participants must construct a 3D reconstruction model from multi-view aerial images; the Embodied Intelligence track involves completing tasks in dynamic occlusion scenarios [20][23].

Group 4
- Evaluation for the Spatial Intelligence track covers rendering quality and geometric accuracy, with scores combined by a weighted formula [22][21].
- The Embodied Intelligence track evaluates task completion and execution efficiency, with scores also combined by a weighted system [23][25].

Group 5
- Prizes for each track include cash rewards and computing-resource vouchers, with a total of 12 awards distributed among the top teams [25][27].
- The competition emphasizes intellectual property rights and requires participants to ensure their submissions are original and self-owned [31][28].
Robots Can Do More Than Pick and Place! The PKU x 银河通用 "World-Action Model" Is Here
自动驾驶之心· 2025-08-04 07:31
Core Viewpoint
- The article discusses advances in non-prehensile manipulation in robotics, emphasizing the Dynamics-adaptive World Action Model (DyWA), which extends robots' capabilities in complex physical interactions beyond simple pick-and-place tasks [4][10].

Summary by Sections

Non-prehensile Manipulation
- Non-prehensile manipulation refers to object manipulation techniques that do not involve grasping, such as pushing and flipping, which are essential for handling a wide range of objects in real-world scenarios [4][6].

Challenges in Non-prehensile Manipulation
- The complexity of contact modeling and the variability of friction forces pose significant challenges for robots performing non-prehensile tasks; small changes in surface conditions can drastically alter an object's movement trajectory [7][8].

DyWA's Core Methodology
- DyWA employs a teacher-student framework to train a model that predicts future states from actions, allowing robots to "imagine" the outcomes of their movements and improving learning efficiency and generalization [10][11].
- A dynamics adaptation mechanism infers hidden physical properties such as friction and mass distribution from historical observations, enhancing the robot's interaction with its environment [11][12].
- DyWA operates from a single depth camera input, enabling zero-shot transfer from simulation to real-world applications and robust manipulation capabilities [13].

Generalization Capabilities of DyWA
- DyWA demonstrates superior performance across experimental setups, achieving over 80% success rates in precise operations under both known and unknown object states [16][17].
- In real-world tests, DyWA successfully adapts to different object geometries and friction surfaces, maintaining a success rate close to 70% when manipulating unseen objects [19][23].

Integration with Other Strategies
- DyWA can be combined with grasping strategies and vision-language models, improving overall success rates in complex scenarios by first repositioning objects into easier-to-grasp poses [26].
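The "imagine outcomes, then act" loop that such a world-action model enables can be sketched as follows. The toy translation dynamics and candidate actions are hypothetical; DyWA's learned predictor additionally conditions on inferred physical properties such as friction:

```python
import numpy as np

def select_action(state, goal, actions, predict):
    """Pick the action whose predicted next state is closest to the goal.

    predict(state, action) -> predicted next state; in a world-action
    model this role is played by a learned dynamics predictor.
    """
    preds = [predict(state, a) for a in actions]
    errs = [np.linalg.norm(p - goal) for p in preds]
    best = int(np.argmin(errs))
    return actions[best], preds[best]

# Toy dynamics: a push simply translates the object by the action vector.
toy_predict = lambda s, a: s + a

state = np.array([0.0, 0.0])
goal = np.array([1.0, 0.0])
candidates = [np.array([1.0, 0.0]),
              np.array([0.0, 1.0]),
              np.array([-1.0, 0.0])]
action, pred = select_action(state, goal, candidates, toy_predict)
```

The point of learning `predict` rather than hand-coding it is exactly the contact-modeling difficulty described above: when friction or mass distribution shifts, a learned, dynamics-adaptive predictor can be updated from observation history instead of re-derived.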
Behind CVPR 2025's Record-High Number of Accepted Papers, an Acceptance Rate of Only 22.1%...
自动驾驶之心· 2025-08-04 03:23
Core Viewpoint
- The article highlights the challenges researchers in the AI field face in the paper submission process, which lead to high rejection rates due to issues such as writing quality, methodological flaws, and misalignment with a journal's focus [1][2].

Group 1: Submission Challenges
- Pain Point 1: 60% of desk rejections are due to misalignment with the journal's focus [3].
- Pain Point 2: Lack of innovation is a critical issue, with reviewers criticizing submissions for not addressing relevant problems [3].
- Pain Point 3: 65% of rejections stem from methodological flaws, indicating that many experiments are not reproducible [3].
- Pain Point 4: 78% of papers are rejected due to poor writing structure, with many authors failing to communicate their research effectively [3].
- Pain Point 5: 23% of initial rejections occur due to formatting errors in the submission process [2].

Group 2: Support and Solutions
- The company offers personalized guidance from over 300 experienced mentors in autonomous driving and embodied intelligence, with a 96% success rate for students [4].
- The mentoring covers the full process from topic selection to submission, ensuring that students are well prepared for publication [11].
- The program aims to help students build a clear research framework, improve their coding skills, and strengthen their overall research abilities [9][12].