自动驾驶之心
NVIDIA's 41-page autonomous-driving VLA framework: causal chain reasoning, deployable on real vehicles
自动驾驶之心· 2025-11-15 03:03
Core Insights
- The article discusses the introduction of the Alpamayo-R1 (AR1) framework by NVIDIA, which aims to enhance decision-making in complex driving scenarios through causal reasoning and trajectory planning [1][2].

Group 1: Background and Development
- Autonomous driving systems have evolved from traditional modular architectures to end-to-end frameworks, now widely adopted across the industry [3].
- Current end-to-end methods struggle with long-tail scenarios because supervisory signals are sparse and higher-order reasoning is required, leaving a significant gap between existing models and the demands of robust Level 4 (L4) autonomous driving [3][4].

Group 2: Innovations in AR1
- AR1 integrates causal chain reasoning with trajectory planning, improving planning accuracy by 12% in high-difficulty scenarios over trajectory-based baseline models [2][8].
- In closed-loop simulation, the model reduces lane deviation rates by 35% and near-collision rates by 25% [2].
- After reinforcement-learning post-training, reasoning quality improved by 45% and reasoning-action consistency by 37% [2].

Group 3: Causal Chain Dataset and Structured Reasoning
- The article emphasizes the necessity of structured causal reasoning in autonomous driving, proposing a causal chain (CoC) dataset that aligns reasoning trajectories with driving decisions [5][29].
- The CoC dataset keeps reasoning trajectories concise and directly linked to specific driving decisions, enhancing the model's interpretability and training efficiency [5][31].

Group 4: Training Strategies and Model Architecture
- AR1 employs a multi-stage training strategy combining supervised fine-tuning and reinforcement learning to optimize both reasoning quality and trajectory prediction [8][12].
- The architecture is modular, remaining compatible with existing vision-language model (VLM) backbones while integrating components tailored for autonomous driving [12][16].

Group 5: Visual Encoding and Action Decoding
- The article examines the challenges of visual encoding in multi-camera setups and proposes efficient tokenization methods to reduce the number of tokens generated during real-time inference [19][22].
- Action decoding is based on a bicycle model to ensure smooth trajectory outputs, enhancing performance in real-world applications [27][28].

Group 6: Quality Assurance and Annotation Process
- A hybrid annotation process combining human and automated labeling ensures high-quality training data for the CoC dataset, balancing efficiency and accuracy [48][49].
- The quality-assurance process includes multiple checks for causal correctness and decision minimality in the annotated data [52][53].
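The action decoder described above is built on a bicycle model, which is what guarantees kinematically feasible, smooth trajectories. As a rough illustration only, here is the generic textbook kinematic bicycle-model rollout, not AR1's actual decoder; the wheelbase and timestep values are assumptions:

```python
import math

def rollout_bicycle(x, y, theta, v, controls, wheelbase=2.8, dt=0.1):
    """Roll out a trajectory from (steering, acceleration) commands.

    Integrating controls through a kinematic bicycle model, rather than
    predicting waypoints directly, keeps the resulting path smooth and
    physically drivable.
    """
    traj = [(x, y)]
    for steer, accel in controls:
        x += v * math.cos(theta) * dt          # advance position
        y += v * math.sin(theta) * dt
        theta += (v / wheelbase) * math.tan(steer) * dt  # heading update
        v = max(0.0, v + accel * dt)           # speed never goes negative
        traj.append((x, y))
    return traj

# Zero steering and zero acceleration at 10 m/s yields a straight path.
path = rollout_bicycle(0.0, 0.0, 0.0, 10.0, [(0.0, 0.0)] * 5)
```

Because every output point is reachable under the vehicle's steering and speed limits, this style of decoding cannot emit the jagged or discontinuous trajectories that direct waypoint regression sometimes produces.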
A day in the life of an end-to-end autonomous driving algorithm engineer
自动驾驶之心· 2025-11-15 03:03
Core Viewpoint
- The article emphasizes the importance of end-to-end algorithms in autonomous driving, highlighting the shift from rule-based to learning-based approaches, particularly for congestion and dynamic-obstacle scenarios [4][7].

Summary by Sections

Overview of End-to-End Tasks
- The transition to end-to-end systems merges perception tasks and makes learning-based control algorithms a mainstream requirement at companies [7].

Two-Stage End-to-End Algorithm Framework
- The two-stage framework is discussed, including its modeling methods and the information transfer between perception and planning, navigation, and control (PNC) [8].

One-Stage End-to-End Algorithm
- The one-stage framework allows lossless information transfer, outperforming the two-stage approach. Several one-stage frameworks are introduced, including those based on VLA and diffusion methods [9].

Navigation Information in Production
- Navigation information is crucial for route guidance and selection. The chapter covers mainstream navigation-map formats and how to effectively encode and embed navigation maps in end-to-end models [10].

Introduction to Reinforcement Learning Algorithms
- Integrating reinforcement learning with imitation learning is necessary, as it helps models learn causal relationships and generalize better across diverse driving scenarios [11].

End-to-End Trajectory Output Optimization
- This section focuses on practical trajectory-planning projects that combine imitation learning and reinforcement learning techniques [12].

Safety Net Solutions: Spatiotemporal Joint Planning
- Post-processing logic is needed to ensure model output quality, including trajectory-smoothing algorithms that enhance stability and reliability [13].
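As a toy illustration of the kind of post-processing "safety net" smoothing mentioned above (this is a minimal sketch, not the course's actual algorithm), a moving-average pass over planned waypoints might look like:

```python
def smooth_trajectory(points, window=3):
    """Moving-average smoother over (x, y) waypoints.

    Each point is replaced by the mean of its neighbors inside `window`;
    endpoints simply use whatever neighbors exist, so the trajectory
    length is preserved.
    """
    n = len(points)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out

# A noisy planned path; smoothing damps the lateral oscillation.
raw = [(0.0, 0.0), (1.0, 0.5), (2.0, -0.5), (3.0, 0.4), (4.0, 0.0)]
smoothed = smooth_trajectory(raw)
```

Production systems typically use stronger methods (spline fitting, QP-based smoothing with curvature constraints), but the principle is the same: the learned model proposes, and a deterministic post-processor enforces stability.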
Experience Sharing in End-to-End Production
- The final chapter shares production experience from the perspectives of data, models, scenarios, and rules to improve system capabilities [14].

Target Audience
- The course targets advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming [15][16].
End-to-end and VLA positions: three years of experience now commands a 70k monthly salary
自动驾驶之心· 2025-11-14 00:04
Core Insights
- There is significant demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry, with salaries for experts reaching up to 70k per month for positions requiring 3-5 years of experience [1].
- The end-to-end and VLA technology stack is complex, spanning advanced algorithms such as BEV perception, vision-language models (VLM), diffusion models, reinforcement learning, and world models [1].
- The industry is offering specialized courses to help individuals learn end-to-end and VLA technologies quickly and effectively, featuring collaboration between academia and industry experts [1].

Course Offerings
- The "End-to-End and VLA Autonomous Driving Course" focuses on key algorithms and theoretical foundations in end-to-end autonomous driving, covering both one-stage and two-stage approaches, including BEV perception and large language models [11].
- The "Autonomous Driving VLA and Large Model Practical Course" is designed for beginners in the VLA field, providing a comprehensive overview of VLA, including its Vision, Language, and Action modules, as well as reinforcement learning and diffusion models [2].
- Both courses include practical assignments in which participants build their own VLA models and datasets from scratch [2].

Instructor Profiles
- Instructors include experts with strong academic backgrounds and practical experience in autonomous driving and large-model development, including researchers from Tsinghua University and other top-tier universities [7][10][13].
- Instructors have published numerous papers in prestigious conferences and have led projects on multimodal perception and autonomous driving [7][10][13].

Target Audience
- The courses are aimed at individuals with foundational knowledge of autonomous driving, familiarity with its basic modules, and a grasp of transformer models, reinforcement learning, and BEV perception [15].
- Participants are expected to have a background in probability theory and linear algebra, as well as proficiency in Python and PyTorch [15].
Xpeng's Liu Xianming: the "emergence" of VLA 2.0 was extremely sudden...
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article discusses the emergence of advanced capabilities in autonomous driving and robotics, focusing on Xiaopeng Motors' developments in VLA (Vision-Language-Action) models and humanoid robots [5][10][28].

Group 1: Technological Advancements
- Xiaopeng Motors has invested heavily in computational power, utilizing 30,000 compute cards and spending over 2 billion on training, leading to a technological breakthrough [7].
- The emergence of capabilities in the second-generation VLA and the humanoid robot IRON was unexpected: months of failure suddenly gave way to significant progress [5][8].
- The core logic of the second-generation VLA is to eliminate the translation from vision to language, enhancing efficiency and enabling self-supervised learning [10][19].

Group 2: Challenges and Solutions
- The transition from structured text data to continuous video signals presents challenges, including information loss and the need for real-time feedback from the physical world [14][15][17].
- Xiaopeng's approach simplifies training by removing complex intermediate steps, taking multimodal data as direct input and producing physical actions as output [20][22].
- The company is optimizing local deployment for low latency and high frame rates to ensure real-time performance on its own hardware [24].

Group 3: Robotics Development
- Xiaopeng's robotics team collaborates closely with the automotive division, emphasizing in-house development to reduce costs and accelerate iteration [28][29].
- The humanoid robot IRON has shown significant improvements in movement, achieving a human-like gait through innovative design and control systems [36][39].
- A universal generative controller allows the robot to perform complex movements, such as Tai Chi, by directly inputting recorded trajectories [46].

Group 4: Future Prospects
- The company envisions a future in which robots establish deeper emotional connections with humans, potentially personalizing their designs to individual preferences [48].
- Advances in robotics and autonomous driving are expected to yield sudden breakthroughs, similar to those seen in the automotive sector [32].
One sentence is all it takes to create a freely explorable 3D world!
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article discusses the launch of Marble, a world model developed by WorldLabs that allows users to create immersive 3D environments from a single image or text prompt [2][3][7].

Group 1: Product Features
- Marble generates persistent, downloadable 3D environments, distinguishing it from other real-time models [28].
- Users can upload 2D images or 3D models (for a fee) to generate worlds, achieving realism approaching AAA video games [14][16].
- The platform includes AI-native editing tools and a hybrid 3D editor, letting users construct a spatial framework and then fill in visual details [31].

Group 2: User Experience
- Initial testing showed impressive results, including interactive 3D scenes created from a single image [32].
- Users can input multiple images or short videos to create more accurate 3D worlds, enhancing the creative process [48].
- Editing is iterative: generated worlds can be modified extensively, from minor adjustments to major structural changes [49][50].

Group 3: Pricing and Accessibility
- Marble offers three pricing tiers; the highest costs $95 per month for generating up to 75 worlds, while the free version allows 4 worlds [83][84].
- The Pro version is available for $1 in the first month, with standard pricing of $20 per month [85].

Group 4: Future Implications
- The article argues that Marble represents a significant step toward spatial intelligence in AI, expected to unlock new applications in simulation and robotics [70][71].
- Adding interactive capabilities to future world models is highlighted as a key opportunity for enhancing user engagement and applications [69].
Understanding world models without jargon: from everyday prediction to autonomous driving
自动驾驶之心· 2025-11-14 00:04
Group 1
- The core concept of the article is the "world model": a system that predicts future scenarios from past sensory data, much as humans anticipate events in daily life [2][3][30].
- A world model takes varied inputs, such as images, sounds, and sensor data, and outputs predictions about future states; its essence is recognizing patterns and making forecasts [4][30].
- The article distinguishes world models from neural networks: neural networks serve as tools for recognition and imitation, while the world model is the core that enables prediction and understanding [5][10][30].

Group 2
- A "universal" world model is limited by the vast differences in rules and requirements across scenarios, making specialized models necessary [11][12][30].
- Specialized world models are introduced for video generation, music generation, games, and industrial production, each focused on a specific domain to achieve precise predictions [12][14][18][30].
- The autonomous driving world model is described as the most demanding type: its predictions directly affect safety, requiring rapid response times and high accuracy [18][22][30].

Group 3
- The VLA model is presented as an enhanced autonomous-driving world model that incorporates language logic to better predict actions from user commands and traffic rules [23][26][30].
- The article concludes that world models will become more specialized rather than universal, focusing on prediction accuracy and speed in specific scenarios [29][30].
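In the spirit of the article's "predict the future from past observations" framing, the idea can be boiled down to a deliberately minimal sketch: forecast the next state by linear extrapolation of the last two. This is purely illustrative; real world models are learned neural predictors over high-dimensional sensor streams:

```python
def predict_next(observations):
    """Toy 'world model': linearly extrapolate the last two observations.

    Each observation is a tuple of numeric state (e.g. positions). The
    pattern-recognition-and-forecast loop the article describes reduces,
    in its simplest form, to assuming the recent trend continues.
    """
    if len(observations) < 2:
        raise ValueError("need at least two past observations")
    prev, last = observations[-2], observations[-1]
    return tuple(2 * b - a for a, b in zip(prev, last))

# A car observed at x = 10 m, then 12 m, is forecast at 14 m next step.
assert predict_next([(10.0,), (12.0,)]) == (14.0,)
```

What separates a real world model from this toy is exactly what the article stresses: learned, domain-specific dynamics that capture rules (traffic, physics) rather than a single blanket assumption of constant velocity.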
Is Tesla's FSD hiding a VLA? A deep discussion of VLA and world models next week
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article discusses advancements in autonomous driving technology, focusing on the development of the Vision-Language-Action (VLA) framework and world models, and highlights the contributions of several experts in the field [1][2][3][4][5].

Group 1: Key Contributors
- Jian Kun, a senior director at Li Auto, has built its autonomous driving technology stack from scratch since joining in 2021, achieving milestones such as Highway NoA in 2022 and City NoA in 2023 [1].
- Xu Lingyun, a PhD from the Chinese Academy of Sciences, leads the parking team at Changan Automobile, focusing on autonomous driving perception and end-to-end system research [2].
- Jiang Anqing, a senior algorithm scientist at Bosch, leads research on VLA and closed-loop algorithms [3].

Group 2: Technological Developments
- The discussion covers the potential integration of world models and VLA, asking whether a unified approach is feasible [8].
- The high demand for data and computing power is making it increasingly difficult for academia to participate in intelligent-driving advancements, raising questions about future opportunities in the academic sector [8].

Group 3: Event Highlights
- A live discussion on the future of autonomous driving technologies, including insights on Tesla's FSD v14 and its implications for domestic technology [4][5].
- The event featured a deep dive into the reliability of VLMs in autonomous driving, with expert opinions on data closed-loop engineering [12].
Engineers become AI "commanders": the Geely and Alibaba Cloud software development transformation experiment
自动驾驶之心· 2025-11-13 00:04
Core Insights
- The automotive industry is facing unprecedented challenges in software engineering: the proportion of software developers at Geely has risen from under 10% to 40% in recent years, and complexity is growing exponentially as smart-vehicle codebases surpass 100 million lines [3][5].
- Geely is leveraging AI, specifically through collaboration with Alibaba Cloud's Tongyi Lingma, to enhance development efficiency, achieving a 20% increase in coding efficiency with over 30% of code generation now AI-driven [5][6].
- The shift from hardware-dominated to software-centric automotive products necessitates a transformed development model, moving toward agile and DevOps methodologies that support rapid iteration [8][19].

Development Challenges
- The industry is transitioning from distributed ECU architectures to centralized computing and service-oriented architectures (SOA), which significantly increases system-integration complexity [8].
- Compliance with stringent international safety standards such as ISO 26262 and ASPICE poses additional challenges, creating tension between rapid agile development and necessary safety protocols [8].

AI Integration
- Geely's R&D system encompasses application software development, embedded development, and algorithm research, with AI tools like Tongyi Lingma integrated across all areas [10][11].
- AI automates repetitive tasks so engineers can focus on system architecture and core business logic, yielding a 30% efficiency improvement in coding phases [16][18].

Knowledge Management
- AI's ability to quickly read and interpret legacy code mitigates "technical debt," helping new engineers understand complex systems more rapidly [17][18].
- The Geely-Alibaba Cloud collaboration aims to build a proprietary knowledge base that deepens the AI's contextual understanding of Geely's specific technology stack and business logic [14][15].

Role Transformation
- The role of engineers is evolving from executors to "AI commanders" who define problems and oversee AI execution, shifting the focus from implementation to strategic oversight [20][21].
- The ultimate goal is a highly automated R&D environment in which AI and human engineers collaborate throughout the entire development process [22][23].

Industry Implications
- Demand is rising for cross-disciplinary talent that understands both mechanical hardware and software systems, exposing a significant skills gap in the automotive industry [23].
- AI integration may lower technical barriers in software development, enabling engineers with mechanical backgrounds to participate more actively in software engineering [23].
Pony.ai, having weathered its "vacuum period", has entered an unstoppable positive cycle
自动驾驶之心· 2025-11-13 00:04
Core Viewpoint
- The article highlights Pony.ai's successful IPO, marking its emergence as a leading player in the autonomous driving industry after overcoming a funding and technological "vacuum period" [2][4][30].

Group 1: IPO and Funding
- Pony.ai listed on the Hong Kong Stock Exchange on November 6, 2025, offering approximately 48.25 million shares at HKD 139 each; if the over-allotment option is fully exercised, it could raise up to HKD 7.7 billion, making it the largest IPO in the global autonomous driving sector for 2025 [2][4].
- The IPO attracted significant cornerstone investments, including USD 120 million from top international investment institutions, with Uber contributing USD 100 million [4].
- Despite the sector's downturn, Pony.ai secured two rounds of financing during this period, including a USD 290 million Series D in 2022, demonstrating strong investor confidence in its technological capabilities [8][24].

Group 2: Technological Advancements
- Pony.ai has transitioned from imitating human driving to surpassing it; its Robotaxi technology achieves safety performance ten times better than human drivers [14][20].
- The company developed a proprietary software stack for its "Virtual Driver," which uses an end-to-end model to perceive and understand the environment and predict the behavior of surrounding vehicles and pedestrians [16].
- Its seventh-generation L4 autonomous driving system features a platform design that supports multiple vehicle models and reduces bill-of-materials (BOM) cost by 70% compared to previous generations [16][17].

Group 3: Commercialization and Operations
- Pony.ai has commenced full-scale commercial Robotaxi operations in major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen, following technological breakthroughs and regulatory approvals [23].
- Total revenue for Q2 2025 reached RMB 154 million, a year-on-year increase of 75.9% [24].
- The Robotaxi segment alone generated USD 3.256 million (approximately RMB 23.32 million) in the first half of 2025, up 178.8% year on year [25].
- Fleet efficiency is enhanced by a highly automated management system with real-time vehicle monitoring, supporting a 1:20 personnel-to-vehicle operational ratio [27].
We are looking for partners in the autonomous driving field...
自动驾驶之心· 2025-11-13 00:04
Hello everyone, I'm 柱哥. We have been upgrading our content and planning more detailed output, gradually moving from single-article interpretations toward deeper technical overviews, solution analyses, and opinion discussions. Autonomous driving has entered deep technical waters, and the industry's difficulties and pain points need more committed people to join in and break through together. Going forward we will add roundtable interviews, hands-on and industrial-grade courses, consulting, and other offerings.

Recently Tesla, Xpeng, and Li Auto have all shared new technology, sparking broad and in-depth discussion, and we are delighted to bring more quality content to everyone. As a technical content platform for China's autonomous driving field, we hope to contribute our strength amid this current and become a platform that truly brings value to the industry. Many hands make light work, and we need more excellent partners to join us.

Main Directions
Including but not limited to: autonomous driving product manager, 4D annotation / data closed loop, world models, VLA, autonomous driving large models, reinforcement learning, end-to-end, and other directions.

Role Description
Mainly covering autonomous driving training partnerships (B-side: enterprises, universities, and research institutes; C-side: mostly students and job seekers), course development, and original article writing.

Contact Us
For compensation and collaboration terms, add WeChat wenyirumo for further discussion.