自动驾驶之心
AI Day Livestream | SpatialRetrievalAD: A New Paradigm for Spatial Retrieval in Autonomous Driving
自动驾驶之心· 2025-12-17 03:18
Existing autonomous driving systems rely heavily on onboard sensors for real-time, precise environment perception. However, this paradigm is limited by the sensing range during driving and often fails under restricted fields of view, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers can still recall road structure even when visibility is poor. To give models this "recall" ability, the Fudan Trustworthy Embodied Intelligence group, in collaboration with Shanghai Jiao Tong University and others, introduces offline-retrieved geographic images as an additional system input. These images are easily obtained from offline caches (such as Google Maps or stored autonomous driving datasets), require no extra sensors, and serve as a plug-and-play extension for existing autonomous driving tasks. In the experiments, geographic images were first retrieved via the Google Maps API to extend the nuScenes dataset, with the new data aligned to the ego vehicle's trajectory. Benchmarks were then established on five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Online mapping mAP improved by 13.4%, static-class mIoU for occupancy prediction rose by 2.57%, and the nighttime planning collision rate dropped from 0.55% to 0.48%, providing a low-cost, highly robust perception enhancement for complex scenarios. Extensive experiments show that this extended modality ...
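The offline retrieval step described above (looking up cached geographic images aligned to the ego trajectory) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the slippy-map tile keying, the `GeoImageCache` class, and the filenames are all assumptions introduced here for the sketch.

```python
import math

def tile_key(lat: float, lon: float, zoom: int = 17) -> tuple:
    """Quantize a GPS coordinate to a Web-Mercator tile index (standard slippy-map scheme)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return (zoom, x, y)

class GeoImageCache:
    """Toy offline cache: maps tile keys to pre-downloaded geographic images."""
    def __init__(self):
        self._store = {}

    def add(self, lat, lon, image):
        self._store[tile_key(lat, lon)] = image

    def retrieve(self, ego_trajectory):
        """Return the cached geo image for each ego pose (None on a cache miss)."""
        return [self._store.get(tile_key(lat, lon)) for lat, lon in ego_trajectory]

cache = GeoImageCache()
cache.add(42.3601, -71.0589, "tile_boston.png")  # pre-fetched offline, e.g. via a maps API
hits = cache.retrieve([(42.3601, -71.0589), (0.0, 0.0)])
# First pose hits the cache; the second has no cached tile.
```

Because the lookup is a pure cache read keyed by quantized position, it adds no sensor and can run ahead of time along the planned route, which is what makes the modality "plug-and-play".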
Without solid research ability, don't even think about doing autonomous driving in industry......
自动驾驶之心· 2025-12-17 03:18
Core Viewpoint
- The article discusses the high demand for skilled talent in the autonomous driving sector, highlighting the competitive salaries and the importance of comprehensive research capabilities for candidates [2].

Group 1: Talent Demand and Requirements
- High-end autonomous driving talent is in great demand, with some companies offering annual packages of up to 700,000 yuan for master's degree holders [2].
- Candidates are expected to possess complete research capabilities, including problem identification, definition, and solution formulation, rather than just theoretical knowledge [2].

Group 2: Research Challenges
- Many students face challenges in their research, such as lack of familiarity with the field, absence of real data, and difficulties in experimental design [7].
- The fastest way to improve research skills is to work alongside experienced researchers, as indicated by the introduction of a 1-on-1 research mentoring service [3].

Group 3: Mentoring Services Offered
- The mentoring services cover various advanced topics in autonomous driving, including end-to-end learning, reinforcement learning, multi-sensor fusion, and more [4].
- The company supports various research needs, including paper writing, experimental guidance, and thesis supervision [12][13].

Group 4: Publication Success
- The article notes a high acceptance rate for papers, with several already included in top conferences and journals such as CVPR, AAAI, and ICLR [9].
北交&地平线提出DIVER:扩散+强化的多模态规划新框架
自动驾驶之心· 2025-12-17 03:18
Core Viewpoint
- The article discusses the advancement of end-to-end autonomous driving systems, highlighting the introduction of the DIVER framework, which combines diffusion models and reinforcement learning to enhance trajectory diversity and safety in complex driving scenarios [3][33].

Group 1: Current Challenges in Autonomous Driving
- Current end-to-end autonomous driving methods primarily rely on imitation learning from a single expert demonstration, leading to a lack of behavioral diversity and overly conservative planning in complex traffic situations [5][6].
- Existing models tend to converge around a single ground-truth trajectory, resulting in limited exploration of diverse and safe decision-making options [7][8].

Group 2: Introduction of the DIVER Framework
- The DIVER framework integrates the multimodal generation capabilities of diffusion models with the goal-oriented constraints of reinforcement learning, transforming trajectory generation into a policy generation problem under safety and diversity constraints [9][33].
- DIVER aims to produce multiple feasible and semantically valid candidate trajectories, addressing the limitations of traditional imitation learning approaches [9][33].

Group 3: Technical Innovations of DIVER
- DIVER employs a Policy-Aware Diffusion Generator (PADG) that incorporates contextual information such as maps and dynamic agents, ensuring that generated trajectories are both semantically clear and feasible [16][20].
- The framework utilizes multiple reference ground truths to align each predicted trajectory with a specific driving intention, thereby preventing mode collapse and enhancing diversity [20][21].

Group 4: Performance Metrics and Results
- In various benchmark evaluations, DIVER significantly outperformed existing methods in terms of trajectory diversity and safety, achieving lower collision rates while expanding the range of behaviors covered [28][30].
- The DIVER framework demonstrated superior performance in long-term planning tasks, maintaining the lowest collision rates while achieving higher diversity metrics compared to competitors [32][36].

Group 5: Conclusion and Implications
- DIVER represents a significant step towards more human-like decision-making in autonomous driving by addressing the long-standing issues associated with imitation learning [33][34].
- The integration of generative models with reinforcement learning is positioned as a crucial advancement for the future of realistic autonomous driving applications [34].
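The diffusion-plus-RL combination summarized above can be illustrated with a toy reward-guided sampler: samples follow the score of a data distribution while an RL-style reward gradient steers them away from an obstacle. Everything here is a stand-in, not DIVER's PADG: the Gaussian "data" score, the repulsive reward, and all coordinates are assumptions chosen so the sketch runs end to end.

```python
import numpy as np

rng = np.random.default_rng(0)
goal = np.array([10.0, 0.0])       # desired trajectory endpoint (illustrative)
obstacle = np.array([5.0, 0.05])   # region the reward term penalizes

def score(x):
    """Score of a Gaussian 'data' distribution N(goal, I): grad log p(x) = goal - x."""
    return goal - x

def reward_grad(x, radius=2.0, weight=4.0):
    """Gradient of a repulsive reward pushing samples out of the obstacle's radius."""
    d = x - obstacle
    dist = np.linalg.norm(d)
    if dist >= radius:
        return np.zeros_like(x)
    return weight * d / (dist + 1e-8)

def sample(n_steps=500, step=0.05):
    """Langevin-style sampling: follow the data score plus the reward gradient."""
    x = rng.normal(size=2)
    for _ in range(n_steps):
        noise = rng.normal(size=2) * np.sqrt(2 * step) * 0.1
        x = x + step * (score(x) + reward_grad(x)) + noise
    return x

samples = np.array([sample() for _ in range(8)])
# Samples concentrate near the goal while the reward term keeps paths off the obstacle.
```

The design point the sketch makes is DIVER's: the diffusion (score) term supplies multimodal, data-like samples, while the reward term imposes a goal-oriented constraint during generation rather than after it.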
HUST & Xiaomi jointly propose MindDrive: the first VLA framework to verify the effectiveness of online reinforcement learning......
自动驾驶之心· 2025-12-17 00:03
Core Insights
- The article introduces MindDrive, a novel framework for autonomous driving that utilizes online reinforcement learning (RL) to enhance the performance of vision-language-action (VLA) models [2][4][44].
- MindDrive demonstrates significant improvements in driving scores and success rates compared to traditional end-to-end paradigms and state-of-the-art (SOTA) models, achieving a driving score (DS) of 78.04 and a success rate (SR) of 55.09% [9][38].

Background Review
- Autonomous driving relies on models that can perceive, decide, and execute actions in dynamic environments. Traditional frameworks often lack common sense and causal reasoning capabilities [4].
- Current VLA models primarily use imitation learning (IL), which can lead to causal confusion and distribution shifts, resulting in irreversible errors in closed-loop driving scenarios [4][5].

MindDrive Framework
- MindDrive consists of two main components: a decision expert and an action expert, both utilizing a shared vision encoder and text tokenizer, but differing in their low-rank adaptation (LoRA) parameters [11][18].
- The decision expert generates abstract driving decisions based on navigation commands and visual inputs, while the action expert translates these decisions into specific action trajectories [11][18].

Online Reinforcement Learning Approach
- MindDrive employs online RL to optimize the decision-making process by sampling different trajectories and receiving feedback from the environment, thus enhancing the model's understanding of causal relationships [22][30].
- The framework is designed to operate within a closed-loop simulation environment, specifically using the CARLA simulator, which allows for efficient data collection and training [8][24].

Experimental Results
- MindDrive outperforms traditional end-to-end methods and other VLA models, achieving a driving score that is 10.12 points higher than the best imitation learning model and 6.68 points higher than the best offline RL method [38][40].
- The model's performance in complex driving scenarios, such as overtaking and yielding, shows significant improvements, indicating enhanced causal reasoning and decision robustness [38][40].

Conclusion
- MindDrive represents a significant advancement in the application of online RL to autonomous driving, providing a framework that effectively maps language instructions to actions while optimizing exploration efficiency [44].
- The results suggest that MindDrive could inspire further developments in the autonomous driving sector, particularly in enhancing the capabilities of VLA models [44].
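The two-expert design described above (a shared frozen backbone, with the decision and action experts differing only in their LoRA parameters) can be sketched in a few lines. The shapes, rank, and the single linear layer standing in for the full VLM backbone are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 16, 4

# Frozen shared weight (stands in for the backbone's linear layers).
W_shared = rng.normal(scale=0.1, size=(d_out, d_in))

def make_lora():
    """One expert's low-rank adapter: delta_W = B @ A, with B zero-initialized."""
    A = rng.normal(scale=0.1, size=(rank, d_in))
    B = np.zeros((d_out, rank))
    return A, B

decision_lora = make_lora()
action_lora = make_lora()

def forward(x, lora):
    """Apply the shared frozen weight plus the selected expert's low-rank update."""
    A, B = lora
    return x @ (W_shared + B @ A).T

x = rng.normal(size=(1, d_in))
# With B zero-initialized, both experts start out identical to the shared backbone.
y_dec = forward(x, decision_lora)
y_act = forward(x, action_lora)

# Training updates only A and B; simulate one update on the action expert's B.
action_lora[1][:] = rng.normal(scale=0.1, size=(d_out, rank))
y_act_tuned = forward(x, action_lora)
```

The appeal of this layout is that the two experts share nearly all parameters and memory, so switching between decision-making and action generation is just a swap of small adapter matrices.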
Real-time 3D scene reconstruction with centimeter-level accuracy! This laser scanner is remarkably easy to use~
自动驾驶之心· 2025-12-17 00:03
Core Viewpoint
- The article introduces the GeoScan S1, a highly cost-effective handheld 3D laser scanner designed for various applications, emphasizing its advanced features and capabilities in real-time 3D mapping and data collection [3][11].

Group 1: Product Features
- GeoScan S1 offers a lightweight design with a one-button startup, enabling efficient and practical 3D solutions [3][6].
- It utilizes a multi-modal sensor fusion algorithm to achieve centimeter-level precision in real-time 3D scene reconstruction, capable of generating 200,000 points per second and covering a measurement distance of up to 70 meters [3][31].
- The device supports scanning areas over 200,000 square meters and can be equipped with a 3D Gaussian data collection module for high-fidelity scene restoration [3][53].

Group 2: Technical Specifications
- The GeoScan S1 runs a handheld Ubuntu system and integrates various sensor devices, including RTK, IMU, and dual wide-angle cameras, ensuring high precision and data synchronization [5][36].
- It features a relative accuracy of better than 3 cm and an absolute accuracy of better than 5 cm, with a battery life of approximately 3 to 4 hours [24][25].
- The device measures 14.2 cm x 9.5 cm x 45 cm and weighs 1.3 kg without the battery and 1.9 kg with the battery [24].

Group 3: Market Position and Pricing
- The GeoScan S1 is positioned as the most cost-effective handheld 3D laser scanner on the market, with a starting price of 19,800 yuan [11][60].
- Various versions are available, including a basic version, a depth camera version, and online/offline 3DGS versions, catering to different user needs and budgets [60][61].

Group 4: Application Scenarios
- The GeoScan S1 is suitable for a wide range of environments, including office buildings, parking lots, industrial parks, tunnels, forests, and mines, effectively completing 3D scene mapping [40][49].
- It supports cross-platform integration, making it compatible with drones, unmanned vehicles, and robotic systems for automated operations [47].
Fudan's latest DriveVGGT: efficient multi-camera 4D reconstruction for autonomous driving
自动驾驶之心· 2025-12-17 00:03
Paper authors | Xiaosong Jia et al.  Editor | 自动驾驶之心

4D scene reconstruction in autonomous driving is a key step toward environment perception and motion planning, yet traditional visual geometry models often perform poorly in the multi-camera, low-overlap scenarios typical of autonomous driving. Researchers from Shanghai Jiao Tong University, Fudan University, and other institutions propose DriveVGGT, a visual geometry Transformer designed specifically for autonomous driving. By explicitly introducing relative camera pose priors, it significantly improves the geometric prediction consistency and inference efficiency of multi-camera systems.

Background
4D reconstruction is a computer vision task that predicts geometric information from visual sensors. Compared with other sensors, camera-based reconstruction has been widely studied and applied across fields, especially autonomous driving and robotics, owing to its low cost. Reconstruction methods generally fall into two types. The first is iteration-based methods, such as . These methods require selecting a specific scene or object and obtaining an optimized result through iterative reconstruction. However, due to limited generalization, iteration-based methods must retrain the model whenever the scene or object changes or is modified. The second is feed-forward methods. These methods ...
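The relative camera pose prior mentioned above can be made concrete: given each camera's extrinsics as a 4x4 camera-to-world transform, the relative pose between two cameras is their composition. The rig geometry below is invented for illustration, and how DriveVGGT encodes this prior into the Transformer is not shown here.

```python
import numpy as np

def rot_z(theta):
    """4x4 homogeneous transform: rotation about the z axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def translate(x, y, z):
    """4x4 homogeneous transform: pure translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

# Illustrative camera-to-world extrinsics for two cameras on a vehicle rig.
T_cam_a = translate(0.0, 0.5, 1.6) @ rot_z(0.0)
T_cam_b = translate(0.0, -0.5, 1.6) @ rot_z(np.deg2rad(-55))

# Relative pose of camera b expressed in camera a's frame: the prior fed to the model.
T_a_to_b = np.linalg.inv(T_cam_a) @ T_cam_b

# Sanity check: camera b's origin, mapped into camera a's coordinate frame.
p_b_origin = T_a_to_b @ np.array([0.0, 0.0, 0.0, 1.0])
```

Because such extrinsics are calibrated once per vehicle, this prior is essentially free at inference time, which is what lets a multi-camera system enforce geometric consistency even with little image overlap.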
An Intuitive Understanding of the Flow Matching Generative Algorithm
自动驾驶之心· 2025-12-17 00:03
Core Viewpoint
- The article discusses the Flow Matching algorithm, a generative model that simplifies the process of generating samples similar to a target dataset without complex mathematical concepts or derivations [3][4][12].

Algorithm Principle
- Flow Matching is a generative model that aims to generate samples close to a given target set without requiring input [3][4].
- The algorithm learns a direction of movement from a source point to a target point, effectively guiding the generation process [14][16].

Training and Inference
- During training, the model samples points along the line from source to target and averages the slopes from multiple connections to determine the direction of movement [17].
- In inference, the model starts from a noise point and iteratively moves towards the target, collapsing into a specific state as it approaches the target [17][18].

Code Implementation
- The code provided demonstrates a simple implementation of the Flow Matching algorithm, including the generation of random input points and the prediction of slopes using a neural network [18][19].
- The model uses a vector field to predict the direction and speed of movement towards the target distribution [19][20].

Advanced Applications
- The article mentions the adaptation of Flow Matching for conditional generation tasks, allowing for the generation of samples based on specific prompts or conditions [24][30].
- An example is given of generating handwritten digits from the MNIST dataset using Flow Matching, showcasing its versatility in different generative tasks [30][32].

Conclusion
- Flow Matching presents a more efficient alternative to diffusion models in generative tasks, with applications in various fields including image generation and automated driving [12][43].
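The training-and-inference loop summarized above reduces to a clean toy case: with linear interpolation paths x_t = (1-t)·x0 + t·x1, the regression target for the vector field is x1 - x0, and when the "target distribution" is a single point the optimal field has the closed form (target - x)/(1 - t). The sketch below Euler-integrates that analytic field from noise to the target; it stands in for the neural network in the article's code, and the target point is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([3.0, -2.0])  # one-point 'target distribution' for illustration

# Training objective (for reference): sample x0 ~ N(0, I) and t ~ U(0, 1),
# form x_t = (1 - t) * x0 + t * target, and regress v(x_t, t) onto (target - x0).
# For a single target point, the optimal marginal field is closed-form:

def v_field(x, t):
    """Marginal velocity field for linear paths into a single target point."""
    return (target - x) / (1.0 - t)

def generate(n_steps=100):
    """Euler-integrate the field from noise at t=0 to the target at t=1."""
    x = rng.normal(size=2)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * v_field(x, t)
    return x

samples = np.array([generate() for _ in range(4)])
# Every noise sample is carried onto the target, illustrating the 'collapse' at t -> 1.
```

With a real dataset the field is a learned network and different noise points flow to different samples; the single-point case just makes the mechanics, straight-line paths integrated by Euler steps, visible without training.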
This "Whampoa Academy" of autonomous driving now has nearly 4,500 members
自动驾驶之心· 2025-12-16 09:25
Core Insights
- The article emphasizes the establishment of a comprehensive community for autonomous driving knowledge, aiming to facilitate learning and collaboration among industry professionals and newcomers [22][10][8].

Group 1: Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" has over 4,000 members and aims to grow to nearly 10,000 in two years, providing a platform for technical exchange and job opportunities [8][22].
- The community offers a variety of resources, including video tutorials, learning routes, and Q&A sessions, to help members navigate the complexities of autonomous driving technology [10][23].
- Members can access insights from industry leaders and academic experts, enhancing their understanding of the latest trends and technologies in autonomous driving [11][10].

Group 2: Technical Insights and Developments
- Recent updates include discussions on Waymo's latest base model, advancements in self-driving technology, and insights from industry conferences [7][10].
- The community has compiled over 40 technical routes covering various aspects of autonomous driving, such as perception, simulation, and planning control [23][10].
- Key topics include end-to-end autonomous driving, multi-modal large models, and the integration of traditional planning with new technologies [44][52][56].

Group 3: Job Opportunities and Industry Trends
- The community provides job recommendations and internal referrals to help members connect with leading companies in the autonomous driving sector [27][10].
- Members can inquire about job openings, industry trends, and the future of autonomous driving technologies, fostering a supportive environment for career development [26][10].
- The platform encourages collaboration between academia and industry, aiming to bridge the gap between research and practical applications in autonomous driving [22][11].
WeRide's Han Xu: only three companies in China do true L4......
自动驾驶之心· 2025-12-16 09:25
Core Insights
- The article discusses the evolution of autonomous driving, highlighting the progress made by WeRide, which is now a publicly listed company in both the US and Hong Kong, recognized as the "first Robotaxi stock" [3].
- The CEO, Han Xu, emphasizes the importance of focusing on talent acquisition and international expansion, indicating a shift from proving the viability of autonomous driving to scaling operations [6][9].
- Han Xu asserts that only three companies in China can truly operate Level 4 (L4) autonomous vehicles without a safety driver, indicating a significant technological barrier in the industry [7][28].

Group 1
- WeRide has deployed over 1,600 autonomous vehicles globally, with more than 750 being Robotaxis, marking a growth of at least 30% since its IPO [9][10].
- The company has recognized the international demand for Robotaxis, being the first to deploy in 11 countries and 30 cities, and has obtained eight different autonomous driving licenses [12][13].
- Han Xu reflects on the past challenges of securing funding and the skepticism surrounding autonomous driving, contrasting it with the current landscape where the technology is being implemented at scale [14][16][18].

Group 2
- The article highlights the distinction between L2 and L4 autonomous driving, with Han Xu stating that the barriers between these levels have not been broken, and many companies claiming to be L4 are actually "pseudo-Robotaxi" firms [24][29].
- Han Xu sets two criteria for a company to be considered a true L4 operator: having a fleet of at least 20-30 vehicles operating without a safety driver for six months, and achieving pure unmanned commercial operations [25][27].
- The article discusses the competitive landscape, with Han Xu predicting that if Tesla continues to use L2 vehicles without advanced sensors, it will not reach the operational level of Waymo in three years [33][34].

Group 3
- WeRide is also advancing in the L2+ space, having launched a one-step end-to-end solution that is ready for mass production, indicating a dual focus on both L2 and L4 technologies [22][40].
- The company collaborates with Bosch to enhance high-level intelligent driving solutions, showcasing its capability in both L4 and L2 technology stacks [37][40].
- The article concludes with Han Xu's insights on the increasing salaries for AI talent, reflecting the growing demand and competition in the field, with top salaries reaching up to 5 million [54].
A Li Auto paper on OCC world models: SparseWorld-TC, a new trajectory-conditioned sparse occupancy world model
自动驾驶之心· 2025-12-16 03:16
Core Insights
- The article discusses a major breakthrough in end-to-end autonomous driving prediction technology, specifically through the introduction of the SparseWorld-TC model, which addresses limitations of traditional methods by utilizing sparse representations and attention mechanisms [2][3][40].

Group 1: Evolution and Challenges of World Models
- World models are essential for understanding dynamic environments in AI systems, particularly in autonomous driving, where they predict physical environment evolution [6].
- Current world model methods face three main limitations: information loss due to discretization, rigidity from geometric priors in BEV representations, and challenges in capturing temporal dependencies with autoregressive methods [7].
- Sparse representations offer a promising solution by modeling only the occupied areas of a scene, thus reducing computational complexity and preserving continuous characteristics [8].

Group 2: Innovations of SparseWorld-TC
- SparseWorld-TC features a pure attention-driven architecture that eliminates traditional tokenization and intermediate representations, allowing for more flexible spatiotemporal modeling [9].
- The model employs a sparse occupancy representation method based on anchor points, which are initialized with 3D points and feature vectors to predict occupancy and semantic labels [11][12].
- A trajectory conditioning mechanism is integrated, where the vehicle's planned trajectory provides crucial signals for the world model, enhancing prediction accuracy [13][14].

Group 3: Performance Evaluation and Results
- SparseWorld-TC demonstrates significant advancements in 4D occupancy prediction, achieving high performance on the nuScenes benchmark with metrics such as geometric IoU and semantic mIoU [29][30].
- The model outperforms traditional methods, particularly in long-term prediction tasks, with the SparseWorld-TC-Large version achieving a semantic mIoU of 29.89% and an average IoU of 49.21% [33].
- The model's ability to maintain stability in long-term predictions, especially beyond 4 seconds, is highlighted as a key advantage over competing methods [34].

Group 4: Future Applications and Extensions
- The architecture of SparseWorld-TC is not limited to occupancy prediction; it also shows potential for sensor-level observation generation, which could enhance self-supervised training and scene reconstruction [41].
- The integration of feedforward Gaussian prediction expands the model's capabilities, allowing for the generation of sensor observations based on trajectory conditions, which is beneficial for "what-if" analyses [51].
- Future research directions include improving self-supervised learning capabilities, enhancing dynamic scene modeling, and effectively fusing data from multiple sensors to boost prediction accuracy [54].
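The anchor-based representation and trajectory conditioning described above can be sketched as a single attention update: anchors (3D points plus feature vectors) query embedded trajectory waypoints, then lightweight heads read occupancy and semantics off the updated features. Every dimension, the random features, and the toy heads are assumptions for illustration; this is not SparseWorld-TC's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_anchors, d = 6, 8

# Sparse scene state: each anchor is a 3D point plus a feature vector.
anchor_xyz = rng.uniform(-10, 10, size=(n_anchors, 3))
anchor_feat = rng.normal(size=(n_anchors, d))

# Trajectory conditioning: the planned ego trajectory embedded as context tokens.
traj_tokens = rng.normal(size=(4, d))  # e.g. 4 future waypoints

def cross_attention(q, kv):
    """Plain scaled dot-product attention: anchors query the trajectory tokens."""
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv

# One update step: anchor features absorb trajectory context (residual form).
anchor_feat = anchor_feat + cross_attention(anchor_feat, traj_tokens)

# Toy prediction heads: per-anchor occupancy probability and a semantic logit.
w_occ = rng.normal(size=(d,))
occ_prob = 1.0 / (1.0 + np.exp(-(anchor_feat @ w_occ)))
```

The point of the sparse layout is visible even in the toy: computation scales with the number of anchors rather than with a dense voxel grid, and the planned trajectory enters as ordinary attention context instead of a hand-crafted geometric prior.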