Autonomous Driving

What will it take for robotaxis to go global? | FT
Financial Times· 2025-08-18 04:00
Robotaxis are proving popular in cities like San Francisco, moving from concept to reality, with the likes of Alphabet, Amazon, and Tesla all making significant investments in the space. Following the shuttering of General Motors' Cruise project, the US robotaxi market has fallen into the hands of just a few of the richest, most determined tech giants and a handful of startups bold enough to challenge them. Waymo, an autonomous driving technology company owned by Alphabet, Google's parent company, is now operati ...
WeRide Receives Tens of Millions of Dollars in Investment from Grab, Accelerating Large-Scale Robotaxi Deployment in Southeast Asia
Sou Hu Cai Jing· 2025-08-18 01:40
Group 1
- WeRide, an autonomous driving technology company, announced a multi-million dollar equity investment from Southeast Asian super-app platform Grab [1][3]
- The investment is part of a strategic partnership aimed at accelerating the large-scale deployment of Level 4 Robotaxis and other autonomous vehicles in Southeast Asia [3]
- The investment is expected to be completed by the first half of 2026, with the exact timing dependent on conditions chosen by WeRide [3]

Group 2
- Grab's investment will support WeRide's international growth strategy, expanding its commercial autonomous vehicle fleet in Southeast Asia and promoting AI-driven mobility solutions [3]
- WeRide's CEO, Han Xu, expressed the vision of gradually deploying thousands of Robotaxis in Southeast Asia, taking local regulations and societal acceptance into account [3]
- The partnership combines WeRide's advanced autonomous driving technology and operational experience with Grab's platform advantages to provide safe and efficient Robotaxi services [3]
Autonomous Driving VLA: OpenDriveVLA and AutoVLA
自动驾驶之心· 2025-08-18 01:32
Core Insights
- The article discusses two significant papers, OpenDriveVLA and AutoVLA, which focus on applying large vision-language models (VLMs) to end-to-end autonomous driving, highlighting their distinct technical paths and philosophies [22].

Group 1: OpenDriveVLA
- OpenDriveVLA aims to address the "modal gap" traditional VLMs face when dealing with dynamic 3D driving environments, emphasizing the need for structured understanding of the 3D world [23].
- The methodology includes several key steps: 3D visual environment perception, visual-language hierarchical alignment, and a multi-stage training paradigm [24][25].
- The model utilizes structured, layered tokens (Agent, Map, Scene) to enhance the VLM's understanding of the environment, which helps mitigate spatial hallucination risks [6][9].
- OpenDriveVLA achieved state-of-the-art performance on the nuScenes open-loop planning benchmark, demonstrating the effectiveness of its perception-based anchoring strategy [10][20].

Group 2: AutoVLA
- AutoVLA focuses on integrating driving tasks into the native operation of VLMs, transforming them from scene narrators into genuine decision-makers [26].
- The methodology features layered visual token extraction, and the model emits discrete action codes instead of continuous coordinates, converting trajectory planning into a next-token prediction task [14][29].
- The model employs a dual-mode thinking approach, adapting its reasoning depth to scene complexity to balance efficiency and effectiveness [28].
- AutoVLA's reinforcement learning fine-tuning (RFT) enhances its driving strategy, enabling the model to actively optimize its behavior rather than merely imitating human driving [30][35].

Group 3: Comparative Analysis
- OpenDriveVLA emphasizes perception-language alignment to improve the VLM's understanding of the 3D world, while AutoVLA focuses on language-decision integration to enhance the VLM's decision-making capabilities [32].
- The two models represent complementary approaches: OpenDriveVLA provides a robust perception foundation, while AutoVLA optimizes decision-making strategies through reinforcement learning [34].
- Future models may combine the strengths of both, pairing OpenDriveVLA's structured perception with AutoVLA's action tokenization and reinforcement learning to create a powerful autonomous driving system [36].
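The action-tokenization idea attributed to AutoVLA above can be sketched in a few lines: continuous trajectory waypoints are quantized into a discrete vocabulary so that planning becomes next-token prediction. This is an illustrative reconstruction, not the paper's actual scheme; the bin count, coordinate ranges, and the flattening of (x, y) into a single token id are all assumptions.

```python
# Hypothetical sketch of trajectory action tokenization: continuous (x, y)
# waypoints become discrete token ids a language model can predict.
# Ranges and bin counts below are illustrative assumptions.

def tokenize_waypoint(x, y, x_range=(-10.0, 50.0), y_range=(-20.0, 20.0), bins=128):
    """Map a continuous waypoint to one discrete action token id."""
    def quantize(v, lo, hi):
        v = min(max(v, lo), hi)                    # clamp to the modeled range
        return int((v - lo) / (hi - lo) * (bins - 1))
    ix = quantize(x, *x_range)
    iy = quantize(y, *y_range)
    return ix * bins + iy                          # flatten the 2D bin into one id

def detokenize(token, x_range=(-10.0, 50.0), y_range=(-20.0, 20.0), bins=128):
    """Recover the bin-center waypoint from a token id."""
    ix, iy = divmod(token, bins)
    x = x_range[0] + (ix + 0.5) / bins * (x_range[1] - x_range[0])
    y = y_range[0] + (iy + 0.5) / bins * (y_range[1] - y_range[0])
    return x, y

# A planned trajectory becomes a short token sequence, so the VLM can emit it
# autoregressively like ordinary text.
trajectory = [(2.0, 0.1), (4.1, 0.3), (6.3, 0.6)]
tokens = [tokenize_waypoint(x, y) for x, y in trajectory]
```

Quantization error is bounded by the bin width, which is the usual trade-off of this design: a larger vocabulary gives finer trajectories at the cost of a harder prediction problem.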
Cost Reduced 14-Fold! DiffCP: A New Diffusion-Model-Based Paradigm for Collaborative Perception Compression
自动驾驶之心· 2025-08-18 01:32
Core Viewpoint
- The article introduces DiffCP, a novel collaborative perception framework that utilizes conditional diffusion models to significantly reduce communication costs while maintaining high performance in collaborative sensing tasks [3][4][20].

Group 1: Introduction to Collaborative Perception
- Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of independent intelligent systems, particularly in challenging wireless communication environments [3].
- Current C-V2X systems face significant bandwidth limitations, making it difficult to support feature-level and raw-data-level collaborative algorithms [3].

Group 2: DiffCP Framework
- DiffCP is the first collaborative perception architecture to employ conditional diffusion models, capturing geometric correlations and semantic differences for efficient data transmission [4].
- The framework integrates prior knowledge, geometric relationships, and received semantic features to reconstruct collaborative perception information, introducing a new generative-model-based paradigm [4][5].

Group 3: Performance and Efficiency
- Experimental results indicate that DiffCP achieves robust perception performance in ultra-low-bandwidth scenarios, reducing communication costs by 14.5 times while maintaining state-of-the-art algorithm performance [4][20].
- DiffCP can be integrated into existing BEV-based collaborative algorithms for various downstream tasks, significantly lowering bandwidth requirements [4].

Group 4: Technical Implementation
- The framework utilizes a pre-trained BEV-based perception algorithm to extract BEV features, embedding diffusion time steps, relative spatial positions, and semantic vectors as conditions [5].
- An iterative denoising process is employed, in which the model integrates the host vehicle's observations with collaborative features to progressively recover the original collaborative perception features [8].

Group 5: Application in 3D Object Detection
- DiffCP was evaluated in a case study on 3D object detection, achieving accuracy comparable to state-of-the-art algorithms while reducing data rates by 14.5 times [20].
- The framework supports adaptive data rates through variable semantic vector lengths, enhancing performance in challenging scenarios [20].

Group 6: Conclusion
- DiffCP represents a significant advance in collaborative perception, enabling efficient information compression and reconstruction for collaborative sensing tasks and facilitating the deployment of connected intelligent transportation systems within existing wireless communication frameworks [22].
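The iterative denoising process described above can be sketched as follows. This is a heavily simplified, hypothetical stand-in for DiffCP's conditional diffusion sampler: the update rule, step count, and the toy denoiser that "knows" the target feature map are illustrative assumptions, not the paper's trained network or schedule. The point is only the control flow: start from noise and repeatedly refine toward the collaborator's feature map, with every step conditioned on locally available signals.

```python
import numpy as np

# Simplified sketch of conditional iterative denoising for feature
# reconstruction. A real diffusion sampler uses a learned noise predictor
# and a noise schedule; here a toy denoiser and a fixed step size stand in.

def reconstruct(denoiser, ego_feat, pose, semantic_vec, steps=50, shape=(8, 8), seed=0):
    """Iteratively denoise from Gaussian noise toward the collaborator's BEV features."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)                 # start from pure noise
    for t in reversed(range(steps)):
        # The denoiser sees the current estimate plus the conditioning signals:
        # the ego vehicle's own features, the relative pose, and the short
        # semantic vector that was actually transmitted over the air.
        eps = denoiser(x, t, ego_feat, pose, semantic_vec)
        x = x - 0.2 * eps                          # one simplified refinement step
    return x

# Toy denoiser (hypothetical): predicts the residual to a known target map,
# so the loop converges to it; a trained model would infer this from the
# conditioning inputs instead.
target = np.full((8, 8), 0.5)
toy_denoiser = lambda x, t, ego, pose, sem: x - target
recon = reconstruct(toy_denoiser, ego_feat=None, pose=None, semantic_vec=None)
```

The bandwidth saving in this paradigm comes from transmitting only the compact semantic vector: the heavy feature map is regenerated at the receiver rather than sent.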
How Is Your Class-of-2026 Autumn Recruitment Going?
自动驾驶之心· 2025-08-16 16:04
Core Viewpoint
- The article emphasizes the convergence of autonomous driving technology, indicating a shift from numerous diverse approaches to more unified models, which raises the technical barriers in the industry [1]

Group 1
- The industry is witnessing a trend where many directions that previously required algorithm engineers are consolidating into unified models such as one-model, VLM, and VLA [1]
- The article encourages the establishment of a large community to support individuals in the industry, highlighting the limits of individual effort [1]
- A new job- and industry-focused community is being launched to facilitate discussions on industry trends, company developments, product research, and job opportunities [1]
Autonomous Driving Paper Digest | Visual Reconstruction, RV Fusion, Reasoning, VLM, and More
自动驾驶之心· 2025-08-16 09:43
Core Insights
- The article discusses two innovative approaches in autonomous driving technology: Dream-to-Recon for monocular 3D scene reconstruction and SpaRC-AD for radar-camera fusion in end-to-end autonomous driving [2][13].

Group 1: Dream-to-Recon
- Dream-to-Recon, developed at the Technical University of Munich, enables monocular 3D scene reconstruction using only single images for training [2][6].
- The method integrates a pre-trained diffusion model with a deep network through a three-stage framework:
  1. A View Completion Model (VCM) handles occlusion filling and image distortion correction, achieving a PSNR of 23.9 [2][6].
  2. A Synthetic Occupancy Field (SOF) constructs dense 3D scene geometry from multiple synthetic views, with occlusion reconstruction accuracy (IE_acc) reaching 72%-73%, surpassing multi-view supervised methods by 2%-10% [2][6].
  3. A lightweight distilled model converts the generated geometry into a real-time inference network, achieving overall accuracy (O_acc) of 90%-97% on KITTI-360/Waymo with a 70x speedup (75 ms/frame) [2][6].
- The method offers a new paradigm for efficient 3D perception in autonomous driving and robotics without complex sensor calibration [2][6].

Group 2: SpaRC-AD
- SpaRC-AD, also developed at the Technical University of Munich, is the first radar-camera fusion baseline framework for end-to-end autonomous driving [13][16].
- The framework utilizes sparse 3D feature alignment and Doppler velocity measurements, achieving a 4.8% improvement in 3D detection mAP, an 8.3% increase in tracking AMOTA, a 4.0% reduction in motion prediction mADE, and a 0.11 m decrease in trajectory planning L2 error [13][16].
- The radar-based fusion strategy significantly enhances performance across multiple tasks, including 3D detection, multi-object tracking, online mapping, and motion prediction [13][16].
- Comprehensive evaluations on the open-loop nuScenes and closed-loop Bench2Drive benchmarks demonstrate its advantages in perception range, motion modeling accuracy, and robustness under adverse conditions [13][16].
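The PSNR figure quoted for the View Completion Model (23.9 dB) is a standard image-quality metric. A minimal implementation for images scaled to [0, 1] might look like this; the test images below are synthetic stand-ins, not data from the paper:

```python
import numpy as np

def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")                        # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic example: a random "image" plus mild Gaussian noise.
img = np.random.default_rng(0).random((64, 64))
noisy = np.clip(img + 0.05 * np.random.default_rng(1).standard_normal((64, 64)), 0, 1)
score = psnr(img, noisy)
```

Because PSNR is a log-scaled inverse of mean squared error, a value in the low-to-mid 20s (as reported here for completed views) indicates visibly plausible but not pixel-perfect reconstruction.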
Many More Autonomous Driving Papers Accepted at ICCV 2025, and We've Spotted Some Shifting Trends...
自动驾驶之心· 2025-08-16 00:03
VLM & VLA: Without question, multimodal large models and VLA are this year's hottest track, with a steady stream of new work over the past two months. For autonomous driving VLA, the Action is defined at the level of the ego vehicle's trajectory, and the mainstream paradigm follows a three-stage "pretraining - fine-tuning - reinforcement learning" recipe. A general-purpose autonomous driving VLM foundation model is still missing (aligning autonomous driving visual data with large models), and sustained effort from industry is anticipated.

Closed-loop simulation & world models: Closed-loop simulation based on world models and 3DGS is another major hotspot; both reconstruction-based and generation-based methods can be applied to it. Given the limitations of open-loop testing on real vehicles, closed-loop simulation has become a hard requirement for autonomous driving in recent years, both to cut costs and to speed up model iteration. As far as 自动驾驶之心 understands, many companies in the industry are working hard on closed-loop simulation, but few do it well; current results can only be described as "usable". Industry is also focusing on finer-grained performance, not just overall reconstruction quality: lane lines, waiting-to-turn zones, traffic lights, vehicle lights, pedestrian gait, and other factors that affect ego-vehicle behavior all need to be considered in practice.

OCC and detection: OCC and detection still account for many accepted papers, but they are no longer limited to chasing benchmark numbers; we see many works in finer subdirections, such as OCC plus world models, open-set object detection, detection plus Mamba, and OCC plus Gaussians. These relatively mature directions are digging deeper ...
The Tech-Obsessed "Whampoa Academy" of Autonomous Driving Has Reached 4,000 Members!
自动驾驶之心· 2025-08-15 14:23
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community focused on autonomous driving, aiming to bridge the gap between academia and industry while providing valuable resources for learning and career opportunities in the field [2][16].

Group 1: Community and Resources
- The community has created a closed-loop system covering fields such as industry, academia, job seeking, and Q&A exchanges, enhancing the learning experience for participants [2][3].
- The platform offers cutting-edge academic content, industry roundtables, open-source code solutions, and timely job information, significantly reducing the time needed for research [3][16].
- Members can access nearly 40 technical routes, including industry applications, VLA benchmarks, and entry-level learning paths, catering to both beginners and advanced researchers [3][16].

Group 2: Learning and Development
- The community provides a well-structured learning path for beginners, including foundational knowledge in mathematics, computer vision, deep learning, and programming [10][12].
- For those already engaged in research, industry frameworks and project proposals are available to deepen their understanding and application of autonomous driving technologies [12][14].
- Continuous job sharing and career opportunities are promoted within the community, fostering a complete ecosystem for autonomous driving [14][16].

Group 3: Technical Focus Areas
- The community has compiled extensive resources on technical aspects of autonomous driving, including perception, simulation, planning, and control [16][17].
- Specific learning routes cover topics such as end-to-end learning, 3DGS principles, and multi-modal large models, ensuring comprehensive coverage of the field [16][17].
- The platform also features a collection of open-source projects and datasets relevant to autonomous driving, facilitating hands-on experience and practical application [32][34].
WeRide Secures Strategic Equity Investment from Grab, Partners to Deploy Robotaxis and Autonomous Shuttles in Southeast Asia
Globenewswire· 2025-08-15 09:18
Core Insights
- WeRide has announced a strategic equity investment from Grab to accelerate the deployment of Level 4 Robotaxis and shuttles in Southeast Asia, aiming to integrate WeRide's autonomous vehicles into Grab's network for improved service and safety [2][3][5]

Investment and Partnership
- Grab's investment is expected to be finalized by the first half of 2026, contingent on customary closing conditions and WeRide's preferred timing, supporting WeRide's growth strategy in Southeast Asia [3]
- The partnership builds on a prior Memorandum of Understanding signed in March 2025, which focused on the technical feasibility, commercial viability, and job creation potential of autonomous vehicles in the region [8]

Operational Integration
- The collaboration will establish a framework for deploying autonomous solutions across Grab's network, enhancing operational efficiency and scalability [4]
- WeRide will integrate its autonomous driving technology into Grab's fleet management, vehicle matching, and routing ecosystem [4][12]

Vision and Goals
- WeRide aims to deploy thousands of Robotaxis in Southeast Asia, aligning with local regulations and societal readiness and leveraging Grab's regional expertise in ride-hailing and digital services [5]
- Grab emphasizes the need for reliable transportation in Southeast Asia, particularly in areas with driver shortages, and plans to test WeRide's vehicles in diverse environments to adapt the technology to regional needs [6]

Technical Collaboration
- The partnership will focus on optimizing dispatch and routing, maximizing vehicle uptime, measuring safety performance, remote monitoring, customer support, and training for driver-partners and local communities [12]
Bulls and Bears Battle over Robotaxi: "Cathie Wood" Builds a Position as Institutions Diverge
Di Yi Cai Jing· 2025-08-15 03:45
Bullish and bearish voices intertwine, pushing autonomous driving technology toward maturity.

Since the start of this year, Robotaxi (autonomous taxi) services have drawn broad attention from global capital markets, but skepticism has arrived right on schedule as well.

Recently, Cathie Wood's ARK funds spent roughly US$12.9 million buying shares of Pony.ai (NASDAQ: PONY), the first time her flagship funds have held a Chinese autonomous driving stock. Wood is known on Wall Street as a "female Buffett", with an investment preference for high-growth, high-risk, long-horizon holdings.

Another leading Chinese Robotaxi company, WeRide (NASDAQ: WRD), saw its Robotaxi business grow 836.7% year-on-year in the second quarter; back in May the company disclosed that Uber had committed an additional US$100 million investment.

When this reporter recently tried Baidu's Apollo Go (萝卜快跑) Robotaxi in Guangzhou, peak-hour waits stretched to an hour with no car accepting the order. Asked how many vehicles were operating near the pickup point, Apollo Go customer service replied: "The number of serviceable vehicles in a city is not fixed; it is adjusted dynamically based on multiple factors." According to nearby residents and shop owners, evening-rush waits for Apollo Go exceed 40 minutes.

It is undeniable that, at this stage, Robotaxi dispatch times and waiting times are both longer than for human-driven ride-hailing, an issue the industry still needs to solve.

Han Xu said that when an autonomous driving company expands into a new city, the autonomous dri ...