Autonomous Driving
With everyone going end-to-end, is there still a future for trajectory prediction?
自动驾驶之心· 2025-08-19 03:35
Core Viewpoint
- The article emphasizes the continued importance of trajectory prediction for autonomous driving and argues that traditional two-stage and modular methods remain relevant despite the rise of end-to-end approaches. It discusses integrating trajectory prediction models with perception models as a form of end-to-end training, a significant area of research and application in the industry [1][2].

Group 1: Trajectory Prediction Methods
- The article introduces multi-agent trajectory prediction, which forecasts future movements from the historical trajectories of multiple interacting agents and is crucial for autonomous driving, intelligent monitoring, and robotic navigation [1].
- It discusses the challenges of predicting human behavior due to its uncertainty and multimodality, noting that traditional methods often rely on recurrent neural networks, convolutional networks, or graph neural networks for social interaction modeling [1].
- The article highlights advances in diffusion models for trajectory prediction, showcasing models like the Leapfrog Diffusion Model (LED) and Mixed Gaussian Flow (MGF) that have significantly improved accuracy and efficiency across datasets (a minimal sketch of the denoising loop follows this summary) [2].

Group 2: Course Objectives and Structure
- The course aims to provide a systematic understanding of trajectory prediction and diffusion models, helping participants integrate theoretical knowledge with practical coding skills and ultimately develop new models and research papers [6][8].
- It is designed for individuals at various academic levels who are interested in trajectory prediction and autonomous driving, offering insights into cutting-edge research and algorithm design [8].
- Participants will gain access to classic and cutting-edge papers, coding implementations, and methodologies for writing and submitting research papers [8][9].

Group 3: Course Highlights and Requirements
- The course features a "2+1" teaching model with experienced instructors and dedicated support staff to enhance the learning experience [16][17].
- It requires participants to have a foundational understanding of deep learning and proficiency in Python and PyTorch, ensuring they can engage with the course material effectively [10].
- The course structure includes a comprehensive curriculum covering datasets, baseline code, and essential research papers, facilitating a thorough understanding of trajectory prediction techniques [20][21][23].
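For readers new to diffusion-based predictors, here is a minimal PyTorch-style sketch of the conditional denoising loop behind such models: a network predicts the noise on a candidate future trajectory given the agent's observed history, and sampling repeatedly applies it starting from pure noise. The module names, tensor shapes, and the simplified update rule are illustrative assumptions, not the published LED or MGF code.

```python
import torch
import torch.nn as nn

class TrajectoryDenoiser(nn.Module):
    """Illustrative conditional denoiser: predicts the noise on a future
    trajectory given the agent's observed (x, y) history."""
    def __init__(self, hist_len=8, fut_len=12, dim=128, max_steps=1000):
        super().__init__()
        self.hist_enc = nn.GRU(2, dim, batch_first=True)   # encode history waypoints
        self.time_emb = nn.Embedding(max_steps, dim)        # diffusion-step embedding
        self.net = nn.Sequential(
            nn.Linear(fut_len * 2 + 2 * dim, 256), nn.ReLU(),
            nn.Linear(256, fut_len * 2),
        )
        self.fut_len = fut_len

    def forward(self, noisy_future, history, t):
        _, h = self.hist_enc(history)                       # h: (1, B, dim)
        cond = torch.cat([noisy_future.flatten(1), h[-1], self.time_emb(t)], dim=-1)
        return self.net(cond).view(-1, self.fut_len, 2)     # predicted noise

@torch.no_grad()
def sample_trajectories(model, history, steps=50, num_modes=6):
    """Draw multimodal futures by iterative denoising from pure noise,
    one noise seed per mode; the update rule is deliberately simplified."""
    B = history.size(0)
    futs = torch.randn(B * num_modes, model.fut_len, 2)
    hist = history.repeat_interleave(num_modes, dim=0)
    for t in reversed(range(steps)):
        t_batch = torch.full((B * num_modes,), t, dtype=torch.long)
        eps = model(futs, hist, t_batch)
        futs = futs - eps / steps                            # not a full DDPM schedule
    return futs.view(B, num_modes, model.fut_len, 2)
```

Sampling several noise seeds per agent is what gives these models their multimodality: each seed denoises to a different plausible future.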
An autonomous driving fall recruitment discussion group has been launched!
自动驾驶之心· 2025-08-18 23:32
Core Viewpoint
- The article emphasizes the convergence of autonomous driving technology, indicating a shift from numerous diverse approaches to a more unified model, which raises the technical barriers in the industry [1]

Group 1
- The industry is witnessing a trend where previously many directions requiring algorithm engineers are now consolidating into unified models such as one model, VLM, and VLA [1]
- The article encourages the establishment of a large community to support individuals in the industry, highlighting the limitations of individual efforts [1]
- A new job and industry-related community is being launched to facilitate discussions on industry trends, company developments, product research, and job opportunities [1]
Performance up 4%! CBDES MoE: MoE gives BEV perception a second wind, straight to SOTA (Tsinghua & Imperial College)
自动驾驶之心· 2025-08-18 23:32
Core Viewpoint
- The article discusses the CBDES MoE framework, a novel modular mixture-of-experts architecture designed for BEV perception in autonomous driving, addressing challenges in adaptability, modeling capacity, and generalization in existing methods [2][5][48].

Group 1: Introduction and Background
- The rapid development of autonomous driving technology has made 3D perception essential for building safe and reliable driving systems [5].
- Existing solutions often use fixed single-backbone feature extractors, limiting adaptability to diverse driving environments [5][6].
- The MoE paradigm offers a new solution by enabling dynamic expert selection based on learned routing mechanisms, balancing computational efficiency and representational richness [6][9].

Group 2: CBDES MoE Framework
- CBDES MoE integrates multiple structurally heterogeneous expert networks and employs a lightweight self-attention router (SAR) for dynamic expert path selection (a simplified sketch of this routing pattern follows this summary) [3][12].
- The framework includes a multi-stage heterogeneous backbone design pool, enhancing scene adaptability and feature representation [14][17].
- The architecture allows for efficient, adaptive, and scalable 3D perception, outperforming strong single-backbone baseline models in complex driving scenarios [12][14].

Group 3: Experimental Results
- In experiments on the nuScenes dataset, CBDES MoE achieved a mean Average Precision (mAP) of 65.6 and a NuScenes Detection Score (NDS) of 69.8, surpassing all single-expert baselines [37][39].
- The model demonstrated faster convergence and lower loss throughout training, indicating higher optimization stability and learning efficiency [39][40].
- The introduction of load-balancing regularization significantly improved performance, with mAP increasing from 63.4 to 65.6 when applied [42][46].

Group 4: Future Work and Limitations
- Future research may explore patch-wise or region-aware routing for finer granularity in adaptability, as well as extending the method to multi-task scenarios [48].
- The current routing mechanism operates at the image level, which may limit its effectiveness in more complex environments [48].
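To make the routing idea concrete, below is a small PyTorch sketch of a mixture-of-experts block in which a lightweight router scores a few structurally different expert networks, blends the top-scoring ones, and adds a simple load-balancing penalty. The expert choices, globally pooled router, top-k setting, and loss form are assumptions for illustration, not the actual CBDES MoE / SAR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeterogeneousMoE(nn.Module):
    """Illustrative MoE block: a lightweight router scores structurally
    different expert networks and blends the outputs of the top-k experts."""
    def __init__(self, in_ch=256, num_experts=3, top_k=2):
        super().__init__()
        # Stand-ins for heterogeneous backbones (plain conv, dilated conv, 1x1 MLP-style).
        self.experts = nn.ModuleList([
            nn.Conv2d(in_ch, in_ch, 3, padding=1),
            nn.Conv2d(in_ch, in_ch, 3, padding=2, dilation=2),
            nn.Sequential(nn.Conv2d(in_ch, in_ch, 1), nn.ReLU(),
                          nn.Conv2d(in_ch, in_ch, 1)),
        ])
        self.router = nn.Linear(in_ch, num_experts)   # image-level routing
        self.top_k = top_k

    def forward(self, feats):                          # feats: (B, C, H, W)
        pooled = feats.mean(dim=(2, 3))                # global descriptor per sample
        weights = F.softmax(self.router(pooled), dim=-1)
        topw, topi = weights.topk(self.top_k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True)   # renormalize selected experts
        out = torch.zeros_like(feats)
        for b in range(feats.size(0)):
            for w, i in zip(topw[b], topi[b]):
                out[b] = out[b] + w * self.experts[i](feats[b:b + 1])[0]
        # Illustrative load-balancing penalty: smallest when routing is uniform.
        balance_loss = (weights.mean(dim=0) ** 2).sum() * len(self.experts)
        return out, balance_loss
```

Note that this router scores experts from a globally pooled descriptor, i.e. at the image level, which mirrors the limitation mentioned under Group 4 and motivates the patch-wise or region-aware routing discussed there.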
Pony.ai Attracts Premium Capital as Funds Chase the Next Tech Transformation
Prnewswire· 2025-08-18 13:53
Core Insights
- Leading investment management firms, including ARK Invest, have invested significantly in Pony.ai, marking a notable interest in the Chinese autonomous driving sector [1][2]
- Pony.ai has reported substantial growth in robotaxi revenues and is on a clear path to profitability, attracting attention from major institutional investors [4][8]

Investment Activity
- ARK Invest invested approximately US$12.9 million in Pony.ai, marking its first investment in a Chinese firm focused on Level 4 autonomous driving technology [1]
- At least 14 major global institutional investors backed Pony.ai in Q2, including Baillie Gifford and Nikko Asset Management, despite a general trend of U.S. investors moving away from Chinese assets [2]

Market Potential
- ARK's "Big Ideas 2025" report projects the ride-hailing market could reach US$10 trillion by 2030, with global robotaxi fleets potentially hitting around 50 million vehicles [3]
- UBS analysts expect the robotaxi market value to reach US$183 billion in China and US$394 billion internationally by the late 2030s [9]

Company Performance
- Pony.ai reported a 158% year-on-year increase in robotaxi revenues in Q2, driven by the production of its seventh-generation robotaxi models [4]
- The company aims to scale its fleet to 1,000 robotaxis by year-end, which is expected to achieve positive unit economics [5]

Operational Efficiency
- The Gen-7 vehicle has a 70% lower cost compared to its predecessor, with significant reductions in operational costs, including an 18% decrease in insurance costs [5]
- Pony.ai has received commercial permits for fare-charging services in Shanghai and operates 24/7 in Guangzhou and Shenzhen [6][7]

Analyst Sentiment
- Following the Q2 earnings release, major institutions like Goldman Sachs and UBS rated Pony.ai's stock as "buy," with Goldman setting a price target of US$24.5, indicating a 54.5% upside [8]
What will it take for robotaxis to go global? | FT
Financial Times· 2025-08-18 04:00
Robotaxis are proving popular in cities like San Francisco, moving from concept to reality, with the likes of Alphabet, Amazon, and Tesla all making significant investments in this space. Following the shuttering of General Motors' Cruise project, the US robotaxi market has fallen into the hands of just a few of the richest, most determined tech giants and a handful of startups bold enough to challenge them. Waymo, an autonomous driving tech company owned by Alphabet, Google's parent company, is now operati ...
WeRide secures tens of millions of dollars in investment from Grab, accelerating large-scale Robotaxi deployment in Southeast Asia
Sou Hu Cai Jing· 2025-08-18 01:40
Group 1
- WeRide, an autonomous driving technology company, announced a multi-million dollar equity investment from Southeast Asian super-app platform Grab [1][3]
- The investment is part of a strategic partnership aimed at accelerating the large-scale deployment of Level 4 Robotaxis and other autonomous vehicles in Southeast Asia [3]
- The investment is expected to be completed by the first half of 2026, with the exact timing dependent on conditions chosen by WeRide [3]

Group 2
- Grab's investment will support WeRide's international growth strategy, expanding its commercial autonomous vehicle fleet in Southeast Asia and promoting AI-driven mobility solutions [3]
- WeRide's CEO, Han Xu, expressed the vision of gradually deploying thousands of Robotaxis in Southeast Asia, taking local regulations and societal acceptance into account [3]
- The partnership leverages WeRide's advanced autonomous driving technology and operational experience alongside Grab's platform advantages to provide safe and efficient Robotaxi services [3]
Autonomous Driving VLA: OpenDriveVLA and AutoVLA
自动驾驶之心· 2025-08-18 01:32
Core Insights
- The article discusses two significant papers, OpenDriveVLA and AutoVLA, which apply large vision-language models (VLMs) to end-to-end autonomous driving, highlighting their distinct technical paths and philosophies [22].

Group 1: OpenDriveVLA
- OpenDriveVLA aims to address the "modal gap" that traditional VLMs face in dynamic 3D driving environments, emphasizing the need for a structured understanding of the 3D world [23].
- The methodology includes several key steps: 3D visual environment perception, visual-language hierarchical alignment, and a multi-stage training paradigm [24][25].
- The model utilizes structured, layered tokens (Agent, Map, Scene) to enhance the VLM's understanding of the environment, which helps mitigate spatial hallucination risks [6][9].
- OpenDriveVLA achieved state-of-the-art performance on the nuScenes open-loop planning benchmark, demonstrating the effectiveness of its perception-based anchoring strategy [10][20].

Group 2: AutoVLA
- AutoVLA focuses on integrating driving tasks into the native operation of VLMs, transforming them from scene narrators into genuine decision-makers [26].
- The methodology features layered visual token extraction, and the model produces discrete action tokens instead of continuous coordinates, turning trajectory planning into a next-token prediction task (a minimal sketch of this quantization follows this summary) [14][29].
- The model employs a dual-mode thinking approach, allowing it to adapt its reasoning depth to scene complexity and balance efficiency against effectiveness [28].
- AutoVLA's reinforcement learning fine-tuning (RFT) enhances its driving strategy, enabling the model to optimize its behavior actively rather than merely imitating human driving [30][35].

Group 3: Comparative Analysis
- OpenDriveVLA emphasizes perception-language alignment to improve the VLM's understanding of the 3D world, while AutoVLA focuses on language-decision integration to enhance the VLM's decision-making capabilities [32].
- The two models represent complementary approaches: OpenDriveVLA provides a robust perception foundation, while AutoVLA optimizes decision-making strategies through reinforcement learning [34].
- Future models may combine the strengths of both approaches, using OpenDriveVLA's structured perception together with AutoVLA's action tokenization and reinforcement learning to build a powerful autonomous driving system [36].
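The action-tokenization idea can be illustrated in a few lines of NumPy: continuous per-step displacements are quantized into a small discrete vocabulary so that a VLM decoder can emit a trajectory as a token sequence. The bin counts and coordinate ranges below are arbitrary assumptions for illustration, not AutoVLA's actual codebook.

```python
import numpy as np

# Illustrative tokenizer: quantize (dx, dy) displacements into a small vocabulary
# so trajectory planning becomes next-token prediction for a VLM decoder.
NUM_BINS = 32
DX_RANGE = (-1.0, 8.0)   # metres per step, assumed longitudinal range
DY_RANGE = (-3.0, 3.0)   # metres per step, assumed lateral range

def to_action_tokens(waypoints):
    """waypoints: (T, 2) array of per-step displacements -> list of token ids."""
    dx_bins = np.linspace(*DX_RANGE, NUM_BINS)
    dy_bins = np.linspace(*DY_RANGE, NUM_BINS)
    tokens = []
    for dx, dy in waypoints:
        ix = int(np.clip(np.digitize(dx, dx_bins) - 1, 0, NUM_BINS - 1))
        iy = int(np.clip(np.digitize(dy, dy_bins) - 1, 0, NUM_BINS - 1))
        tokens.append(ix * NUM_BINS + iy)          # one joint token per waypoint
    return tokens

def from_action_tokens(tokens):
    """Invert the quantization back to approximate (dx, dy) displacements."""
    dx_bins = np.linspace(*DX_RANGE, NUM_BINS)
    dy_bins = np.linspace(*DY_RANGE, NUM_BINS)
    return np.array([[dx_bins[t // NUM_BINS], dy_bins[t % NUM_BINS]] for t in tokens])
```

In this framing, training reduces to cross-entropy over the token sequence, which is why the summary above describes planning as a next-token prediction task.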
Costs cut 14x! DiffCP: a new diffusion-model-based paradigm for collaborative perception compression
自动驾驶之心· 2025-08-18 01:32
Core Viewpoint
- The article introduces DiffCP, a novel collaborative perception framework that uses conditional diffusion models to significantly reduce communication costs while maintaining high performance in collaborative sensing tasks [3][4][20].

Group 1: Introduction to Collaborative Perception
- Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of independent intelligent systems, particularly in challenging wireless communication environments [3].
- Current C-V2X systems face significant bandwidth limitations, making it difficult to support feature-level and raw-data-level collaborative algorithms [3].

Group 2: DiffCP Framework
- DiffCP is the first collaborative perception architecture to employ conditional diffusion models that capture geometric correlations and semantic differences for efficient data transmission [4].
- The framework integrates prior knowledge, geometric relationships, and received semantic features to reconstruct collaborative perception information, introducing a new generative-model-based paradigm [4][5].

Group 3: Performance and Efficiency
- Experimental results indicate that DiffCP achieves robust perception performance in ultra-low-bandwidth scenarios, reducing communication costs by 14.5 times while maintaining state-of-the-art algorithm performance [4][20].
- DiffCP can be integrated into existing BEV-based collaborative algorithms for various downstream tasks, significantly lowering bandwidth requirements [4].

Group 4: Technical Implementation
- The framework utilizes a pre-trained BEV-based perception algorithm to extract BEV features, embedding diffusion time steps, relative spatial positions, and semantic vectors as conditions [5].
- An iterative denoising process is employed, in which the model integrates the host vehicle's own observations with the received collaborative features to progressively recover the original collaborative perception features (a minimal sketch of this conditional denoising loop follows this summary) [8].

Group 5: Application in 3D Object Detection
- DiffCP was evaluated in a case study on 3D object detection, demonstrating accuracy on par with state-of-the-art algorithms while reducing data rates by 14.5 times [20].
- The framework supports adaptive data rates through variable semantic vector lengths, enhancing performance in challenging scenarios [20].

Group 6: Conclusion
- DiffCP represents a significant advance in collaborative perception, enabling efficient information compression and reconstruction for collaborative sensing tasks and facilitating the deployment of connected intelligent transportation systems within existing wireless communication frameworks [22].
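A minimal PyTorch-style sketch of the conditional reconstruction idea follows: the receiver starts from Gaussian noise and iteratively denoises it into the collaborator's BEV features, conditioning each step on its own ego features, the relative pose, and the short semantic vector that is the only payload actually transmitted. The network shape, conditioning scheme, and simplified update rule are assumptions, not the DiffCP implementation.

```python
import torch
import torch.nn as nn

class ConditionalBEVDenoiser(nn.Module):
    """Illustrative denoiser: reconstructs a collaborator's BEV features from
    noise, conditioned on ego BEV features, relative pose, and a compact
    semantic vector (the only data assumed to be transmitted)."""
    def __init__(self, ch=64, sem_dim=32, pose_dim=6):
        super().__init__()
        self.cond_proj = nn.Linear(sem_dim + pose_dim + 1, ch)   # +1 for timestep
        self.net = nn.Sequential(
            nn.Conv2d(ch * 3, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, ch, 3, padding=1),
        )

    def forward(self, noisy_feat, ego_feat, sem_vec, rel_pose, t):
        B, C, H, W = noisy_feat.shape
        cond = torch.cat([sem_vec, rel_pose, t.view(B, 1).float()], dim=-1)
        cond_map = self.cond_proj(cond).view(B, C, 1, 1).expand(B, C, H, W)
        return self.net(torch.cat([noisy_feat, ego_feat, cond_map], dim=1))

@torch.no_grad()
def reconstruct_collab_feat(model, ego_feat, sem_vec, rel_pose, steps=20):
    """Iteratively denoise from pure noise to recover collaborator BEV features."""
    feat = torch.randn_like(ego_feat)
    for t in reversed(range(steps)):
        t_batch = torch.full((ego_feat.size(0),), t, dtype=torch.long)
        eps = model(feat, ego_feat, sem_vec, rel_pose, t_batch)
        feat = feat - eps / steps        # simplified update, not a full DDPM schedule
    return feat
```

The bandwidth saving in this scheme comes from sending only `sem_vec` and the pose over the air; the heavy BEV tensor is regenerated on the receiver side.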
How is your fall recruitment for the class of 2026 going?
自动驾驶之心· 2025-08-16 16:04
Core Viewpoint
- The article emphasizes the convergence of autonomous driving technology, indicating a shift from numerous diverse approaches to a more unified model, which raises the technical barriers in the industry [1]

Group 1
- The industry is witnessing a trend where previously many directions requiring algorithm engineers are now consolidating into unified models such as one model, VLM, and VLA [1]
- The article encourages the establishment of a large community to support individuals in the industry, highlighting the limitations of individual efforts [1]
- A new job and industry-related community is being launched to facilitate discussions on industry trends, company developments, product research, and job opportunities [1]
Autonomous Driving Paper Express | Visual Reconstruction, RV Fusion, Reasoning, VLM, and More
自动驾驶之心· 2025-08-16 09:43
Core Insights
- The article covers two innovative approaches in autonomous driving technology: Dream-to-Recon for monocular 3D scene reconstruction and SpaRC-AD for radar-camera fusion in end-to-end autonomous driving [2][13].

Group 1: Dream-to-Recon
- Dream-to-Recon is a method developed at the Technical University of Munich that enables monocular 3D scene reconstruction using only a single image for training [2][6].
- The method integrates a pre-trained diffusion model with a deep network through a three-stage framework:
  1. A View Completion Model (VCM) handles occlusion filling and image distortion correction, achieving a PSNR of 23.9 [2][6].
  2. A Synthetic Occupancy Field (SOF) constructs dense 3D scene geometry from multiple synthetic views, with occlusion reconstruction accuracy (IE_acc) reaching 72%-73%, surpassing multi-view supervised methods by 2%-10% [2][6].
  3. A lightweight distilled model converts the generated geometry into a real-time inference network, achieving overall accuracy (O_acc) of 90%-97% on KITTI-360/Waymo with a 70x speed-up (75 ms/frame) [2][6].
- The method offers a new paradigm for efficient 3D perception in autonomous driving and robotics without complex sensor calibration [2][6].

Group 2: SpaRC-AD
- SpaRC-AD is the first radar-camera fusion baseline framework for end-to-end autonomous driving, also developed at the Technical University of Munich [13][16].
- The framework uses sparse 3D feature alignment and Doppler velocity measurements, achieving a 4.8% improvement in 3D detection mAP, an 8.3% increase in tracking AMOTA, a 4.0% reduction in motion prediction mADE, and a 0.11 m decrease in trajectory planning L2 error (a minimal sketch of the fusion idea follows this summary) [13][16].
- The radar-camera fusion strategy significantly enhances performance across multiple tasks, including 3D detection, multi-object tracking, online mapping, and motion prediction [13][16].
- Comprehensive evaluations on the open-loop nuScenes and closed-loop Bench2Drive benchmarks demonstrate its advantages in extending perception range, improving motion modeling accuracy, and maintaining robustness in adverse conditions [13][16].
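As a rough illustration of the radar-camera fusion pattern, the sketch below scatters radar returns (position, Doppler velocity, RCS) into a BEV grid and mixes that map with camera-derived BEV features using a small convolutional block. The grid extent, channel layout, and fusion operator are assumptions for illustration, not the SpaRC-AD architecture.

```python
import torch
import torch.nn as nn

def radar_to_bev(points, grid=(128, 128), extent=51.2):
    """Scatter radar returns (x, y, doppler, rcs) into a sparse BEV grid.
    points: (N, 4) tensor; returns a (2, H, W) map of Doppler and RCS values."""
    H, W = grid
    bev = torch.zeros(2, H, W)
    xs = ((points[:, 0] + extent) / (2 * extent) * (W - 1)).long().clamp(0, W - 1)
    ys = ((points[:, 1] + extent) / (2 * extent) * (H - 1)).long().clamp(0, H - 1)
    bev[0, ys, xs] = points[:, 2]     # Doppler velocity channel
    bev[1, ys, xs] = points[:, 3]     # radar cross-section channel
    return bev

class RadarCameraFusion(nn.Module):
    """Illustrative fusion: concatenate camera BEV features with the radar BEV
    map and mix them with a small conv block before downstream heads."""
    def __init__(self, cam_ch=128):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(cam_ch + 2, cam_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(cam_ch, cam_ch, 3, padding=1),
        )

    def forward(self, cam_bev, radar_bev):       # (B, C, H, W), (B, 2, H, W)
        return self.mix(torch.cat([cam_bev, radar_bev], dim=1))
```

The Doppler channel is what lets radar contribute directly to velocity and motion estimates, which is consistent with the tracking and motion prediction gains reported in the summary above.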