Spatial-Temporal Reasoning

Autonomous Driving Paper Express | Visual Reconstruction, RV Fusion, Reasoning, VLM, and More
自动驾驶之心· 2025-08-16 09:43
Core Insights
- The article discusses two innovative approaches in autonomous driving technology: Dream-to-Recon for monocular 3D scene reconstruction and SpaRC-AD for radar-camera fusion in end-to-end autonomous driving [2][13].

Group 1: Dream-to-Recon
- Dream-to-Recon is a method developed by the Technical University of Munich that enables monocular 3D scene reconstruction using only a single image for training [2][6].
- The method integrates a pre-trained diffusion model with a deep network through a three-stage framework:
  1. View Completion Model (VCM) enhances occlusion filling and image distortion correction, achieving a PSNR of 23.9 [2][6].
  2. Synthetic Occupancy Field (SOF) constructs dense 3D scene geometry from multiple synthetic views, with occlusion reconstruction accuracy (IE_acc) reaching 72%-73%, surpassing multi-view supervised methods by 2%-10% [2][6].
  3. A lightweight distilled model converts generated geometry into a real-time inference network, achieving overall accuracy (O_acc) of 90%-97% on KITTI-360/Waymo, with a 70x speed improvement (75 ms/frame) [2][6].
- The method provides a new paradigm for efficient 3D perception in autonomous driving and robotics without complex sensor calibration [2][6].

Group 2: SpaRC-AD
- SpaRC-AD is the first radar-camera fusion baseline framework for end-to-end autonomous driving, also developed by the Technical University of Munich [13][16].
- The framework utilizes sparse 3D feature alignment and Doppler velocity measurement techniques, achieving a 4.8% improvement in 3D detection mAP, an 8.3% increase in tracking AMOTA, a 4.0% reduction in motion prediction mADE, and a 0.11 m decrease in trajectory planning L2 error [13][16].
- The overall radar-based fusion strategy significantly enhances performance across multiple tasks, including 3D detection, multi-object tracking, online mapping, and motion prediction [13][16].
- Comprehensive evaluations on the open-loop nuScenes and closed-loop Bench2Drive benchmarks demonstrate its advantages in extending perception range, improving motion modeling accuracy, and increasing robustness in adverse conditions [13][16].
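
Several of the figures quoted above are standard image and trajectory metrics. As a rough illustration only, the sketch below computes PSNR (the image-quality metric reported for the VCM) and an average displacement error (the mean L2 distance behind numbers such as mADE and planning L2 error) using their textbook definitions. The function names `psnr` and `average_displacement_error` are hypothetical, and the exact evaluation protocols of Dream-to-Recon and SpaRC-AD (horizons, averaging over modes, masking) are not given in this summary and may differ.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Textbook PSNR in dB between two equal-length sequences of pixel values.

    max_value is the largest possible pixel value (255 for 8-bit images,
    1.0 for images normalized to [0, 1]).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean (L2) distance between matching waypoints of two trajectories."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


if __name__ == "__main__":
    # Toy values for illustration; these are not data from either paper.
    print(psnr([0.0, 1.0], [0.1, 0.9], max_value=1.0))
    print(average_displacement_error([(0.0, 0.0), (3.0, 4.0)],
                                     [(0.0, 0.0), (0.0, 0.0)]))
```

Higher PSNR means the completed view is closer to the reference image, while lower displacement error means the predicted or planned trajectory stays closer to the ground truth, which is why the summary reports the former as a score and the latter as a reduction.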