Dream4Drive
Dream4Drive: a world-model-based generation framework that improves downstream perception performance
自动驾驶之心 · 2025-10-29 00:04
Core Insights
- The article discusses Dream4Drive, a new synthetic data generation framework aimed at enhancing downstream perception tasks in autonomous driving, emphasizing the importance of high-quality, controllable multimodal video generation [1][2][5].

Group 1: Background and Motivation
- 3D perception tasks such as object detection and tracking are critical for decision-making in autonomous driving, but their performance relies heavily on large-scale, manually annotated datasets [4].
- Existing methods for synthetic data generation often omit evaluation on downstream perception tasks, misrepresenting the actual effectiveness of the synthetic data [5][6].
- Diverse and extreme-scenario data are needed, yet current data collection methods are time-consuming and labor-intensive [4].

Group 2: Dream4Drive Framework
- Dream4Drive decomposes input videos into multiple 3D-aware guidance maps and renders 3D assets onto these maps to generate edited, multi-view realistic videos for training perception models [1][9].
- The framework utilizes DriveObj3D, a large-scale 3D asset dataset covering typical categories from driving scenarios, which supports diverse 3D-aware perception video editing [2][9].
- Experiments show that Dream4Drive can significantly enhance perception model performance with only 420 synthetic samples, less than 2% of the real sample size [6][27].

Group 3: Experimental Results
- Comparative results demonstrate that Dream4Drive outperforms existing models across various training epochs, achieving higher mean Average Precision (mAP) and nuScenes Detection Score (NDS) [27][28].
- High-resolution synthetic data (512×768) leads to significant performance improvements, with mAP increasing by 4.6 percentage points (12.7% relative) and NDS by 4.1 percentage points (8.6% relative) [29][30].
- The position of inserted assets affects performance: distant insertions generally yield better results because they cause fewer occlusion issues [37][38].

Group 4: Conclusions and Implications
- The study concludes that existing evaluations of synthetic data in autonomous driving are biased, and that Dream4Drive provides a more effective approach for generating high-quality synthetic data for perception tasks [40][42].
- The results emphasize the importance of using assets that match the style of the dataset, minimizing the domain gap between synthetic and real data and enhancing model training [42].
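The decompose-then-render editing pipeline described in Group 2 can be sketched in a few lines. This is an illustrative stub only: the class, function names, and asset identifier below are hypothetical placeholders, not Dream4Drive's actual API.

```python
from dataclasses import dataclass


@dataclass
class GuidanceMaps:
    """3D-aware guidance maps decomposed from one input frame (illustrative)."""
    depth: str
    semantics: str


def decompose(frame: str) -> GuidanceMaps:
    # Placeholder for the decomposition step: each input frame is turned into
    # several 3D-aware guidance maps (stubbed as tagged strings here).
    return GuidanceMaps(depth=f"depth[{frame}]", semantics=f"sem[{frame}]")


def render_and_edit(maps: GuidanceMaps, asset: str) -> str:
    # Placeholder for rendering a 3D asset (e.g. one drawn from DriveObj3D)
    # onto the guidance maps and synthesizing the edited, realistic frame.
    return f"edited[{maps.depth}+{maps.semantics}+{asset}]"


# Edit a short clip by inserting one hypothetical asset into every frame.
frames = ["cam_front_t0", "cam_front_t1"]
edited_clip = [render_and_edit(decompose(f), asset="sedan_01") for f in frames]
print(edited_clip[0])  # edited[depth[cam_front_t0]+sem[cam_front_t0]+sedan_01]
```

The edited frames (with the inserted asset's known 3D pose providing labels for free) would then be used as extra training data for the perception model.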
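The gains in Group 3 are quoted both in absolute percentage points and in relative terms, and the two framings are mutually consistent. The baseline values below are back-solved from the reported figures as an assumption, not quoted from the paper.

```python
def relative_gain(baseline: float, delta_pp: float) -> float:
    """Relative improvement (%) implied by an absolute gain in percentage points."""
    return 100.0 * delta_pp / baseline


# Baselines back-solved from the reported numbers (assumed, not quoted):
#   mAP: +4.6 pp described as 12.7% relative -> baseline ~= 4.6 / 0.127 ~= 36.2
#   NDS: +4.1 pp described as  8.6% relative -> baseline ~= 4.1 / 0.086 ~= 47.7
print(round(relative_gain(36.2, 4.6), 1))  # 12.7
print(round(relative_gain(47.7, 4.1), 1))  # 8.6
```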
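The "less than 2% of the real sample size" claim in Group 2 checks out if the perception models are trained on the standard nuScenes training split, which contains 28,130 annotated keyframe samples (an assumption here; the article does not state the split size).

```python
# Fraction of training data contributed by the 420 synthetic samples,
# assuming the standard nuScenes train split of 28,130 keyframes.
real_samples = 28_130
synthetic_samples = 420
fraction = synthetic_samples / real_samples
print(f"{100 * fraction:.2f}%")  # 1.49%, i.e. under the quoted 2%
```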