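More than ten institutions jointly propose WorldLens: an evaluation of all open-source autonomous-driving world models (Chinese Academy of Sciences, National University of Singapore, and others)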
自动驾驶之心· 2025-12-16 00:03
Core Insights
- The article introduces WorldLens, a comprehensive benchmark for evaluating generative world models in driving scenarios, focusing on visual realism, geometric consistency, physical plausibility, and functional reliability [4][36]
- WorldLens aims to address the lack of standardized evaluation methods in the field by providing a unified framework that connects objective measurements with human perception [4][36]

Background Review
- Generative world models have transformed AI and simulation, yet evaluation methods have not kept pace, which makes research results hard to compare [4]
- Existing metrics focus primarily on frame quality and aesthetics, and fail to reflect physical causality and multi-view geometric consistency [4][36]

WorldLens Overview
- WorldLens evaluates generative models across five complementary dimensions: generation quality, reconstruction performance, instruction following, downstream task adaptability, and human preference [8][36]
- The benchmark includes the WorldLens-26K dataset, which contains a large number of human-annotated videos with quantitative scores and textual descriptions [7][19]

Evaluation Dimensions
- **Generation Quality**: Assesses the model's ability to synthesize visually realistic, temporally stable, and semantically consistent scenes [9][11]
- **Reconstruction Performance**: Evaluates the model's capability to reconstruct coherent 4D scenes from generated videos [12][24]
- **Instruction Following**: Tests the ability of pre-trained planners to operate safely within the generated world [14][25]
- **Downstream Task Adaptability**: Measures how well synthetic data supports training of downstream perception models [15][28]
- **Human Preference**: Captures subjective assessments of visual realism, physical coherence, and behavioral safety through large-scale human annotations [15][30]

Experimental Results Analysis
- Current models show significant room for improvement in visual and temporal realism, and none achieves the best performance across all dimensions [23][34]
- The evaluation reveals that models with high perceptual scores may not perform well in downstream tasks, which indicates the importance of aligning generated data with the target domain distribution [34]

Conclusion
- WorldLens establishes a scalable and interpretable foundation for future benchmarking of world models, guiding research towards systems that not only appear realistic but also behave reasonably [36]
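
The two findings under Experimental Results Analysis (no model is best on every dimension, and perceptual scores do not guarantee downstream performance) can be made concrete with a small sketch. This summary does not describe how WorldLens computes or aggregates its metrics, so the code below is not the benchmark's protocol. The dimension names come from the list above. The model names, the score values, the 0-to-1 "higher is better" scale, and the helper functions are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass

# Dimension names follow the summary above. Everything else in this file
# (model names, score values, the 0-1 scale) is hypothetical illustration data.
DIMENSIONS = (
    "generation_quality",
    "reconstruction_performance",
    "instruction_following",
    "downstream_task_adaptability",
    "human_preference",
)


@dataclass(frozen=True)
class ScoreCard:
    model: str
    scores: dict  # maps each dimension name to a score, higher is better


def best_per_dimension(cards):
    """Return {dimension: name of the model with the highest score}."""
    return {d: max(cards, key=lambda c: c.scores[d]).model for d in DIMENSIONS}


def dominant_model(cards):
    """Return the model that wins every dimension, or None if there is none."""
    winners = set(best_per_dimension(cards).values())
    return winners.pop() if len(winners) == 1 else None


def ranks(values):
    """Rank values from 1 (highest) to n (lowest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equally long, tie-free lists."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical score cards, NOT results reported by WorldLens.
CARDS = [
    ScoreCard("model_a", dict(zip(DIMENSIONS, (0.9, 0.6, 0.5, 0.4, 0.8)))),
    ScoreCard("model_b", dict(zip(DIMENSIONS, (0.7, 0.8, 0.6, 0.7, 0.6)))),
    ScoreCard("model_c", dict(zip(DIMENSIONS, (0.5, 0.5, 0.7, 0.6, 0.5)))),
]

perceptual = [c.scores["generation_quality"] for c in CARDS]
downstream = [c.scores["downstream_task_adaptability"] for c in CARDS]

print(best_per_dimension(CARDS))
print(dominant_model(CARDS))             # None: no single model wins everywhere
print(spearman(perceptual, downstream))  # -0.5 for these placeholder scores
```

With these placeholder numbers, the model with the best generation quality ranks last on downstream task adaptability, so the rank correlation is negative. A real analysis would use the benchmark's published scores and a tie-aware correlation such as `scipy.stats.spearmanr`.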