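Strong Visuals ≠ Getting Work Done! Tsinghua, Peking University, Princeton and Others Open-Source WorldArena, Upending World Model Evaluation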
机器之心 (Jiqizhixin) · 2026-02-13 05:08

Core Insights
- The article discusses the launch of WorldArena, a unified evaluation system for embodied world models, developed by leading institutions, aiming to shift the focus from visual quality to functional reliability in robotics [1][4][8].

Evaluation Framework
- WorldArena introduces a six-dimensional visual assessment framework that includes visual quality, action quality, content consistency, physical adherence, 3D accuracy, and controllability, emphasizing the importance of physical understanding for robots [5][21][25].
- The system also incorporates three embodied tasks to evaluate whether models can effectively participate in real-world tasks, revealing that many visually high-scoring models perform poorly in practical applications [5][27].

EWMScore
- EWMScore is a comprehensive scoring system that consolidates various evaluation metrics into a single score, showing a high correlation with human subjective assessments, thus providing a more accurate reflection of model capabilities [6][30][41].
- The correlation between EWMScore and task performance indicates that visual realism does not equate to functional reliability, highlighting a significant gap between visual generation and task execution capabilities [32][44].

Challenges and Future Directions
- The article emphasizes that while world models have made significant strides in visual generation, they still face fundamental shortcomings in supporting embodied intelligence tasks and long-term decision-making [33][40].
- The conclusion stresses the need for world models to understand physical laws and maintain consistency in complex environments to transition from being mere visual models to functional embodied intelligence systems [41][45].
