Core Insights
- The article discusses the challenges faced by VLA models in autonomous driving, particularly the issue of "supervision deficit" due to sparse supervisory signals compared to high-dimensional visual input [3][7][8]
- A new research paper titled "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving" proposes a solution by introducing world models to provide dense self-supervised signals, enhancing the model's learning capabilities [3][9][16]

Group 1: Supervision Deficit
- VLA models struggle with a "supervision deficit," where the input is dense visual information but the supervisory signals are sparse, leading to wasted representational capacity [7][8]
- The research indicates that performance of VLA models saturates quickly with increased data under sparse supervision, diminishing the effects of Data Scaling Law [8][22]

Group 2: Solution through World Models
- The proposed solution involves using world models to generate dense self-supervised training tasks, such as predicting future images, which compels the model to learn the dynamics of the environment [10][14][15]
- This approach provides richer learning signals compared to relying solely on sparse action supervision, effectively addressing the supervision deficit [15][16]

Group 3: Amplification of Data Scaling Law
- The core contribution of the research is the discovery that world models can significantly amplify the effects of Data Scaling Law, leading to better performance as data scales up [17][21]
- Experimental results show that DriveVLA-W0 outperforms baseline models, with a notable performance improvement as data increases, particularly at scales from 700K to 70M frames [21][23]

Group 4: Performance and Efficiency
- DriveVLA-W0 is designed to be practical, addressing the high latency issues in VLA models by introducing a lightweight MoE "action expert" architecture, reducing inference latency to 63.1% of the baseline VLA [26][27]
- The integration of world models resulted in a 20.4% reduction in collision rates at 70M frames, demonstrating a qualitative improvement beyond merely increasing action data [24][29]
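
To make the "supervision deficit" concrete, the sketch below counts how many scalar targets a single training sample supplies under action-only supervision versus future-image prediction, and combines the two into one training loss. It is a minimal illustration, not the paper's implementation: the image size, waypoint count, the names `FrameSpec`, `combined_loss` and `wm_weight`, and the use of mean squared error for both terms are all assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class FrameSpec:
    """Hypothetical per-sample shapes; none of these numbers come from the paper."""
    height: int = 256        # assumed image height in pixels
    width: int = 448         # assumed image width in pixels
    channels: int = 3        # RGB
    waypoints: int = 8       # assumed number of future waypoints in the action target
    waypoint_dims: int = 2   # (x, y) per waypoint


def action_targets(spec: FrameSpec) -> int:
    """Number of scalar targets supplied by action-only supervision."""
    return spec.waypoints * spec.waypoint_dims


def world_model_targets(spec: FrameSpec) -> int:
    """Number of scalar targets supplied by predicting one future image."""
    return spec.height * spec.width * spec.channels


def mse(pred, target):
    """Mean squared error over two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(pred_actions, true_actions, pred_pixels, true_pixels, wm_weight=1.0):
    """Sparse action loss plus a weighted dense future-image loss.

    MSE is a stand-in for whatever objectives the real model uses.
    """
    return mse(pred_actions, true_actions) + wm_weight * mse(pred_pixels, true_pixels)


spec = FrameSpec()
# How many more scalar targets the image-prediction task provides per sample.
density_ratio = world_model_targets(spec) / action_targets(spec)
print(f"action targets per sample:      {action_targets(spec)}")
print(f"world-model targets per sample: {world_model_targets(spec)}")
print(f"density ratio:                  {density_ratio:.0f}x")
```

Under these assumed shapes the image-prediction task supplies several orders of magnitude more targets per sample than the action head does, which is the intuition behind using a world model as a dense self-supervised signal alongside the sparse action loss.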
Solving Tesla's "Sparse Supervision" Problem: Using World Models to Amplify the Scaling Law of Autonomous Driving