Data Scaling Law
Solving Tesla's "Sparse Supervision" Problem: Using World Models to Amplify the Scaling Law in Autonomous Driving
具身智能之心· 2025-11-20 00:03
Core Insights
- The article discusses the challenges faced by VLA (Vision-Language-Action) models in autonomous driving, particularly the "supervision deficit" caused by sparse supervisory signals relative to high-dimensional visual input [3][7][8]
- A new research paper, "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving", proposes introducing world models to provide dense self-supervised signals, enhancing the model's learning capabilities [3][9][16]

Group 1: Supervision Deficit
- VLA models suffer from a "supervision deficit": the input is dense visual information but the supervisory signals are sparse, so much of the model's representational capacity is wasted [7][8]
- The research indicates that under sparse supervision, VLA performance saturates quickly as data increases, diminishing the effect of the Data Scaling Law [8][22]

Group 2: Solution through World Models
- The proposed solution uses world models to create dense self-supervised training tasks, such as predicting future images, which compels the model to learn the dynamics of the environment [10][14][15]
- This provides far richer learning signals than sparse action supervision alone, directly addressing the supervision deficit [15][16]

Group 3: Amplification of the Data Scaling Law
- The core contribution of the research is the finding that world models significantly amplify the Data Scaling Law, yielding better performance as data scales up [17][21]
- Experimental results show that DriveVLA-W0 outperforms baseline models, with the performance gap widening as data grows from 700K to 70M frames [21][23]

Group 4: Performance and Efficiency
- DriveVLA-W0 is designed to be practical: it addresses the high inference latency of VLA models with a lightweight MoE "action expert" architecture, reducing inference latency to 63.1% of the baseline VLA [26][27]
- Integrating world models reduced collision rates by 20.4% at 70M frames, a qualitative improvement beyond merely adding more action data [24][29]
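The "supervision deficit" and its world-model remedy can be made concrete with a toy sketch. All sizes, names, and the loss weighting below are illustrative assumptions, not details from the DriveVLA-W0 paper: the point is only that a predicted future frame contributes thousands of supervised scalars per training frame, while action labels contribute a handful.

```python
# Illustrative sketch of dense self-supervision via a world model.
# ACTION_DIM, H, W, C, and the loss weight w are assumptions for
# illustration, not values from the paper.
ACTION_DIM = 2          # e.g. (steering, acceleration)
H, W, C = 64, 64, 3     # predicted future frame

def supervised_terms_per_frame(use_world_model):
    """Count the scalar supervision targets one training frame provides."""
    terms = ACTION_DIM                  # sparse action labels
    if use_world_model:
        terms += H * W * C              # dense future-image prediction
    return terms

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(pred_action, true_action,
                  pred_frame=None, true_frame=None, w=1.0):
    """Sparse action loss, optionally plus a dense world-model loss."""
    loss = mse(pred_action, true_action)
    if pred_frame is not None and true_frame is not None:
        loss += w * mse(pred_frame, true_frame)
    return loss
```

With the assumed sizes, an action-only frame provides 2 scalar targets, while adding future-frame prediction provides 2 + 12,288, which is the intuition behind "dense self-supervised signals" richening the training signal per frame.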
Solving Tesla's "Sparse Supervision" Problem: DriveVLA-W0 Uses World Models to Amplify the Data Scaling Law in Autonomous Driving
机器之心· 2025-11-17 04:23
Core Insights
- The article discusses the transition of VLA models in autonomous driving from academic research to practical application, highlighting the challenge of the "supervision deficit" [2][5][8]
- The research paper "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving" proposes introducing world models as a source of dense self-supervised signals to address this challenge [6][10][12]

Group 1: Supervision Deficit
- VLA models face a "supervision deficit": high-dimensional visual input is paired with low-dimensional, sparse supervisory signals, wasting representational capacity [8][9]
- The research team found that under sparse action supervision, VLA performance saturates quickly as data increases, diminishing the effect of the Data Scaling Law [9][22]

Group 2: World Models as a Solution
- Introducing a world model lets the system predict future images, providing a richer, denser learning signal than sparse actions alone [11][15][16]
- This fundamentally alleviates the supervision deficit and enables better learning of the complex dynamics of driving environments [16][18]

Group 3: Amplifying the Data Scaling Law
- The core contribution of the research is the finding that world models significantly amplify the Data Scaling Law, with performance improving more steeply as data increases than in baseline models [18][21]
- In experiments with up to 70 million frames, the world model reduced collision rates by 20.4%, a qualitative leap beyond merely stacking more action data [24]

Group 4: Efficiency and Real-World Application
- The research also tackles the high inference latency of VLA models with a lightweight MoE "action expert" architecture, reducing inference latency to 63.1% of the baseline VLA without sacrificing performance [26][27]
- This design improves the feasibility of real-time deployment of VLA models in autonomous driving applications [27][29]
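The latency argument behind an MoE "action expert" can be sketched with a toy top-1 router. The class name, sizes, and routing below are illustrative assumptions, not the paper's architecture: the mechanism shown is the generic mixture-of-experts idea that only one small expert is evaluated per input, so per-inference compute scales with one expert rather than the full set.

```python
import random

class MoEActionExpertHead:
    """Toy top-1 mixture-of-experts head (illustrative, not the
    DriveVLA-W0 design). A gating score picks one expert per input;
    only that expert's weights are evaluated, which is why MoE layers
    can cut inference cost relative to an equally wide dense layer."""

    def __init__(self, n_experts=4, dim=8, seed=0):
        rng = random.Random(seed)
        # Random gate and expert weight vectors, one row per expert.
        self.gate = [[rng.gauss(0, 1) for _ in range(dim)]
                     for _ in range(n_experts)]
        self.experts = [[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(n_experts)]
        self.expert_evals = 0   # expert evaluations actually executed

    def forward(self, x):
        dot = lambda w, v: sum(a * b for a, b in zip(w, v))
        scores = [dot(row, x) for row in self.gate]
        k = scores.index(max(scores))     # top-1 routing
        self.expert_evals += 1            # only expert k runs
        return dot(self.experts[k], x)
```

After N forward calls, `expert_evals` equals N regardless of `n_experts`, whereas a dense ensemble of the same width would have executed N * n_experts expert evaluations; that compute gap is the generic source of MoE latency savings.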