Power Law
Stanford, NVIDIA, and Berkeley Propose an Embodied Test-Time Scaling Law
机器之心· 2025-10-14 06:33
Core Insights
- The article covers advances in Vision-Language-Action (VLA) models, focusing on robustness and generalization in real-world applications through a "generate-and-verify" paradigm [2][5][20].

Group 1: Key Findings
- The research team found that increasing the number of candidate actions sampled during inference continuously reduces the action error of VLA models [5].
- Action error follows a power law in the number of Gaussian perturbations sampled, suggesting that robot control should be framed as generating candidate actions and then verifying them [5][20].
- The proposed Test-Time Scaling Law shows that task success rates and stability improve predictably as the sampling and verification budget grows [2][20].

Group 2: Methodology Overview
- The first phase trains an action verifier on a synthetic action preference dataset, with preferences derived from the RMSE gap between candidate actions and the ground-truth action [8].
- The second phase scales up compute at inference time, using the trained action verifier to select among sampled candidates and improve the stability of VLA models [9][12] (both phases are sketched after this summary).

Group 3: Experimental Results
- Integrating RoboMonkey with VLA models produced significant gains, including a 25% higher success rate on out-of-distribution tasks and a 9% improvement in the in-distribution SIMPLER environment [17].
- The accuracy of the RoboMonkey verifier grew log-linearly with the size of the synthetic dataset, translating into better performance across environments [16].

Group 4: Practical Deployment
- A dedicated VLA serving engine supports high-speed action resampling and efficient construction of the action proposal distribution, keeping inference costs in check [19].
- The system architecture delivers higher throughput with larger high-bandwidth memory, further strengthening the generalization of robotic foundation models [19].
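For illustration, here is a minimal sketch of how the phase-one preference data described above could be built from RMSE gaps. The function name make_preference_pairs and the all-pairs construction are assumptions for this sketch, not the paper's actual pipeline; the article only states that preferences are derived from RMSE differences between candidate and ground-truth actions.

```python
import numpy as np

def make_preference_pairs(candidates, ground_truth):
    """Build synthetic action-preference pairs: of any two candidates, the one
    with the lower RMSE to the ground-truth action is labelled as preferred.
    (Illustrative assumption; the paper's exact pairing scheme is not given here.)"""
    rmse = np.sqrt(((candidates - ground_truth) ** 2).mean(axis=1))
    pairs = []
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if rmse[i] != rmse[j]:
                better, worse = (i, j) if rmse[i] < rmse[j] else (j, i)
                pairs.append((candidates[better], candidates[worse]))
    return pairs  # each pair: (preferred_action, rejected_action)
```

And a minimal sketch of the phase-two generate-and-verify loop, assuming a hypothetical policy interface vla_sample(observation, instruction, n) and a learned scorer verifier_score(observation, instruction, action); the Gaussian proposal fit and the parameter values (n_init, n_candidates, sigma_scale) are illustrative, not taken from the paper. The article reports that action error falls as a power law in the number of sampled perturbations, which is what makes scaling n_candidates worthwhile.

```python
import numpy as np

def select_action(vla_sample, verifier_score, observation, instruction,
                  n_init=8, n_candidates=64, sigma_scale=1.0, rng=None):
    """Generate-and-verify action selection at test time (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng

    # 1) Draw a small batch of actions from the VLA policy and fit a Gaussian
    #    action proposal distribution (per-dimension mean and std) to it.
    seed_actions = vla_sample(observation, instruction, n_init)
    mu = seed_actions.mean(axis=0)
    sigma = seed_actions.std(axis=0) * sigma_scale + 1e-6

    # 2) Expand the candidate pool cheaply with Gaussian perturbations around
    #    the proposal instead of re-querying the policy for every candidate.
    candidates = rng.normal(mu, sigma, size=(n_candidates, mu.shape[0]))

    # 3) Score every candidate with the learned action verifier and execute the
    #    highest-scoring one; growing n_candidates is the axis along which the
    #    article's power-law improvement in action error is observed.
    scores = [verifier_score(observation, instruction, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```

In a real deployment, the resampling and verifier scoring would run inside the dedicated VLA serving engine mentioned in Group 4, so that large candidate pools do not inflate inference latency.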