The Pushcut Phenomenon

SimpleVLA-RL: Breaking Through the VLA Model Training Bottleneck, with RL Enabling End-to-End Online Training
自动驾驶之心· 2025-09-15 03:56
Core Insights
- The article discusses the SimpleVLA-RL framework, which uses reinforcement learning (RL) to improve the training of Vision-Language-Action (VLA) models for robotics, addressing two key challenges: data scarcity and weak generalization [3][4][6].

Group 1: Research Background and Core Issues
- VLA models are crucial for robotic manipulation because they integrate visual perception, language understanding, and action generation, but current training methods face two main bottlenecks: data scarcity and weak generalization [4][6].
- The traditional training process relies heavily on large-scale human operation data, which is costly to collect and difficult to scale, and this limits how far the models themselves can scale [4][6].
- The article asks whether RL can strengthen the long-horizon action planning of VLA models, despite the unique challenges that VLA applications pose [4][6].

Group 2: SimpleVLA-RL Framework Contributions
- SimpleVLA-RL is designed to improve VLA training efficiency, particularly in data-scarce settings, and achieves state-of-the-art (SOTA) performance on benchmarks such as LIBERO and RoboTwin [7][8].
- The framework combines interactive trajectory sampling, parallel training across multiple environments, and a unified design for training, inference, and rendering, which addresses the slow interaction and high cost of VLA models [7][8].
- It delivers significant improvements in success rates across tasks, raising LIBERO's average success rate from 91.0% to 99.1% and RoboTwin 2.0's from 38.3% to 68.8% [7][8][14].

Group 3: Data Efficiency and Generalization
- SimpleVLA-RL significantly reduces the dependency on large-scale demonstration data, achieving an average success rate of 96.9% with only one trajectory of demonstration data and surpassing full-trajectory supervised fine-tuning [19][20].
- The framework makes the model more robust across different scenes, objects, and tasks, and it outperforms traditional methods on unseen tasks [21][24].

Group 4: Real-World Deployment and Innovations
- The framework shows effective Sim-to-Real transfer: real-world task success rates improve from 17.5% to 38.5% when training uses only simulated data [24][27].
- A notable discovery is the "Pushcut" phenomenon, in which the RL-trained model autonomously discovers strategies that are more efficient than those in the human demonstrations, indicating a potential for novel behavior in VLA models [25][30].

Group 5: Summary and Conclusions
- SimpleVLA-RL addresses three core issues in VLA model training: reducing reliance on large-scale demonstration data, enhancing generalization, and achieving efficient Sim-to-Real transfer [31][32].
- The findings suggest that RL can enable VLA models to explore strategies superior to those they were shown, paving the way for more autonomous and adaptive robotic systems [31][32].
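
The success rates quoted above are easier to compare when each before/after pair is expressed both as an absolute gain (percentage points) and as a relative gain. The short Python sketch below does only that arithmetic; the figures are copied from this summary, while the dictionary labels and the `gains` helper are illustrative names introduced here and are not part of SimpleVLA-RL.

```python
# Success rates (percent) as quoted in this summary: (before RL, after RL).
# The labels are descriptive only; they are not identifiers from the framework.
REPORTED = {
    "LIBERO (average)": (91.0, 99.1),
    "RoboTwin 2.0": (38.3, 68.8),
    "Real-world tasks (Sim-to-Real)": (17.5, 38.5),
}


def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


if __name__ == "__main__":
    for name, (before, after) in REPORTED.items():
        abs_gain, rel_gain = gains(before, after)
        print(
            f"{name}: {before:.1f}% -> {after:.1f}% "
            f"(+{abs_gain:.1f} points, +{rel_gain:.1f}% relative)"
        )
```

On these figures, the absolute gains are 8.1, 30.5, and 21.0 percentage points respectively. The LIBERO gain looks small mainly because the baseline is already close to the ceiling, whereas the real-world success rate more than doubles.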