Core Viewpoint
- The article discusses SimpleVLA-RL, an end-to-end online training solution for Vision-Language-Action (VLA) models, aimed at enhancing the flexibility and performance of robots in complex environments while addressing existing training bottlenecks [3][12].

Group 1: Key Challenges in Existing Training Paradigms
- Current training paradigms face significant challenges, including high data-collection costs and insufficient generalization capability [2][8].
- The reliance on large-scale, high-quality robot operation trajectories limits scalability and drives up costs, making data acquisition a major hurdle [8].
- The models struggle to generalize, particularly on out-of-distribution tasks and in new environments, with performance dropping on long-horizon dependencies and compositional tasks [8][9].

Group 2: SimpleVLA-RL Framework
- SimpleVLA-RL combines interactive trajectory sampling, outcome-based rewards, and enhanced exploration to tackle the core challenges of VLA model training [5][6].
- The framework achieves state-of-the-art (SoTA) performance on standard benchmarks such as LIBERO and RoboTwin, delivering significant improvements even with limited data [5][21].
- With only a single demonstration per task, the average success rate on LIBERO rose from 48.9% to 96.9% after applying SimpleVLA-RL [5].

Group 3: Performance Metrics and Results
- SimpleVLA-RL achieved an average success rate of 99.1% on LIBERO, with long-horizon tasks improving by 12.0 percentage points [21].
- On RoboTwin1.0, the average success rate rose from 39.8% to 70.4%, with specific tasks such as "Blocks Stack" improving by 33.1 percentage points [23].
- On RoboTwin2.0, the framework likewise delivered a significant gain, with the average success rate improving from 38.3% to 68.8% [25].
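The training recipe summarized above (sample trajectories by interacting with the environment, reward only the final task outcome, update the policy from those rewards) can be illustrated with a minimal toy sketch. This is not the paper's actual implementation: the environment, the single-parameter policy, and all function names here are hypothetical, and the update rule is plain REINFORCE with a binary success reward standing in for the outcome-based reward the article describes.

```python
import math
import random

def sample_action(theta):
    # Sigmoid policy over two actions: +1 (move right) and -1 (move left).
    p_right = 1.0 / (1.0 + math.exp(-theta))
    return +1 if random.random() < p_right else -1

def rollout(theta, start=0, goal=5, horizon=12):
    """Interactively sample one trajectory in a toy 1-D task.

    Returns (actions, reward); reward is outcome-based only:
    1.0 if the goal is reached within the horizon, else 0.0 (no shaping).
    """
    pos, actions = start, []
    for _ in range(horizon):
        a = sample_action(theta)
        actions.append(a)
        pos += a
        if pos == goal:
            return actions, 1.0
    return actions, 0.0

def train(episodes=2000, lr=0.5, seed=0):
    random.seed(seed)
    theta = 0.0  # single policy parameter, for illustration only
    for _ in range(episodes):
        actions, reward = rollout(theta)
        # REINFORCE: for the sigmoid policy, grad log pi(a|theta) is
        # (1 - p_right) for a = +1 and (-p_right) for a = -1.
        p_right = 1.0 / (1.0 + math.exp(-theta))
        for a in actions:
            grad = (1.0 - p_right) if a == +1 else -p_right
            theta += lr * reward * grad  # zero-reward rollouts contribute nothing
    return theta

def success_rate(theta, trials=200):
    return sum(rollout(theta)[1] for _ in range(trials)) / trials
```

Because failed rollouts carry zero reward, the policy improves purely from trajectories that complete the task, mirroring (in miniature) how an outcome-only reward avoids hand-designed reward shaping.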
Group 4: Innovations and Discoveries
- The training process led to the emergence of new operational strategies, such as the "Pushcut" phenomenon, where the model autonomously discovers more efficient methods beyond the human demonstrations [10][31].
- This phenomenon indicates that reinforcement learning can enable VLA models to surpass the limitations of human demonstration patterns, paving the way for future adaptive VLA model development [31].
SoTA even with scarce data? Tsinghua & Shanghai AI Lab crack two major bottlenecks in robot RL
QbitAI (量子位) · 2025-09-26 02:08