RLinf-VLA: A Unified and Efficient Platform for VLA+RL Training!

Core Insights
- The article discusses the launch of RLinf, a large-scale reinforcement learning framework aimed at embodied intelligence, highlighting its flexibility and efficiency in system design [2][3].

Group 1: System Design
- RLinf-VLA provides a unified and efficient platform for VLA+RL research, achieving a 2.27x throughput improvement over baseline platforms [2][5].
- It supports multiple simulators (LIBERO and ManiSkill), allowing integrated training across different environments [5].
- The system allows easy switching between VLA models and RL algorithms, reducing the work needed to adapt a model [5].

Group 2: Performance Overview
- A single unified model achieved a success rate of 98.11% across 130 tasks in LIBERO and 97.66% across 25 pick & place tasks in ManiSkill [6].
- When deployed on real robotic systems, policies trained with RLinf-VLA show stronger zero-shot generalization than policies trained with SFT [6][45].

Group 3: Algorithm Design
- The framework introduces several design optimizations, including lightweight critics and trajectory length normalization, which significantly enhance training efficiency [9][21][25].
- It supports three levels of output granularity (token-level, action-level, chunk-level) for both advantage and log-probability calculations, allowing flexible training strategies [12][14][22].

Group 4: Experimental Results
- In multi-task experiments, the OpenVLA model showed performance improvements of 45% to 70% over baseline models in ManiSkill tasks [31].
- The RLinf-VLA framework trained efficiently, with significant reductions in training time compared to baseline methods [43][44].

Group 5: Real-World Application
- The RLinf-VLA framework was successfully deployed on a Franka Panda robotic arm, showing that it can generalize from simulation to real-world tasks [45].
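
The three output granularities listed under Algorithm Design can be illustrated with a small sketch. It is not RLinf-VLA code: the function name `aggregate_logprobs`, the `level` strings, and the layout of the input (one row of token log-probabilities per action, several actions per chunk) are assumptions made for this example. It relies only on the fact that the log-probability of a sequence of conditionally sampled tokens is the sum of the per-token log-probabilities.

```python
import math


def aggregate_logprobs(token_logprobs, level):
    """Aggregate per-token log-probabilities at a chosen granularity.

    token_logprobs: list of rows, one row per action in the chunk,
    each row holding the log-probabilities of that action's tokens.
    (Hypothetical layout, assumed for illustration only.)
    """
    if level == "token":
        # Keep one log-probability per token.
        return [list(row) for row in token_logprobs]
    if level == "action":
        # One value per action: sum over that action's tokens.
        return [sum(row) for row in token_logprobs]
    if level == "chunk":
        # One value for the whole chunk: sum over every token.
        return sum(sum(row) for row in token_logprobs)
    raise ValueError(f"unknown level: {level!r}")


# Toy chunk: 2 actions, 3 tokens each, every token sampled with p = 0.5.
chunk = [[math.log(0.5)] * 3 for _ in range(2)]

per_token = aggregate_logprobs(chunk, "token")    # 2 rows of 3 values
per_action = aggregate_logprobs(chunk, "action")  # 2 values
per_chunk = aggregate_logprobs(chunk, "chunk")    # 1 value
```

In this toy case, exponentiating the chunk-level value recovers the joint probability of all six tokens, 0.5 ** 6. Coarser granularities give the policy-gradient loss fewer, larger terms, which is one reason a framework might expose the choice as a setting.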