A Complete Overview: 2025's Breakthrough Directions for Fusing VLA and RL
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint - The article discusses a significant shift in the field of robotic embodied intelligence, focusing on the integration of Vision-Language-Action (VLA) models with Reinforcement Learning (RL) to address core challenges in real-world robotic decision-making and task execution [2][58].

Summary by Sections

GRAPE: Generalizing Robot Policy via Preference Alignment
- The GRAPE framework improves VLA model generalization and adaptability by aligning trajectories, decomposing tasks, and modeling preferences with flexible spatiotemporal constraints [5][6].
- GRAPE raises success rates by 51.79% on seen tasks and 58.20% on unseen tasks, while reducing collision rates by 37.44% under safety objectives [8][9].

VLA-RL: Towards Masterful and General Robotic Manipulation
- The VLA-RL framework tackles the failure of VLA models in out-of-distribution scenarios by adopting a trajectory-level RL formulation and fine-tuning a reward model to handle sparse rewards [11][13].
- VLA-RL significantly improves performance on 40 challenging robotic tasks and shows early signs of scalable reasoning in robotic applications [15].

ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
- The ReWiND framework adapts policies to unseen tasks with a pre-trained language-conditioned reward function, removing the need to collect new demonstrations (a minimal sketch of this reward-relabeling idea appears after these summaries) [18][19].
- ReWiND achieves a 2.4x improvement in reward generalization and a 5x gain in new-task adaptation efficiency over baseline methods [21].

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
- ConRFT employs a two-phase reinforced fine-tuning approach built on a consistency policy to stabilize VLA models beyond supervised learning [24][26].
- The method achieves a 96.3% success rate across eight real-world tasks, a 144% improvement over previous supervised fine-tuning methods [29].

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
- RLDG strengthens generalist policies by generating high-quality training data through reinforcement learning, addressing the performance and generalization limits of human demonstrations [33][34].
- The method shows a 40% increase in success rates on precise manipulation tasks and improved adaptability to new tasks [39].

TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
- TGRPO brings online reinforcement learning to VLA models through trajectory-wise group relative policy optimization, improving the robustness and efficiency of policy learning (see the advantage-estimation sketch after these summaries) [40][42].
- The method outperforms a range of baseline approaches across ten manipulation tasks, validating its effect on VLA model adaptability [44].

Improving Vision-Language-Action Model with Online Reinforcement Learning
- The iRe-VLA framework optimizes VLA models by iterating between reinforcement learning and supervised learning, easing the stability and computational challenges of direct online RL [45][47].
- The framework delivers consistent gains in interactive scenarios, offering a practical path for optimizing large VLA models [51].

Interactive Post-Training for Vision-Language-Action Models
- RIPT-VLA offers a scalable, reinforcement-learning-based interactive post-training approach that strengthens VLA models in low-data environments [52][53].
- The method achieves a 97% success rate with minimal supervision, showcasing its robustness and adaptability across various tasks [57].
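To make the ReWiND idea above concrete, the following is a minimal, hypothetical sketch of relabeling online rollouts with a pre-trained language-conditioned reward function. The Step dataclass, the relabel_with_language_reward helper, and the stand-in reward callable are illustrative assumptions, not ReWiND's actual interfaces.

```python
# Hypothetical sketch, ReWiND-style: a pre-trained, language-conditioned
# reward model relabels online rollouts so the policy can adapt to an
# unseen task without collecting new demonstrations.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    observation: Sequence[float]  # e.g. proprioception plus an image embedding
    action: Sequence[float]
    reward: float = 0.0           # to be overwritten by the learned reward


def relabel_with_language_reward(
    rollout: List[Step],
    instruction: str,
    reward_fn: Callable[[Sequence[float], str], float],
) -> List[Step]:
    """Overwrite environment rewards with rewards predicted from the task
    instruction, so the unseen task needs no new demonstrations."""
    for step in rollout:
        step.reward = reward_fn(step.observation, instruction)
    return rollout


# Stand-in for a learned reward model: here just a crude progress proxy.
def dummy_reward(obs: Sequence[float], instruction: str) -> float:
    return sum(obs) / max(len(obs), 1)


rollout = [Step(observation=[0.1, 0.2], action=[0.0]),
           Step(observation=[0.5, 0.6], action=[0.1])]
labeled = relabel_with_language_reward(rollout, "open the drawer", dummy_reward)
print([round(s.reward, 2) for s in labeled])  # e.g. [0.15, 0.55]
```

In this sketch the relabeled rollouts would then feed an ordinary RL or advantage-weighted update, which is what lets reward generalization substitute for fresh demonstrations.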
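The trajectory-wise group-relative advantage at the center of the TGRPO summary can be illustrated in a few lines of NumPy. This is a sketch under the assumption that advantages are computed GRPO-style over a group of rollouts sharing one instruction; the function name and details are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of trajectory-wise group-relative advantage estimation
# in the spirit of TGRPO: roll out a group of trajectories for one instruction,
# then score each trajectory against its own group rather than against a
# learned value baseline.
import numpy as np


def trajectory_group_advantages(trajectory_returns: list) -> np.ndarray:
    """Normalize each trajectory's return against its sampled group
    (mean-centered, scaled by the group standard deviation)."""
    returns = np.asarray(trajectory_returns, dtype=np.float64)
    centered = returns - returns.mean()
    return centered / (returns.std() + 1e-8)  # guard against a uniform group


# Example: four rollouts of the same manipulation instruction,
# two succeed (return 1.0) and two fail (return 0.0).
adv = trajectory_group_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)  # successes get positive advantages, failures negative
# A policy-gradient update would then up-weight every action along the
# positively scored trajectories and down-weight the failed ones.
```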
Conclusion - Together, the eight studies mark a significant advance in robotic intelligence, tackling industry challenges such as policy generalization and adaptation to dynamic environments, with practical applications in household tasks, industrial assembly, and robotic manipulation [58].
A Complete Overview: 2025's Breakthrough Directions for Fusing VLA and RL
具身智能之心· 2025-08-25 00:04
Core Viewpoint - The article discusses a significant shift in the field of robotic embodied intelligence, focusing on the integration of Vision-Language-Action (VLA) models with Reinforcement Learning (RL) to address core challenges in real-world robotic decision-making and task execution [2][57].

Group 1: GRAPE
- The GRAPE framework enhances the generalization of robot policies through preference alignment, addressing the limited task adaptability and generalization of VLA models [4][5].
- GRAPE improves the success rate of in-domain tasks by 51.79% and out-of-domain tasks by 58.20%, while also reducing collision rates by 37.44% under safety objectives [7][8].

Group 2: VLA-RL
- The VLA-RL framework models manipulation trajectories with a trajectory-level RL formulation and fine-tunes a reward model to tackle sparse rewards, improving task performance and showing early signs of reasoning gains [10][12].
- Across 40 challenging robotic tasks, VLA-RL significantly outperforms existing models, indicating its potential for scalable application [14].

Group 3: ReWiND
- The ReWiND framework adapts robot policies to unseen tasks using a pre-trained language-conditioned reward function, improving generalization and sample efficiency without new demonstrations [17][18].
- ReWiND delivers a 2.4x improvement in reward generalization and a 5x performance gain for a pre-trained dual-arm policy in real-world scenarios [20].

Group 4: ConRFT
- The ConRFT method uses a two-phase reinforced fine-tuning scheme to stabilize the fine-tuning of VLA models, raising the real-world task success rate to 96.3%, a 144% improvement over previous methods [23][28].
- The model requires only 45 to 90 minutes of online fine-tuning to reach these results, demonstrating its efficiency [28].

Group 5: RLDG
- The RLDG method improves generalist robot policies by generating high-quality training data through reinforcement learning, addressing the limitations of human demonstration data [32][33].
- In real-robot experiments, RLDG achieves a 40% increase in success rates on precise manipulation tasks, showing its effectiveness at improving generalization [38].

Group 6: TGRPO
- The TGRPO method applies trajectory-level group relative policy optimization to make VLA model fine-tuning in new environments more robust and efficient [39][43].
- TGRPO consistently outperforms a range of baseline methods across ten manipulation tasks, validating its effect on VLA model adaptability [43].

Group 7: iRe-VLA
- The iRe-VLA framework optimizes VLA models by iterating between reinforcement learning and supervised learning, avoiding the instability and computational burden of applying online RL directly [44][48].
- The approach has been validated in multiple simulated and real-world settings, confirming its ability to improve performance in interactive scenarios [50].

Group 8: RIPT-VLA
- The RIPT-VLA method introduces interactive post-training for VLA models, using sparse binary success rewards to improve adaptability in low-data environments (a toy sketch of this sparse-reward loop follows these summaries) [51][54].
- The framework shows clear gains in compatibility, efficiency, and generalization, achieving a 97% success rate with minimal supervision [56].
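To give a feel for the sparse binary success reward mentioned in the RIPT-VLA summary, here is a toy, hypothetical post-training loop. The one-parameter "policy", the success tolerance, and the update rule are illustrative stand-ins chosen only to show why a 0/1 reward is sparse; they do not reflect the paper's algorithm.

```python
# Toy, hypothetical sketch of interactive post-training with a sparse binary
# success reward, as referenced in the RIPT-VLA summary above. The scalar
# "policy", success tolerance, and update rule are illustrative stand-ins.
import random


def rollout(param: float, target: float) -> tuple:
    """One interaction: sample an action near the current parameter and
    grant reward 1.0 only if it lands close enough to the target."""
    action = param + random.gauss(0.0, 0.3)
    reward = 1.0 if abs(action - target) < 0.25 else 0.0
    return action, reward


def post_train(param: float, target: float, iters: int = 500, lr: float = 0.5) -> float:
    """Move the parameter toward actions that earned the binary reward;
    failed episodes carry no learning signal, which is what 'sparse' means."""
    for _ in range(iters):
        action, reward = rollout(param, target)
        if reward > 0.0:
            param += lr * (action - param)
    return param


random.seed(0)
print(post_train(param=0.0, target=0.5))  # typically ends near 0.5
```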
Conclusion - The eight studies collectively represent a pivotal advancement in robotic intelligence, focusing on overcoming industry challenges such as task generalization, adaptability to dynamic environments, and multimodal information integration, with practical applications in home automation, industrial assembly, and robotic manipulation [57].