Crushing π0.5: Fudan Team Pioneers a Closed-Loop "World Model + Embodied Training + Reinforcement Learning" Framework
机器之心 (Synced) · 2025-12-04 08:18
Core Viewpoint
- The Vision-Language-Action (VLA) strategy is becoming a crucial technological pathway for robots to achieve general operational intelligence, jointly processing visual perception and language instructions while generating continuous control signals [2]

Group 1: Challenges in Current VLA Approaches
- Most current VLA methods rely heavily on imitation learning, which can lead to error accumulation and task failure under distribution shifts or changes in task form [3][11]
- Implementing online reinforcement learning (RL) on real robots is costly and requires extensive human intervention and monitoring, making large-scale deployment impractical [12]
- Traditional physics engines struggle to balance realism, scene diversity, and engineering usability, complicating the use of RL in simulated environments [13]

Group 2: ProphRL Framework
- The research team proposed the ProphRL framework, which uses a large-scale pre-trained world model, Prophet, as a video-level simulator in which VLA policies are optimized with online RL algorithms [4]
- This approach sharply reduces real-world interaction costs while maintaining physical credibility, facilitating the practical deployment of large-model VLA policies [4]

Group 3: Experimental Results
- ProphRL improved success rates by 5–17% across various VLA models on public benchmarks, and real-robot experiments showed a substantial success-rate increase of 24–30% [8]
- The Prophet model achieved leading visual fidelity and action consistency across multiple datasets, generalizing to new scenes and tasks with minimal fine-tuning [31]
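The world-model-in-the-loop idea behind ProphRL (a learned simulator rolls out predicted futures, a trajectory-level reward model scores them, and the policy improves without touching the real robot) can be sketched as a toy loop. All names and the linear "dynamics" below are illustrative stand-ins chosen for this sketch, not the paper's actual models or API:

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model_rollout(state, actions):
    """Stand-in for Prophet: predicts future observations from a state and a
    candidate action sequence (toy linear dynamics here; the real model
    predicts video frames)."""
    traj = [state]
    for a in actions:
        state = state + a
        traj.append(state)
    return traj

def trajectory_reward(traj, goal):
    """Stand-in for the video-language reward model: scores the whole
    predicted trajectory (here, closeness of the final state to a goal;
    the real model judges task success from the predicted video)."""
    return -float(np.linalg.norm(traj[-1] - goal))

# Online RL loop run entirely inside the learned simulator:
# no real-robot interaction is needed during policy improvement.
horizon, dim = 4, 2
state, goal = np.zeros(dim), np.ones(dim)
policy_mean = np.zeros((horizon, dim))  # trivial "policy": a mean action plan
for _ in range(150):
    candidates = [policy_mean + 0.3 * rng.standard_normal((horizon, dim))
                  for _ in range(16)]   # sample a group of action plans
    rewards = [trajectory_reward(world_model_rollout(state, c), goal)
               for c in candidates]
    best = candidates[int(np.argmax(rewards))]
    policy_mean = 0.8 * policy_mean + 0.2 * best  # move toward high-reward plans

final_state = world_model_rollout(state, policy_mean)[-1]
```

The key design point this illustrates is the separation of concerns the article describes: the world model supplies cheap, physically plausible rollouts, and a trajectory-level reward replaces hand-designed geometric distances as the learning signal.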
Group 4: Innovations in RL Algorithms
- The research introduced FA-GRPO and FlowScale, RL algorithms tailored to flow-based action heads that improve training stability and performance by reorganizing gradient signals and balancing the contributions of different steps [26][27]
- A video-language reward model was developed to judge task success from the entire trajectory, replacing manually designed geometric-distance rewards [26]

Group 5: Real-World Validation
- The ProphRL framework was validated on real robots, achieving significant improvements in task success rates across a range of complex tasks, demonstrating the effectiveness of integrating the world model with RL in practical applications [38]
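FA-GRPO builds on the GRPO family, whose core trick is a critic-free, group-relative baseline: each sampled trajectory's advantage is its reward normalized against the group's statistics. The flow-specific gradient reorganization and FlowScale's per-step rebalancing are not specified in the article, so the sketch below shows only the generic group-relative advantage computation that such methods share:

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each trajectory's reward against the
    mean and std of its sampled group, so no learned value critic is needed.
    (FA-GRPO's flow-head-specific gradient handling is not shown here.)"""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four trajectories sampled for the same task instruction, scored by a
# trajectory-level reward model; above-average rollouts get positive advantage.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline is computed within each sampled group, this pairs naturally with the trajectory-level video-language reward described above: only relative success among rollouts matters, not an absolute calibrated value.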