Hierarchical RL-MPC Framework: A New Paradigm for Dexterous Manipulation That Lets Robots "Understand Geometry and Master Contact"

Core Insights
- The article examines why dexterous manipulation remains hard for robots: it requires large amounts of data, transfers poorly from simulation to the real world, and generalizes weakly [2]
- It proposes a hierarchical RL-MPC framework, inspired by the way humans manipulate objects, that reports a nearly 100% task success rate, a tenfold improvement in data efficiency, and zero-shot sim-to-real transfer [2][4]

Challenges in Traditional Dexterous Manipulation
- Existing approaches struggle to balance learning efficiency, robustness, and generalization. Three main issues stand out:
  1. End-to-end vision-based methods need massive amounts of data to learn non-smooth contact dynamics, which makes them inefficient on long-horizon tasks [3]
  2. Learned motion policies show large performance gaps across different object geometries and scenes [3]
  3. Traditional model-based control lacks the flexibility to adapt to open environments with diverse object shapes [3]

Innovations in the Hierarchical RL-MPC Framework
- The core innovation is the "contact intention," an interface that links high-level decision-making to low-level execution; the framework is organized into three layers and two modules [4][6]
- The high-level RL policy predicts contact intentions from scene observations, while the low-level MPC is responsible for executing the contact dynamics [4][12]

High-Level RL Strategy
- The high-level policy uses a three-part observation space covering geometry, target, and collision information, which improves its awareness of the environment [7]
- Instead of outputting sub-goals directly, the policy predicts MPC weights that define sub-goals indirectly; this lets it switch flexibly between sub-goals and improves learning efficiency [8]
- A dual-branch network captures both local geometric detail and global context, so feature extraction serves both [9]

Low-Level MPC Execution
- The low-level controller is a complementarity-free model predictive controller (ComFree-MPC) running at 100 Hz, which keeps contact actions stable and adaptable [12][16]
- Its optimization objective is designed to follow the high-level intention closely while still responding quickly to disturbances [17]

Experimental Validation
- The framework performed strongly on two non-prehensile tasks, reaching a 97.34% success rate on unseen objects in a pushing task and 100% in a 3D reorientation task [20][24]
- It was far more data-efficient than end-to-end policies, reaching a 100% success rate after only 15,000 RL decision steps versus 600,000 steps for the end-to-end baseline [26]

Robustness and Sim-to-Real Transfer
- The framework remained robust under a range of disturbances, maintaining its performance under conditions in which traditional methods failed [25][29]
- The policy was deployed on real robots without any fine-tuning and achieved high success rates across a variety of objects [30]

Limitations and Future Directions
- The framework currently depends on accurate pose estimation, and pose errors can cause failures in real-world scenarios; this points to the need for an integrated perception-planning-control design [36]
- Scaling to multiple end-effectors remains challenging, so future work should focus on improving the contact-intention representation [36]

Conclusion
- The hierarchical RL-MPC framework is a significant advance in dexterous manipulation: it combines flexible decision-making with stable execution and opens the way to broader applications in robotics [37]
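To make the "contact intention" interface and the indirect prediction of MPC weights more concrete, here is a minimal Python sketch under stated assumptions. The article does not specify the action layout or the cost terms, so everything below is hypothetical: `ContactIntention`, `decode_action`, the 5-D action layout (a 2-D contact point plus three raw weights), the softplus mapping to positive weights, and the three cost terms in `mpc_stage_cost` are illustrative choices, not the authors' design.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def softplus(x: float) -> float:
    """Numerically stable softplus: maps an unbounded policy output to a positive weight."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


@dataclass
class ContactIntention:
    """Hypothetical high-level output: where to touch the object and how to weight MPC costs."""
    contact_point: tuple[float, float]  # desired contact location, world frame (x, y), assumed
    weights: dict[str, float]           # positive weights on the MPC cost terms


def decode_action(raw: list[float]) -> ContactIntention:
    """Decode a raw 5-D policy action (assumed layout) into a contact intention."""
    cx, cy, a_contact, a_pos, a_rot = raw
    return ContactIntention(
        contact_point=(cx, cy),
        weights={
            "contact": softplus(a_contact),
            "position": softplus(a_pos),
            "rotation": softplus(a_rot),
        },
    )


def mpc_stage_cost(intent: ContactIntention, ee_xy, obj_xy, obj_yaw, goal_xy, goal_yaw) -> float:
    """Weighted MPC stage cost; the weights, not an explicit waypoint, select the sub-goal."""
    w = intent.weights
    contact_err = math.dist(ee_xy, intent.contact_point) ** 2
    pos_err = math.dist(obj_xy, goal_xy) ** 2
    # Wrap the yaw error into [-pi, pi] before squaring.
    yaw_diff = obj_yaw - goal_yaw
    yaw_err = math.atan2(math.sin(yaw_diff), math.cos(yaw_diff)) ** 2
    return w["contact"] * contact_err + w["position"] * pos_err + w["rotation"] * yaw_err


if __name__ == "__main__":
    approach = decode_action([0.2, 0.0, 3.0, -3.0, -3.0])  # emphasise reaching the contact point
    push = decode_action([0.2, 0.0, -3.0, 3.0, -3.0])      # emphasise moving the object
    state = dict(ee_xy=(0.0, 0.0), obj_xy=(0.0, 0.0), obj_yaw=0.0,
                 goal_xy=(1.0, 0.0), goal_yaw=0.0)
    print(mpc_stage_cost(approach, **state), mpc_stage_cost(push, **state))
```

In the usage example both intentions share the same contact point; changing only the weights shifts the optimizer's emphasis from reaching the contact point to moving the object toward its goal. This is one way predicted weights could act as implicit, switchable sub-goals.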
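The three-part observation (geometry, target, collision) and the local/global split of the dual-branch network can be illustrated with a dependency-free sketch. In the actual framework the two branches are learned network encoders; here `local_branch` and `global_branch` are hypothetical hand-crafted stand-ins (offsets of the object points nearest the end-effector, and per-axis max/mean pooling over the whole object point cloud), and the goal-pose and collision-flag layouts are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def global_branch(points: Sequence[Point]) -> List[float]:
    """Global context: per-axis max and mean pooled over the whole object point cloud."""
    n = len(points)
    maxes = [max(p[i] for p in points) for i in range(3)]
    means = [sum(p[i] for p in points) / n for i in range(3)]
    return maxes + means


def local_branch(points: Sequence[Point], query: Point, k: int = 4) -> List[float]:
    """Local detail: the k object points nearest `query`, expressed relative to it (length 3k)."""
    if len(points) < k:
        raise ValueError("need at least k points for a fixed-size local feature")
    nearest = sorted(points, key=lambda p: math.dist(p, query))[:k]
    return [c - q for p in nearest for c, q in zip(p, query)]


def build_observation(points: Sequence[Point], ee_pos: Point,
                      goal_pose: Sequence[float], collision_flags: Sequence[bool],
                      k: int = 4) -> List[float]:
    """Concatenate the geometry (local + global), target, and collision components."""
    geometry = local_branch(points, ee_pos, k) + global_branch(points)
    target = [float(v) for v in goal_pose]            # e.g. goal (x, y, yaw), assumed layout
    collision = [float(f) for f in collision_flags]   # e.g. per-link contact flags, assumed
    return geometry + target + collision
```

Keeping the local feature at a fixed length (3k values) matters because a policy network expects a fixed-size input; the global branch summarizes the whole shape regardless of how many points the cloud contains.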
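The division of labor between a slower high-level policy and a 100 Hz low-level MPC can be sketched as a toy loop. This is a 1-D point-mass stand-in: `mpc_action` is a simple grid-search (sampling) MPC, not ComFree-MPC, and it models no contact. The high-level rate (one decision per 50 ticks, i.e. 2 Hz), the velocity grid, the horizon, and the weights are all assumptions, and a fixed list of targets stands in for the RL policy.

```python
MPC_HZ = 100                 # low-level control rate reported in the article
DT = 1.0 / MPC_HZ            # 10 ms per MPC tick
CANDIDATE_VELS = [v / 10 for v in range(-10, 11)]  # admissible velocities in m/s (assumed)


def mpc_action(x: float, target: float, horizon: int = 10,
               w_track: float = 1.0, w_effort: float = 1e-4) -> float:
    """Grid-search MPC: roll out each constant velocity over the horizon, return the best one."""
    best_u, best_cost = 0.0, float("inf")
    for u in CANDIDATE_VELS:
        xi, cost = x, 0.0
        for _ in range(horizon):
            xi += u * DT
            cost += w_track * (xi - target) ** 2 + w_effort * u ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u


def run(x0, targets, steps_per_decision=50, disturbance=None):
    """Hierarchical loop: each high-level target is held for `steps_per_decision` MPC ticks."""
    disturbance = disturbance or {}
    x, tick = x0, 0
    for target in targets:                    # one entry per high-level decision
        for _ in range(steps_per_decision):   # 100 Hz low-level loop
            x += disturbance.get(tick, 0.0)   # external push; the MPC simply replans this tick
            x += mpc_action(x, target) * DT
            tick += 1
    return x


if __name__ == "__main__":
    print(run(0.0, [0.1, 0.3]))
    print(run(0.0, [0.1], disturbance={30: 0.05}))
```

Because the MPC re-solves on every 10 ms tick, a push injected partway through a decision is corrected within the remaining ticks of that same decision without any involvement from the high-level policy. In miniature, this mirrors the fast disturbance response the article attributes to the low-level layer.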