Robot Dexterous Manipulation
Hierarchical RL-MPC framework: a new paradigm for dexterous manipulation that lets robots "understand geometry and handle contact well"
具身智能之心· 2026-01-27 03:00
Core Insights
- The article discusses the challenges robots face in dexterous manipulation: high data requirements, difficult sim-to-real transfer, and weak generalization [2]
- It proposes a new hierarchical RL-MPC framework inspired by how humans manipulate objects, reporting a task success rate of nearly 100%, a tenfold improvement in data efficiency, and zero-shot sim-to-real transfer [2][4]

Challenges in Traditional Dexterous Manipulation
- Traditional approaches struggle to balance learning efficiency, robustness, and generalization. Three main issues are identified:
  1. End-to-end vision methods need massive amounts of data to learn non-smooth contact dynamics, which makes them inefficient on long-horizon tasks [3]
  2. Motion policies show large performance gaps across different object geometries and scenes [3]
  3. Traditional model-based control lacks flexibility and adaptability in open environments with diverse object shapes [3]

Innovations in the Hierarchical RL-MPC Framework
- The framework's core innovation is the "Contact Intention", an interface that connects high-level decision-making to low-level execution; the framework is structured into three layers and two modules [4][6]
- The high-level RL policy predicts contact intentions from scene observations, while the low-level MPC specializes in executing the contact dynamics [4][12]

High-Level RL Strategy
- The high-level RL policy uses a three-component observation space covering geometry, target, and collision information, which improves its awareness of the environment [7]
- The policy defines sub-goals indirectly by predicting MPC weights, which improves learning efficiency by allowing flexible switching between sub-goals [8]
- A dual-branch network architecture balances local detail and global context during feature extraction [9]

Low-Level MPC Execution
- The low-level controller uses complementarity-free model predictive control (ComFree-MPC) to keep contact actions stable and adaptable, running at 100 Hz [12][16]
- The optimization objectives are designed to adhere strictly to the high-level intentions while responding quickly to disturbances [17]

Experimental Validation
- The framework performed strongly on two non-prehensile tasks: a 97.34% success rate on unseen objects in a pushing task and 100% in 3D reorientation tasks [20][24]
- Its data efficiency far exceeded that of end-to-end policies: it needed only 15,000 RL decision steps to reach a 100% success rate, compared with 600,000 steps for traditional methods [26]

Robustness and Sim-to-Real Transfer
- The framework remained robust under a variety of disturbances, maintaining performance where traditional methods failed under the same conditions [25][29]
- The policy was deployed on real robots without any fine-tuning and achieved high success rates across a range of objects [30]

Limitations and Future Directions
- The framework currently relies on accurate pose estimation, which can cause failures in real-world scenarios and points to the need for integrated perception-planning-control designs [36]
- Scaling to multiple end-effectors remains challenging, so future work should focus on optimizing the contact-intention representation [36]

Conclusion
- The hierarchical RL-MPC framework is a significant advance in dexterous manipulation: it combines flexible decision-making with stable execution and opens the way to broader applications in robotics [37]
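
The split between a slow high-level policy that predicts MPC weights and a fast low-level controller running at 100 Hz can be illustrated with a toy two-rate loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the "object" is a 2D point with single-integrator dynamics, a hand-written heuristic stands in for the trained RL policy, a grid search over constant velocities stands in for ComFree-MPC, and the 10 Hz decision rate is assumed because the summary does not give one. All function and variable names are hypothetical.

```python
import numpy as np

MPC_HZ = 100                          # low-level rate stated in the summary
RL_HZ = 10                            # ASSUMED high-level rate; not given in the source
STEPS_PER_DECISION = MPC_HZ // RL_HZ  # MPC steps executed per high-level decision
DT = 1.0 / MPC_HZ


def high_level_policy(state, goal):
    """Heuristic stand-in for the RL policy: returns [w_track, w_effort]."""
    dist = np.linalg.norm(goal - state)
    # Penalize effort more once the object is close to the goal.
    w_effort = 0.1 if dist > 0.05 else 1.0
    return np.array([1.0, w_effort])


def mpc_step(state, goal, weights, horizon=10):
    """Toy MPC: pick the constant velocity that minimizes the weighted cost."""
    grid = np.linspace(-1.0, 1.0, 21)
    ux, uy = np.meshgrid(grid, grid, indexing="ij")
    candidates = np.stack([ux.ravel(), uy.ravel()], axis=1)  # shape (441, 2)
    # Predicted position if each candidate velocity is held for the horizon.
    predicted = state + candidates * (horizon * DT)
    track_cost = np.sum((predicted - goal) ** 2, axis=1)
    effort_cost = 1e-3 * np.sum(candidates ** 2, axis=1)
    total = weights[0] * track_cost + weights[1] * effort_cost
    return candidates[np.argmin(total)]


def run_episode(goal, seconds=1.0):
    """Run the two-rate loop; returns the final state and the decision count."""
    state = np.zeros(2)
    goal = np.asarray(goal, dtype=float)
    weights = None
    n_decisions = 0
    for step in range(int(seconds * MPC_HZ)):
        if step % STEPS_PER_DECISION == 0:
            # Slow loop: refresh the cost weights handed to the MPC.
            weights = high_level_policy(state, goal)
            n_decisions += 1
        # Fast loop: re-plan and apply only the first action.
        u = mpc_step(state, goal, weights)
        state = state + u * DT
    return state, n_decisions


if __name__ == "__main__":
    final_state, decisions = run_episode([0.2, -0.1])
    print("final state:", final_state, "high-level decisions:", decisions)
```

The design point the sketch tries to capture is that the high-level component never outputs motor commands: it only reshapes the low-level cost, so the fast loop can keep reacting between decisions.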
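Flexibly handling diverse objects with only a few demonstrations! Qian Feng (冯骞)'s team at 阿米奥 presents a low-cost, precise dexterous manipulation solution at IROS!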
具身智能之心· 2025-10-20 00:03
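The first author of this work is Qian Feng (冯骞), co-founder and technical lead of 阿米奥. He completed both his master's and doctoral studies at the Technical University of Munich in Germany under robotics pioneer Alois Knoll, and was an early employee and research scientist at 思灵机器人. At IROS 2025, Dr. Feng will give a talk on this work at the Deep Learning in Grasping and Manipulation forum.

Research progress in robot dexterous manipulation

Pain points in the field
Robot dexterous manipulation (for example, multi-fingered grasping) is key to building "human-like robots", but existing approaches have three core problems.

When a robot faces an unfamiliar object, how can it grasp precisely from only a few demonstrations and a single-view observation? LensDFF, from 阿米奥机器人, offers a disruptive answer: it abandons the conventional reliance on multi-view data and an additionally trained alignment network, and instead uses language features directly as "semantic anchors", aligning the 2D visual features extracted by CLIP to 3D space through a dynamic projection formula. This resolves cross-view feature inconsistency at its root and requires no fine-tuning at any stage.

More importantly, it incorporates 5 grasp primitives (pinch / hook / tripod, etc.) into the few-shot demonstrations and pairs them with "normal-vector-guided initialization + low-dimensional eigengrasp optimization", so that the DLR-HIT dexterous hand can ...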