Modular Cross-Embodiment Transformer (MXT)
Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining
自动驾驶之心 · 2025-07-04 10:27
Core Insights
- The article presents Human2LocoMan, a framework for improving quadrupedal robots' manipulation capabilities through human pretraining, addressing the challenge of autonomous, versatile operation in complex environments [5][9][38]
- The framework is built on a Modular Cross-Embodiment Transformer (MXT) architecture that supports efficient data collection and transfer learning from human demonstrations to robot policies, yielding significant performance improvements across tasks [10][36]

Group 1: Framework and Methodology
- Human2LocoMan integrates teleoperation and data collection systems that bridge the action spaces of humans and quadrupedal robots, enabling efficient acquisition of high-quality datasets [9][38]
- The system uses extended reality (XR) to capture human motion and retarget it to robot movements, enlarging the robot's effective workspace and perception coverage; a minimal retargeting sketch follows this summary [9][12]
- MXT's modular design shares a common transformer backbone across embodiments while keeping embodiment-specific tokenizers and action heads, enabling policy transfer between them; see the architecture sketch below [16][37]

Group 2: Experimental Results
- On six challenging household tasks, pretraining on human data improved the average success rate by 41.9%, and by 82.7% in out-of-distribution (OOD) scenarios [6][10]
- The framework generalizes robustly, maintaining high performance even with limited robot data and improving task execution in both in-distribution (ID) and OOD settings [37][38]
- MXT's modular design outperformed conventional baselines, showing its effectiveness at leveraging human data for robot learning; the pretrain-then-finetune recipe is sketched at the end of this article [33][36]

Group 3: Data Collection and Efficiency
- The Human2LocoMan system enables efficient data collection, gathering over 50 robot trajectories and 200 human trajectories within 30 minutes, demonstrating rapid data acquisition on complex tasks [30]
- It supports a range of operation modes, including single-hand and bimanual tasks, and adapts to different object types and scenarios, broadening its applicability [30][36]
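To make the XR teleoperation step concrete, here is a minimal sketch of relative-pose retargeting from a tracked human wrist to a robot end-effector. It assumes the XR runtime reports 6-DoF poses as 4x4 homogeneous transforms and that motion increments, not absolute poses, are transferred; the function name, frame conventions, and scaling parameter are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def retarget_delta(human_T_prev: np.ndarray,
                   human_T_curr: np.ndarray,
                   robot_T_prev: np.ndarray,
                   pos_scale: float = 1.0) -> np.ndarray:
    """Transfer the human wrist motion between two frames to the robot gripper.

    All poses are 4x4 homogeneous transforms expressed in their own base
    frames; only the relative motion is transferred, so the absolute offset
    between the human and robot workspaces cancels out.
    """
    delta = np.linalg.inv(human_T_prev) @ human_T_curr  # motion in the wrist frame
    delta[:3, 3] *= pos_scale                           # optional workspace scaling
    return robot_T_prev @ delta                         # next end-effector target
```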
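The modular design described in Group 1 can be sketched in PyTorch as a transformer trunk shared across embodiments, with per-embodiment tokenizers and action heads registered under a name. Dimensions, token counts, and the flat observation format are assumptions for illustration; the actual MXT system presumably tokenizes richer multimodal observations.

```python
import torch
import torch.nn as nn

class EmbodimentTokenizer(nn.Module):
    """Projects one embodiment's flat observation into the shared token space."""
    def __init__(self, obs_dim: int, d_model: int, num_tokens: int):
        super().__init__()
        self.num_tokens, self.d_model = num_tokens, d_model
        self.proj = nn.Linear(obs_dim, num_tokens * d_model)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # (batch, obs_dim) -> (batch, num_tokens, d_model)
        return self.proj(obs).view(-1, self.num_tokens, self.d_model)

class ModularCrossEmbodimentPolicy(nn.Module):
    """Shared transformer trunk with swappable per-embodiment input/output modules."""
    def __init__(self, d_model: int = 256, nhead: int = 8, num_layers: int = 6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers)  # shared across embodiments
        self.d_model = d_model
        self.tokenizers = nn.ModuleDict()    # embodiment-specific input modules
        self.action_heads = nn.ModuleDict()  # embodiment-specific output modules

    def register_embodiment(self, name: str, obs_dim: int, act_dim: int,
                            num_tokens: int = 8) -> None:
        self.tokenizers[name] = EmbodimentTokenizer(obs_dim, self.d_model, num_tokens)
        self.action_heads[name] = nn.Linear(self.d_model, act_dim)

    def forward(self, name: str, obs: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizers[name](obs)   # embodiment-specific tokenization
        feats = self.trunk(tokens)            # shared representation
        return self.action_heads[name](feats.mean(dim=1))  # embodiment-specific action
```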
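Building on the sketch above, the two-stage recipe behind the reported gains can be expressed as a plain behavior-cloning loop: pretrain the shared trunk on human demonstrations through the "human" modules, then fine-tune on robot data, where only the "locoman" tokenizer and action head start from scratch. The dataset loaders, dimensions, and MSE loss are placeholder assumptions, not the paper's training details.

```python
import torch
import torch.nn.functional as F

policy = ModularCrossEmbodimentPolicy()
policy.register_embodiment("human", obs_dim=64, act_dim=26)    # assumed dims
policy.register_embodiment("locoman", obs_dim=48, act_dim=18)  # assumed dims
opt = torch.optim.AdamW(policy.parameters(), lr=1e-4)

def bc_step(name: str, obs: torch.Tensor, actions: torch.Tensor) -> float:
    """One behavior-cloning update through the named embodiment's modules."""
    loss = F.mse_loss(policy(name, obs), actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Stage 1: pretrain on human demonstrations (trunk + "human" modules).
# for obs, act in human_demo_loader:   bc_step("human", obs, act)
# Stage 2: fine-tune on robot demonstrations; the trunk transfers, while the
# "locoman" tokenizer and action head are learned from the robot data.
# for obs, act in robot_demo_loader:   bc_step("locoman", obs, act)
```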