Carnegie Mellon University! Human2LocoMan: Learning Versatile Quadrupedal Robot Manipulation through Human Pretraining

Core Insights
- The article presents Human2LocoMan, a framework that improves quadrupedal robot manipulation through human pretraining and targets autonomous, versatile manipulation in complex environments [4][38]
- The framework combines efficient data collection with a modular cross-embodiment Transformer architecture (MXT) that transfers what is learned from human demonstrations to robot policies [8][38]

Group 1: Framework and Methodology
- Human2LocoMan collects human data with extended reality (XR) technology and maps human actions to robot motions, which broadens the robot's manipulation capabilities [7][10]
- A unified reference frame aligns actions between the human and the LocoMan robot, compensating for the large differences in dynamics and control between the two embodiments [12][10]
- MXT shares a common Transformer backbone while keeping embodiment-specific tokenizers, which enables transfer learning across different embodiments and robot platforms [16][8]

Group 2: Experimental Results
- In the experiments, the proposed framework improved the average success rate by 41.9% and performance in out-of-distribution (OOD) scenarios by 79.7% compared with baseline methods [4][8]
- Pretraining with human data raised the overall success rate by 38.6% and improved OOD performance by 82.7%, showing that human data is effective for improving robot performance [8][38]
- Data collection was efficient: more than 50 robot trajectories and 200 human trajectories were collected within 30 minutes, which indicates the framework's potential for rapid data acquisition [26][38]

Group 3: Comparative Analysis
- MXT outperformed state-of-the-art (SOTA) imitation learning methods across tasks, with higher success rates and task scores, particularly when data was limited [30][34]
- The modular design of MXT generalized better and overfit less than other architectures such as HPT, which suffered from severe overfitting [36][39]
- The framework maintained high performance on long-horizon tasks, which indicates robustness in real-world applications [36][38]
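
The modular design summarized above (one shared Transformer backbone with separate input and output modules per embodiment) can be illustrated with a small sketch. This is not the authors' implementation: it is a toy in pure Python, assuming a single self-attention layer as a stand-in for the backbone and random linear projections as stand-ins for the embodiment-specific modules. The class names, modality names (`hand_pose`, `eef_pose`, and so on), and all dimensions are hypothetical.

```python
import math
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim rows, in_dim columns) as nested lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(w, x):
    """Multiply matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SharedTrunk:
    """One self-attention layer standing in for the shared Transformer backbone."""

    def __init__(self, d_model, rng):
        self.d = d_model
        self.wq = linear(d_model, d_model, rng)
        self.wk = linear(d_model, d_model, rng)
        self.wv = linear(d_model, d_model, rng)

    def __call__(self, tokens):
        q = [matvec(self.wq, t) for t in tokens]
        k = [matvec(self.wk, t) for t in tokens]
        v = [matvec(self.wv, t) for t in tokens]
        out = []
        for qi, ti in zip(q, tokens):
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.d) for kj in k]
            attn = softmax(scores)
            mixed = [sum(a * vj[c] for a, vj in zip(attn, v)) for c in range(self.d)]
            out.append([t + m for t, m in zip(ti, mixed)])  # residual connection
        return out


class EmbodimentModules:
    """Hypothetical per-embodiment tokenizer (input) and detokenizer (output)."""

    def __init__(self, obs_dims, action_dim, d_model, rng):
        # One projection per observation modality, all mapping into d_model.
        self.tok = {name: linear(dim, d_model, rng) for name, dim in obs_dims.items()}
        self.detok = linear(d_model, action_dim, rng)

    def tokenize(self, obs):
        return [matvec(self.tok[name], obs[name]) for name in self.tok]

    def detokenize(self, tokens):
        pooled = [sum(col) / len(tokens) for col in zip(*tokens)]  # mean over tokens
        return matvec(self.detok, pooled)


class ModularPolicy:
    """Shared trunk plus a registry of embodiment-specific modules."""

    def __init__(self, d_model=8, seed=0):
        self.rng = random.Random(seed)
        self.d_model = d_model
        self.trunk = SharedTrunk(d_model, self.rng)  # reused by every embodiment
        self.embodiments = {}

    def add_embodiment(self, name, obs_dims, action_dim):
        self.embodiments[name] = EmbodimentModules(obs_dims, action_dim, self.d_model, self.rng)

    def act(self, name, obs):
        mods = self.embodiments[name]
        return mods.detokenize(self.trunk(mods.tokenize(obs)))


# Usage: the observation layouts and action sizes below are made up for illustration.
policy = ModularPolicy()
policy.add_embodiment("human", {"hand_pose": 6, "head_pose": 6}, action_dim=6)
policy.add_embodiment("locoman", {"eef_pose": 6, "body_pose": 6, "gripper": 1}, action_dim=7)

human_action = policy.act("human", {"hand_pose": [0.1] * 6, "head_pose": [0.0] * 6})
robot_action = policy.act(
    "locoman", {"eef_pose": [0.1] * 6, "body_pose": [0.0] * 6, "gripper": [1.0]}
)
```

In a setup like this, pretraining on human data would update the shared trunk together with the human modules, and finetuning on robot data would reuse the trunk while training the robot modules. The sketch shows only the forward pass and includes no training loop.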