Motion Transfer
Tsinghua and Peking University release MotionTrans: robots learn skills end-to-end directly from human data
具身智能之心· 2025-11-07 00:05
Core Insights
- The article covers Google DeepMind's release of Gemini Robotics 1.5, whose Motion Transfer (MT) mechanism transfers skills between different robots without retraining [1][2]
- A collaborative team from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University has developed MotionTrans, a new framework that enables zero-shot skill transfer from humans to robots using VR data [2][4]

MotionTrans Framework
- MotionTrans is an end-to-end, zero-shot, multi-task skill-transfer framework that lets robots learn human skills directly from VR data, without any prior robot demonstrations [4][7]
- Zero-shot transfer means the robot can perform tasks such as pouring water and plugging or unplugging a device after training solely on human data collected via VR [7][16]
- The framework also supports fine-tuning with a small number of robot demonstrations (5-20), which significantly improves success rates across the 13 skills learned from human data [7][17]

Technical Details
- MotionTrans is architecture-agnostic, so it can be paired with popular policy models such as Diffusion Policy and VLA models [7][10]
- The team built a human data collection system that captures first-person video, head movement, wrist poses, and hand actions, then converts them into a robot-compatible format [9][10]
- Coordinate transformation and hand retargeting bridge the embodiment gap between human and robot actions (see the sketches at the end of this digest) [10][11]

Performance Evaluation
- In zero-shot evaluation, the robot achieved an average success rate of 20% across the 13 tasks, with some tasks such as Pick-and-Place reaching 60%-80% [14][16]
- After fine-tuning on a small number of robot trajectories, the average success rate rose to roughly 50% with 5 trajectories and up to 80% with 20 [17][18]
- Even tasks that started at a 0% success rate showed the model moving in the correct action direction, indicating that the framework captures task semantics [14][22]

Conclusion
- MotionTrans shows that policies can learn new skills with zero robot demonstrations, using only human VR data, shifting human data from a supplementary role to a primary one in skill acquisition [22][23]
- The team has open-sourced all data, code, and models to support further research in this area [23]
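To make the human-to-robot data conversion concrete, here is a minimal sketch of the two operations named in Technical Details: re-expressing a VR-tracked wrist pose in the robot's base frame (coordinate transformation) and mapping a human pinch to a parallel-jaw gripper width (hand retargeting). The frame names, the calibration transform, and the pinch heuristic are illustrative assumptions; the digest does not specify MotionTrans's actual conventions.

```python
import numpy as np

def pose_to_matrix(position, quaternion):
    """4x4 homogeneous transform from a position (xyz) and unit quaternion (xyzw)."""
    x, y, z, w = quaternion
    # Standard rotation matrix of a unit quaternion.
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = position
    return T

def human_wrist_to_robot_frame(T_world_wrist, T_world_robotbase):
    """Coordinate transformation: wrist pose from the VR world frame into the
    robot base frame, given an (assumed) calibrated robot base pose."""
    return np.linalg.inv(T_world_robotbase) @ T_world_wrist

def retarget_pinch_to_gripper(thumb_tip, index_tip, max_width=0.08):
    """Hand retargeting (heuristic): thumb-index distance -> gripper width in meters."""
    aperture = np.linalg.norm(np.asarray(thumb_tip) - np.asarray(index_tip))
    return float(np.clip(aperture, 0.0, max_width))

# Example: one tracked frame, re-expressed for the robot.
T_world_wrist = pose_to_matrix([0.3, 0.1, 1.2], [0.0, 0.0, 0.0, 1.0])
T_world_base = pose_to_matrix([0.0, 0.0, 0.8], [0.0, 0.0, 0.0, 1.0])
T_base_wrist = human_wrist_to_robot_frame(T_world_wrist, T_world_base)
width = retarget_pinch_to_gripper([0.30, 0.12, 1.20], [0.30, 0.06, 1.20])  # 0.06 m
```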
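Architecture-agnostic co-training, as described in Technical Details, reduces to serving human-converted and robot samples in one shared observation/action format to whatever policy is plugged in. The sketch below assumes PyTorch; the sampling ratio and the `policy.compute_loss` interface are hypothetical placeholders for the chosen architecture's imitation objective (e.g., noise prediction for Diffusion Policy, action decoding for a VLA model), not the paper's API.

```python
import random
import torch
from torch.utils.data import Dataset

class MixedMotionDataset(Dataset):
    """Unified pool of human (VR-converted) and robot samples.

    Both sources are assumed preprocessed into the same format (an image
    observation plus an end-effector action chunk), so the policy never
    needs to know which embodiment a sample came from.
    """
    def __init__(self, human_samples, robot_samples, human_ratio=0.5):
        self.human, self.robot = human_samples, robot_samples
        self.human_ratio = human_ratio  # fraction of draws taken from human data

    def __len__(self):
        return len(self.human) + len(self.robot)

    def __getitem__(self, idx):
        # Ratio-based sampling rather than index lookup keeps the two
        # sources balanced regardless of their relative sizes.
        pool = self.human if random.random() < self.human_ratio else self.robot
        return pool[random.randrange(len(pool))]

def train_step(policy, batch, optimizer):
    """One co-training step; compute_loss stands in for the plugged-in
    architecture's own imitation objective."""
    loss = policy.compute_loss(batch["obs"], batch["action"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```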
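For the few-shot stage reported under Performance Evaluation, the same step can simply be continued on a handful of robot demonstrations of the target task. A minimal sketch, reusing `train_step` from the previous block; the epoch count, batch size, and learning rate are placeholders rather than the paper's settings.

```python
import torch
from torch.utils.data import DataLoader

def finetune(policy, robot_demos, epochs=50, lr=1e-5):
    """Few-shot adaptation on e.g. 5-20 robot trajectories of one task.

    Continues training the co-trained policy with a small learning rate;
    all hyperparameters here are illustrative only.
    """
    optimizer = torch.optim.AdamW(policy.parameters(), lr=lr)
    loader = DataLoader(robot_demos, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for batch in loader:
            train_step(policy, batch, optimizer)  # defined in the previous sketch
```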