MotionTrans
Tsinghua and Peking University unveil Motion Transfer: robots learn skills end-to-end directly from human data
具身智能之心· 2025-11-07 00:05
Core Insights
- The article discusses the release of Gemini Robotics 1.5 by Google DeepMind, highlighting its Motion Transfer (MT) mechanism for transferring skills between different robots without retraining [1][2]
- A collaborative team from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University has developed a new framework, MotionTrans, which enables zero-shot skill transfer from humans to robots using VR data [2][4]

MotionTrans Framework
- MotionTrans is an end-to-end, zero-shot, multi-task skill-transfer framework that lets robots learn human skills directly from VR data, without any prior robot demonstrations [4][7]
- The framework supports zero-shot transfer: robots learn tasks such as pouring water and plugging/unplugging devices solely from human data collected via VR [7][16]
- It also supports fine-tuning with a small number of robot demonstrations (5-20), which significantly improves success rates across 13 human skills (see the co-training sketch below) [7][17]

Technical Details
- The MotionTrans framework is architecture-agnostic, so it can be paired with popular models such as Diffusion Policy and vision-language-action (VLA) models [7][10]
- The team developed a human data collection system that captures first-person video, head movement, wrist poses, and hand actions, then converts them into a robot-compatible format [9][10]
- Techniques such as coordinate transformation and hand retargeting bridge the gap between human and robot actions (see the retargeting sketch below) [10][11]

Performance Evaluation
- In zero-shot evaluation, the robot achieved a 20% average success rate across 13 tasks, with some tasks such as Pick-and-Place reaching 60%-80% [14][16]
- After fine-tuning on a small number of robot trajectories, the average success rate rose to roughly 50% with 5 trajectories and up to 80% with 20 [17][18]
- Even on tasks with an initial success rate of zero, the model learned the correct action direction, indicating that the framework captures task semantics [14][22]

Conclusion
- MotionTrans shows that advanced end-to-end models can acquire new skills with zero robot demonstrations, using only human VR data, shifting the perception of human data from a supplementary role to a primary one in skill acquisition [22][23]
- The team has open-sourced all data, code, and models to support further research in this area [23]
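To make the coordinate-transformation and hand-retargeting step concrete, here is a minimal Python sketch of how a wrist pose recorded in the VR headset frame might be re-expressed in the robot base frame, and how a human thumb-index pinch might map to a parallel-jaw gripper width. The fixed calibration transform, the fingertip-distance heuristic, and all function names are illustrative assumptions, not the released MotionTrans code.

```python
import numpy as np

def to_homogeneous(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 pose matrix."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def transform_wrist_pose(T_vr_wrist: np.ndarray, T_robot_vr: np.ndarray) -> np.ndarray:
    """Re-express a human wrist pose, recorded in the VR/headset frame,
    in the robot base frame via a fixed calibration transform (assumed known)."""
    return T_robot_vr @ T_vr_wrist

def retarget_gripper(thumb_tip: np.ndarray, index_tip: np.ndarray,
                     open_width: float = 0.08) -> float:
    """Map the human thumb-index fingertip distance to a parallel-jaw gripper
    width, clipped to the robot's mechanical range (a simple heuristic)."""
    pinch = np.linalg.norm(thumb_tip - index_tip)
    return float(np.clip(pinch, 0.0, open_width))

# Example: one recorded VR frame -> one robot action
T_robot_vr = to_homogeneous(np.eye(3), np.array([0.3, 0.0, 0.2]))    # calibration (assumed)
T_vr_wrist = to_homogeneous(np.eye(3), np.array([0.1, -0.05, 0.4]))  # wrist pose in VR frame
wrist_in_robot = transform_wrist_pose(T_vr_wrist, T_robot_vr)
width = retarget_gripper(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.05, 0.0]))
action = np.concatenate([wrist_in_robot[:3, 3], [width]])            # xyz + gripper width
```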
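The few-shot fine-tuning result (5-20 robot demonstrations) suggests a simple co-training recipe: oversample the small robot set against the large human set and continue behavior cloning. The PyTorch sketch below shows one plausible version; the 50/50 sampling ratio, the MSE behavior-cloning loss, and the `policy(obs)` and dataset interfaces are assumptions rather than the paper's exact setup.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def make_cotraining_loader(human_ds, robot_ds, robot_fraction=0.5, batch_size=64):
    """Mix the large human dataset with a handful of robot trajectories,
    oversampling the robot set so each batch is roughly half robot data.
    Both datasets are assumed to yield (observation, action) pairs in the
    shared, retargeted action space."""
    mixed = ConcatDataset([human_ds, robot_ds])
    n_h, n_r = len(human_ds), len(robot_ds)
    # Per-sample weights: human samples split (1 - robot_fraction) evenly,
    # robot samples split robot_fraction evenly.
    weights = torch.cat([
        torch.full((n_h,), (1.0 - robot_fraction) / n_h),
        torch.full((n_r,), robot_fraction / n_r),
    ])
    sampler = WeightedRandomSampler(weights, num_samples=len(mixed), replacement=True)
    return DataLoader(mixed, batch_size=batch_size, sampler=sampler)

def finetune(policy, loader, epochs=10, lr=1e-4):
    """Standard behavior-cloning fine-tune on the mixed batches."""
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, action in loader:
            loss = torch.nn.functional.mse_loss(policy(obs), action)
            opt.zero_grad()
            loss.backward()
            opt.step()
```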
Tsinghua and Peking University jointly unveil Motion Transfer, rivaling Gemini Robotics, letting robots learn skills end-to-end directly from human data
机器之心· 2025-11-05 04:15
Core Insights
- The article discusses the release of Gemini Robotics 1.5 by Google DeepMind, highlighting its Motion Transfer (MT) mechanism, which allows skill transfer between different robot embodiments without retraining [2]
- A collaborative team from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University has developed a new paradigm for zero-shot action transfer from humans to robots, releasing a comprehensive technical report and open-source code [3]

MotionTrans Framework
- MotionTrans is an end-to-end, zero-shot, RGB-to-Action skill-transfer framework that enables robots to learn human skills without prior demonstrations [8]
- The framework includes a self-developed human data collection system built on VR devices, capturing first-person video, head movement, wrist poses, and hand actions [9]

Implementation of MotionTrans
- Zero-shot transfer lets robots learn tasks such as pouring water and unplugging devices from human VR data alone, achieving a 20% average success rate across 13 tasks [12][17]
- Fine-tuning with a small number of robot demonstrations (5-20 samples) raises the success rate to roughly 50% and 80%, respectively [20]

Data and Training Techniques
- The team utilized a large-scale human-robot dataset of more than 3,200 trajectories and 15 tasks, demonstrating the framework's ability to learn from human data alone [14][16]
- Techniques such as hand retargeting and unified action normalization bridge the gap between human and robot actions (see the normalization sketch below) [10][13]

Results and Contributions
- MotionTrans demonstrates that advanced end-to-end models can unlock new skills under zero-robot-demonstration conditions, shifting the perception of human data from a supplementary role to a primary one [25]
- The team has open-sourced all data, code, and models to support future research in this area [26]
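Unified action normalization is easy to state but easy to get wrong, so a minimal sketch may help: fit per-dimension statistics over the pooled human and robot actions, train the policy on normalized targets, and invert the mapping at deployment. The pooled min-max scheme and 7-D action layout below are assumptions; the technical report may instead use per-embodiment statistics or z-scoring.

```python
import numpy as np

def fit_unified_normalizer(human_actions: np.ndarray, robot_actions: np.ndarray):
    """Fit per-dimension min/max over the pooled human and robot actions so
    both embodiments share one normalized action space."""
    pooled = np.concatenate([human_actions, robot_actions], axis=0)
    return pooled.min(axis=0), pooled.max(axis=0)

def normalize(actions: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Map raw actions to [-1, 1]; the policy regresses these targets."""
    return 2.0 * (actions - lo) / np.maximum(hi - lo, 1e-8) - 1.0

def denormalize(actions: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Invert the mapping at deployment time to recover robot commands."""
    return (actions + 1.0) / 2.0 * (hi - lo) + lo

# Toy usage: 7-D actions (xyz + 3-D rotation + gripper), assumed layout
human = np.random.uniform(-0.5, 0.5, size=(100, 7))
robot = np.random.uniform(-0.3, 0.6, size=(20, 7))
lo, hi = fit_unified_normalizer(human, robot)
assert np.allclose(denormalize(normalize(robot, lo, hi), lo, hi), robot)
```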