Cross-embodiment learning is here! How can a wheeled robot's "experience" be easily passed on to a bipedal robot?
机器人大讲堂· 2025-09-23 13:24
Core Insights
- Humanoid robot technology is advancing rapidly, notably Vision-Language-Action (VLA) models that can perform a variety of household tasks with high reliability and good generalization. A significant bottleneck remains: high-quality, comprehensive demonstration data for bipedal robots is scarce [1][20].

Group 1: TrajBooster Framework
- Research teams from Zhejiang University and Westlake University proposed the TrajBooster framework to address this data scarcity. It uses the abundant manipulation data available from wheeled robots, together with trajectory retargeting, to make action learning on bipedal humanoid robots more efficient [1][20].
- The core idea of TrajBooster is to use the 6D end-effector trajectory (3D position + 3D rotation) as a universal interface, which allows "cross-morphology" teaching regardless of the robot's body [2][4].

Group 2: Process Overview
- The process has three main stages:
1. Source data extraction: language instructions, multi-view visual observations, and the corresponding 6D end-effector trajectories are taken from large wheeled-robot datasets [4].
2. Trajectory retargeting: in a simulated environment, the target bipedal robot learns to coordinate its joints to follow these trajectories [4][5].
3. Model training and fine-tuning: a small amount of real data from the target robot is used so that the model can be deployed effectively in real-world scenarios [4][9].

Group 3: Model Architecture
- The retargeting model is a hierarchical controller that breaks a complex whole-body problem into manageable sub-problems: inverse kinematics (IK) controls the arms, while a hierarchical reinforcement learning (RL) policy manages the legs and balance [5][8].
- The manager policy acts as a "decision brain" that determines how the robot should move to reach the target position, while the worker policy translates these commands into specific joint actions [8].

Group 4: Training Phases
- Training has two phases: Post-Pre-Training (PPT) and Post-Training (PT). PPT combines the retargeted action data with the source data to create a new dataset for further pre-training of the VLA model, so that the model learns the action space of the target robot [9][10].
- The PT phase collects only 10 minutes of real teleoperation data to fine-tune the model. This bridges the gap between simulation and reality and significantly reduces data collection costs [11].

Group 5: Experimental Results
- Experiments on the Unitree G1 bipedal robot showed that the model trained with PPT outperformed models trained solely on real data, with significant performance improvements in tasks such as "grabbing Mickey Mouse" and "organizing toys" [12][15].
- The model also demonstrated zero-shot skill transfer: it completed tasks not seen during training on the target robot, which indicates that skills were inherited through trajectory transfer [15][16].
- Trajectory generalization improved as well. The model reached an 80% success rate on novel object placements, compared with 0% for models trained without PPT, which suggests a deeper understanding of the action space [16].
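
The data flow described in Groups 2 and 4 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SourceStep`, `TargetStep`, `build_ppt_dataset`, and `toy_retarget` are hypothetical, and `toy_retarget` is only a stand-in for the retargeting policy that the article says is trained in simulation. The sketch shows the one structural point the article makes: instructions and observations are kept from the wheeled-robot data, while the action label is replaced by one derived from the 6D end-effector pose.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# 6D end-effector pose: 3D position (x, y, z) plus 3D rotation (rx, ry, rz).
Pose6D = Tuple[float, float, float, float, float, float]


@dataclass
class SourceStep:
    """One timestep of wheeled-robot demonstration data (stage 1)."""
    instruction: str   # language instruction
    views: List[str]   # stand-ins for multi-view image frames
    ee_pose: Pose6D    # 6D end-effector pose, the shared interface


@dataclass
class TargetStep:
    """The same timestep relabeled with a target-robot action (stage 2)."""
    instruction: str
    views: List[str]
    action: List[float]  # action expressed in the target robot's action space


def build_ppt_dataset(
    steps: Sequence[SourceStep],
    retarget: Callable[[Pose6D], List[float]],
) -> List[TargetStep]:
    """Keep instruction and views from the source; replace the action."""
    return [TargetStep(s.instruction, s.views, retarget(s.ee_pose)) for s in steps]


def toy_retarget(pose: Pose6D) -> List[float]:
    """HYPOTHETICAL placeholder for the simulated retargeting policy.

    It copies the pose and appends a dummy extra command so the example
    runs; a real retargeter would output whole-body joint commands.
    """
    return list(pose) + [0.75]


source = [
    SourceStep("pick up the toy", ["head.png", "wrist.png"], (0.4, 0.0, 0.9, 0.0, 0.0, 0.0)),
    SourceStep("pick up the toy", ["head.png", "wrist.png"], (0.4, 0.1, 0.8, 0.0, 0.0, 0.1)),
]
ppt_data = build_ppt_dataset(source, toy_retarget)
print(len(ppt_data), len(ppt_data[0].action))
```

Passing the retargeting step in as a callable keeps the dataset-building logic independent of how retargeting is done, which mirrors the article's point that the 6D trajectory is the only thing the two robot morphologies need to share.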