With Wang Xingxing as a signed author, Unitree's robot has evolved again after the Spring Festival Gala: a single policy can learn all kinds of extreme moves
机器之心· 2026-03-03 08:14
Core Viewpoint
- The article covers advances in humanoid robot motion control, focusing on the OmniXtreme framework developed by a multi-institution collaboration, which enables robots to perform extreme movements with high precision and robustness [3][10].

Group 1: OmniXtreme Framework
- OmniXtreme is the first general policy capable of executing a wide range of extreme movements, including consecutive flips and breakdancing, via a two-phase training process [3][10].
- The framework targets the "generalization barrier" in humanoid motion control: tracking performance has historically degraded as the diversity of the motion library grows [10][12].
- The first phase is scalable pre-training with a flow-based generative control policy; the second phase applies actuation-aware residual reinforcement learning for fine-tuning [10][12][18].

Group 2: Training Process
- Pre-training equips the model with high representational capacity to master a broad range of extreme motions, avoiding the conservative averaging seen in traditional multi-task reinforcement learning [12][13].
- The team aggregated multiple high-quality motion datasets and trained expert policies with the PPO algorithm, then distilled them into a single unified generative policy [13][17].
- Post-training adapts the pre-trained policy to real-world dynamics by introducing a lightweight residual policy that corrects actions against hardware constraints [18][19].

Group 3: Robustness and Performance
- Aggressive domain randomization widened the ranges of initial-pose noise and external-force disturbances by up to 50%, improving the system's robustness [19].
- A power-safety regularization mechanism manages the high transient loads generated during dynamic motions, preventing hardware failures [20][22].
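The power-safety idea described above can be sketched as a reward-shaping penalty on instantaneous mechanical power (torque times joint velocity, summed over joints) whenever it exceeds a budget. The budget and penalty weight below are illustrative assumptions, not values reported for OmniXtreme.

```python
# Minimal sketch of a power-safety regularization term, assuming a fixed
# per-step power budget. All constants here are hypothetical placeholders.

POWER_BUDGET_W = 500.0   # assumed per-step power budget, in watts
PENALTY_WEIGHT = 0.01    # assumed reward-shaping coefficient

def power_safety_penalty(torques, joint_velocities):
    """Return a penalty proportional to mechanical power above the budget."""
    # instantaneous power per joint, summed over all joints
    power = sum(abs(t * v) for t, v in zip(torques, joint_velocities))
    overshoot = max(0.0, power - POWER_BUDGET_W)
    return PENALTY_WEIGHT * overshoot

# Within budget: no penalty (power = 10*2 + 5*1 = 25 W).
low = power_safety_penalty([10.0, 5.0], [2.0, 1.0])   # → 0.0
# Transient spike: penalized in proportion to the overshoot (800 W total).
high = power_safety_penalty([40.0, 40.0], [10.0, 10.0])  # → 3.0
```

Because the penalty is zero inside the budget and grows linearly outside it, the policy is free to be dynamic during normal motion and is discouraged only from the transient spikes that threaten hardware.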
- In real-world tests, OmniXtreme achieved an overall success rate of 91.08% across a range of extreme motions, including 96.36% for flips and 93.33% for martial-arts moves [30][29].

Group 4: Comparative Analysis
- In head-to-head tests, OmniXtreme outperformed traditional reinforcement learning models, maintaining low tracking error and high success rates even as the complexity of the motion set increased [27][34].
- Tracking accuracy and robustness improved significantly as model size grew, showing that the framework scales effectively with parameter count [37].
- Traditional models suffered performance degradation as motion diversity increased, while OmniXtreme maintained high success rates, challenging the notion that high fidelity must collapse as diversity grows [34].
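The post-training residual correction described earlier can be sketched as follows: a frozen pre-trained policy proposes an action, a small residual policy adds a correction, and the combined command is clipped to actuator limits. The stand-in functions, the residual scale, and the 40 N·m limit are assumptions for illustration, not the actual networks or hardware specs.

```python
import numpy as np

# Hedged sketch of actuation-aware residual correction: base action plus a
# small learned correction, clipped so the command never exceeds hardware
# limits. Everything below is an illustrative placeholder.

TORQUE_LIMIT = 40.0      # assumed per-joint actuator limit (N·m)
RESIDUAL_SCALE = 0.1     # keep corrections small relative to the base action

def base_policy(obs):
    # stand-in for the frozen flow-based generative pre-trained policy
    return TORQUE_LIMIT * np.tanh(obs)

def residual_policy(obs):
    # stand-in for the lightweight residual network learned in post-training
    return RESIDUAL_SCALE * TORQUE_LIMIT * np.tanh(obs)

def corrected_action(obs):
    raw = base_policy(obs) + residual_policy(obs)
    # actuation-aware step: clip against actuator limits before commanding
    return np.clip(raw, -TORQUE_LIMIT, TORQUE_LIMIT)

obs = np.array([0.2, -1.5, 3.0])
action = corrected_action(obs)
```

Keeping the residual small and clipping the sum means the correction can adapt the policy to real-world dynamics without ever commanding torques the hardware cannot deliver.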