AIR Research | X-VLA Goes Open Source, Setting New Robot Benchmark Performance Records
具身智能之心·2025-10-17 00:04

Core Insights
- The article discusses the launch of X-VLA, an open-source model for embodied intelligence that completed a 120-minute autonomous folding task with only 0.9 billion parameters, setting a new performance benchmark in the field [3][19].

Group 1: Performance Breakthrough
- X-VLA is the first fully open-source model to achieve long-duration autonomous tasks, overcoming key challenges in complex autonomous operation [8].
- Despite having only 0.9 billion parameters, the model reaches state-of-the-art (SOTA) performance across five authoritative simulation benchmarks [8][19].

Group 2: Innovative Technology
- A Soft-Prompt mechanism handles variability across robot platforms, improving adaptability and training efficiency (an illustrative sketch follows Group 5) [10].
- A multi-modal encoding strategy optimizes resource allocation while preserving information integrity during task execution [10].
- The action-generation module uses flow matching to model robot action sequences probabilistically, improving trajectory smoothness and robustness in uncertain environments (see the flow-matching sketch below) [10].

Group 3: Data Pre-training and Quality
- A balanced data-sampling strategy ensures equitable training across heterogeneous datasets, preventing the model from being biased toward the largest corpora (sampling sketch below) [12].
- A rigorous data-preprocessing pipeline improves the temporal consistency and overall quality of state-action sequences [12].
- Training is guided by a semantic-action alignment criterion so that learned behaviors reflect causal relationships rather than superficial associations [12].

Group 4: Training Process and Techniques
- During pre-training, performance grows roughly linearly as model parameters and training data scale up, validating the effectiveness of the Soft-Prompt mechanism [15].
- In fine-tuning, X-VLA is highly data-efficient, requiring only small-scale task-specific data to achieve SOTA performance [16].
- Adaptive learning-rate adjustment and a progressive warm-up strategy are used to stabilize training and improve efficiency (scheduler sketch below) [17].

Group 5: Experimental Results
- On real robot platforms, X-VLA completes a variety of tasks, including unlimited-duration autonomous folding, demonstrating its capability on complex long-horizon tasks [19].
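To make the Soft-Prompt idea from Group 2 concrete, here is a minimal sketch of a learnable per-embodiment prompt table prepended to the shared backbone's input tokens. All names (SoftPromptBank, embodiment_id, tensor sizes) are illustrative assumptions, not the actual X-VLA implementation.

```python
# Minimal sketch: one learnable soft prompt per robot embodiment.
# Names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn

class SoftPromptBank(nn.Module):
    def __init__(self, num_embodiments: int, prompt_len: int, dim: int):
        super().__init__()
        # One learnable prompt per robot platform; only the selected row
        # receives gradients for a given batch.
        self.prompts = nn.Parameter(torch.randn(num_embodiments, prompt_len, dim) * 0.02)

    def forward(self, tokens: torch.Tensor, embodiment_id: int) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) fused multi-modal observation tokens
        prompt = self.prompts[embodiment_id].unsqueeze(0).expand(tokens.size(0), -1, -1)
        # Prepend the platform-specific prompt so the shared backbone can
        # condition on which embodiment produced the data.
        return torch.cat([prompt, tokens], dim=1)

bank = SoftPromptBank(num_embodiments=8, prompt_len=16, dim=256)
obs_tokens = torch.randn(4, 64, 256)              # e.g. vision/language tokens
conditioned = bank(obs_tokens, embodiment_id=3)   # shape: (4, 80, 256)
```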
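The flow-matching action generation mentioned in Group 2 can be illustrated with the standard conditional flow-matching objective: interpolate between noise and a ground-truth action chunk and regress the constant velocity along that path. The network architecture and tensor shapes below are assumptions; X-VLA's actual action head may differ.

```python
# Hedged sketch of a flow-matching loss for action-chunk generation.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    def __init__(self, action_dim: int, horizon: int, cond_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim * horizon + cond_dim + 1, 512),
            nn.GELU(),
            nn.Linear(512, action_dim * horizon),
        )
        self.action_dim, self.horizon = action_dim, horizon

    def forward(self, noisy_actions, cond, t):
        # noisy_actions: (batch, horizon, action_dim), cond: (batch, cond_dim), t: (batch,)
        inp = torch.cat([noisy_actions.flatten(1), cond, t.unsqueeze(1)], dim=1)
        return self.net(inp).view(-1, self.horizon, self.action_dim)

def flow_matching_loss(model, actions, cond):
    noise = torch.randn_like(actions)
    t = torch.rand(actions.size(0), device=actions.device)
    t_exp = t.view(-1, 1, 1)
    # Linear interpolation path between noise (t=0) and data (t=1).
    x_t = (1.0 - t_exp) * noise + t_exp * actions
    target_velocity = actions - noise          # constant velocity along the path
    pred_velocity = model(x_t, cond, t)
    return ((pred_velocity - target_velocity) ** 2).mean()
```

At inference, an action chunk would be produced by drawing noise and integrating the predicted velocity field from t=0 to t=1 (e.g. with a few Euler steps), which is what gives flow-matching heads their smooth trajectories.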
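For the balanced sampling strategy in Group 3, one plausible reading is a weighted sampler that gives every dataset equal probability mass regardless of its size. The dataset names and sizes below are placeholders, not X-VLA's published data mix.

```python
# Illustrative sketch: dataset-balanced sampling so no single large corpus
# dominates pre-training. Names and sizes are assumptions.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

datasets = {
    "franka_tabletop": TensorDataset(torch.randn(10_000, 8)),
    "ur5_kitchen":     TensorDataset(torch.randn(2_000, 8)),
    "aloha_folding":   TensorDataset(torch.randn(500, 8)),
}

concat = ConcatDataset(list(datasets.values()))
# Each dataset contributes equal total probability; within a dataset every
# sample is weighted inversely to that dataset's size.
weights = torch.cat([
    torch.full((len(ds),), 1.0 / (len(datasets) * len(ds)))
    for ds in datasets.values()
])
sampler = WeightedRandomSampler(weights, num_samples=len(concat), replacement=True)
loader = DataLoader(concat, batch_size=256, sampler=sampler)
```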
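Finally, the progressive warm-up and adaptive learning-rate adjustment from Group 4 can be sketched as a linear warm-up followed by cosine decay; the exact schedule used by X-VLA is an assumption here, and the step counts are placeholders.

```python
# Sketch of linear warm-up followed by cosine decay (assumed schedule).
import math
import torch

model = torch.nn.Linear(256, 256)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps, total_steps = 2_000, 100_000

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        # Linear warm-up from 0 to the base learning rate.
        return step / max(1, warmup_steps)
    # Cosine decay from the base learning rate down to zero.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(5):
    optimizer.step()          # placeholder for an actual training step
    scheduler.step()
    print(step, scheduler.get_last_lr())
```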