5 Million Views: 1X Puts a "World Model" to Real Use in Its Robot NEO

Core Viewpoint
- The article covers 1X's latest upgrade to its NEO home robot: a new "brain" called the 1X World Model, which learns how the physical world behaves through video pre-training and lets the robot learn and perform tasks more autonomously [4][10].

Group 1: Technological Advancements
- NEO no longer only executes pre-programmed actions. It can now "imagine" a task by generating a video of it in its mind before carrying it out [6][8].
- The 1X World Model (1XWM) builds on video pre-training, which lets the robot generalize to new objects, movements, and tasks without extensive prior data [13][24].
- The model uses a two-stage alignment process to turn the knowledge it learned from video into executable actions, improving the robot's performance in real-world scenarios [16][18].

Group 2: Training and Performance
- 1XWM is built on a generative video model with 14 billion parameters, trained on a combination of detailed visual text annotations and human first-person perspective data [18][20].
- Training includes a large amount of human first-person video, which improves the model's ability to understand and execute complex tasks [41].
- Experimental results indicate that NEO can perform tasks it has never encountered before, and that the generated videos closely match the actual task execution [26][30].

Group 3: Challenges and Improvements
- Tasks that require fine motor skills, such as pouring liquids or drawing, remain difficult [32].
- Task success rates are linked to the quality of the generated videos, so the team is exploring ways to improve video generation quality in order to raise task performance [34][41].
- Adding first-person data significantly boosts performance on new and out-of-distribution tasks, although it may have limited effect on tasks that existing data already cover well [42].
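
The "imagine a video, then act" loop summarized above can be pictured with a toy sketch. Everything in it is an assumption made for illustration and none of it is 1X's code: a "frame" is just a short vector of joint positions rather than pixels, `imagine_video` is a hypothetical stand-in for the generative video model (plain linear interpolation), and `video_to_actions` is a hypothetical stand-in for the step that turns video into actions (simple frame-to-frame differences).

```python
from typing import List

Frame = List[float]  # toy "frame": a vector of joint positions, not real pixels


def imagine_video(start: Frame, goal: Frame, steps: int) -> List[Frame]:
    """Hypothetical stand-in for the video model: interpolate start -> goal."""
    return [
        [s + (g - s) * t / steps for s, g in zip(start, goal)]
        for t in range(steps + 1)
    ]


def video_to_actions(frames: List[Frame]) -> List[Frame]:
    """Hypothetical stand-in for video-to-action conversion: frame deltas."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def execute(start: Frame, actions: List[Frame]) -> Frame:
    """Apply each action to the current state and return the final state."""
    state = list(start)
    for action in actions:
        state = [x + dx for x, dx in zip(state, action)]
    return state


# Imagine the task first, then derive actions from the imagined frames.
start, goal = [0.0, 0.0], [1.0, -2.0]
frames = imagine_video(start, goal, steps=4)
actions = video_to_actions(frames)
final = execute(start, actions)
```

In this toy setting the executed trajectory lands on the goal only because the imagined frames do, which mirrors the summary's point that task success tracks the quality of the generated video.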