Outperforming π0 and π0.5 Across the Board! End-to-End Whole-Body VLA Model Lumo-1: Entering the Era of the Reasoning-Action Closed Loop

Core Insights
- The article covers Lumo-1, a model developed by Stardust Intelligence to strengthen robots' reasoning and action capabilities in complex environments [7][9][11].

Group 1: Lumo-1 Model Overview
- Lumo-1 is an end-to-end VLA model designed to let robots perform tasks by understanding the intent behind actions rather than merely mimicking them [7].
- The model demonstrates stronger manipulation intelligence and generalization, outperforming earlier models such as π0 and π0.5 on multi-step tasks and abstract reasoning [9][11].

Group 2: Training Phases of Lumo-1
- Training proceeds in three phases:
  1. Embodied VLM pre-training on visual-language data to develop spatial understanding and trajectory inference [15].
  2. Cross-embodiment joint training to improve instruction following and spatial reasoning [16].
  3. Real-world reasoning-action training on the Astribot S1 robot to learn executable action patterns [16][18].

Group 3: Technical Innovations
- Lumo-1 uses a Spatial Action Tokenizer (SAT) to model the action space, so that actions can be combined and reused much as words are combined into sentences [19].
- The model integrates structured reasoning into a reasoning chain that works out the "why" of an action before the "how" [23].

Group 4: Performance and Validation
- Lumo-1 shows significant improvements on a range of multimodal benchmarks, surpassing specialized models such as RoboBrain-7B and Robix-7B [29].
- The model adapts to different environments and tasks, for example adjusting arm positions for containers of varying heights, which points to robust generalization [29].

Group 5: Implications for the Industry
- The findings suggest that data diversity in training (coverage of varied scenes, objects, and instructions) matters more for generalization than simply increasing data volume [28].
- The integration of reasoning and action in Lumo-1 indicates that enhancing reasoning capabilities does not compromise the model's action execution, presenting a new direction for robotics development [29].
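
The idea under Group 3, that actions can be tokenized and then combined and reused like words in a sentence, can be illustrated with a small sketch. The article gives no implementation details for the Spatial Action Tokenizer, so everything below is an assumption made for illustration: the class name `UniformActionTokenizer`, the uniform binning scheme, and the example motion values are hypothetical and are not Lumo-1's actual method.

```python
from typing import List, Sequence


class UniformActionTokenizer:
    """Hypothetical tokenizer: maps continuous motion deltas to integer tokens.

    Illustration only. Uniform binning is an assumed stand-in, not the
    Spatial Action Tokenizer used by Lumo-1.
    """

    def __init__(self, low: float = -1.0, high: float = 1.0, bins: int = 8) -> None:
        if bins < 1 or high <= low:
            raise ValueError("need bins >= 1 and high > low")
        self.low = low
        self.high = high
        self.bins = bins
        self.width = (high - low) / bins  # size of one bin

    def encode(self, deltas: Sequence[float]) -> List[int]:
        """Clip each delta to [low, high] and return its bin index."""
        tokens = []
        for d in deltas:
            clipped = min(max(d, self.low), self.high)
            idx = int((clipped - self.low) / self.width)
            tokens.append(min(idx, self.bins - 1))  # keep `high` in the last bin
        return tokens

    def decode(self, tokens: Sequence[int]) -> List[float]:
        """Map each token back to the centre of its bin."""
        return [self.low + (t + 0.5) * self.width for t in tokens]


tokenizer = UniformActionTokenizer(low=-1.0, high=1.0, bins=8)

# Two made-up motion primitives, each a short sequence of deltas.
reach = tokenizer.encode([0.3, 0.3, 0.1])
lift = tokenizer.encode([0.0, 0.6])

# Primitives are plain token lists, so they can be concatenated and reused
# to form a longer action sequence, loosely like words forming a sentence.
plan = reach + lift + reach
reconstructed = tokenizer.decode(plan)
```

Decoding returns bin centres, so under this assumed scheme each reconstructed value differs from the clipped input by at most half a bin width.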