Robot Reasoning
Jinqiu Portfolio Company Stardust Intelligence's Self-Developed Lumo-1 Model: From Reasoning to Action, How Robots Become Reasoning Masters | Jinqiu Spotlight
锦秋集· 2025-12-11 06:20
Core Insights
- The article discusses the advances of Jinqiu Fund's portfolio company, Stardust Intelligence, in developing the Lumo-1 model, which aims to enhance robotic capabilities through end-to-end full-body VLA (Vision-Language-Action) modeling [2][11].

Group 1: Lumo-1 Model Overview
- Lumo-1 combines an embodied VLM (Vision-Language Model) with cross-domain joint training and reasoning-action training, enabling robots to perform complex tasks with a high degree of intelligence and adaptability [15][16].
- The model demonstrates superior manipulation intelligence and generalization, outperforming advanced models such as π0 and π0.5 on multi-step, long-horizon tasks and fine manipulation operations [11][16].

Group 2: Training Phases
- The training process consists of three stages:
  1. **Embodied VLM**: pre-training on curated vision-language data to strengthen spatial understanding and trajectory inference [21].
  2. **Cross-domain joint training**: merging data from different robots and viewpoints to improve instruction following and spatial reasoning [24].
  3. **Real-world reasoning-action training**: training on trajectories from the Astribot S1 robot, so the model learns executable actions in real-world scenarios [30][42].

Group 3: Technical Innovations
- Lumo-1 models the action space with a Spatial Action Tokenizer (SAT), converting action trajectories into a reusable "action vocabulary" for more efficient execution [33].
- The model incorporates structured reasoning, allowing it to understand and execute tasks based on abstract concepts and contextual cues, enhancing its decision-making capabilities [35][41].

Group 4: Performance and Impact
- Lumo-1 has shown strong generalization in real-world environments, successfully adapting to varied scenarios and accurately identifying objects despite changes in how they are presented [44].
- The model's performance in seven multimodal benchmark tests indicates a significant improvement over baseline models, demonstrating that reasoning and action capabilities can coexist without compromising each other [44].
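The article does not disclose how Lumo-1's Spatial Action Tokenizer works internally. As a minimal sketch of the general idea behind an "action vocabulary", the following hypothetical example quantizes continuous 3-D action deltas into discrete token IDs via uniform per-axis binning; all names, the bin count, and the value range are assumptions for illustration, not Lumo-1's actual design.

```python
# Hypothetical sketch of action tokenization (not Lumo-1's implementation):
# quantize each continuous waypoint delta into one discrete token so an
# autoregressive model can emit actions as entries of a fixed vocabulary.

NUM_BINS = 16          # bins per axis (assumed)
LOW, HIGH = -1.0, 1.0  # normalized delta range per axis (assumed)

def encode_delta(dx, dy, dz):
    """Map one 3-D action delta to a single integer token."""
    def bin_of(v):
        v = min(max(v, LOW), HIGH)           # clamp to the modeled range
        b = int((v - LOW) / (HIGH - LOW) * NUM_BINS)
        return min(b, NUM_BINS - 1)          # keep v == HIGH in the top bin
    bx, by, bz = bin_of(dx), bin_of(dy), bin_of(dz)
    return (bx * NUM_BINS + by) * NUM_BINS + bz  # token id in [0, NUM_BINS**3)

def decode_token(token):
    """Invert encode_delta, returning the bin-center delta."""
    bz = token % NUM_BINS
    by = (token // NUM_BINS) % NUM_BINS
    bx = token // (NUM_BINS * NUM_BINS)
    width = (HIGH - LOW) / NUM_BINS
    center = lambda b: LOW + (b + 0.5) * width
    return center(bx), center(by), center(bz)

# A short trajectory becomes a short token sequence; decoding recovers each
# delta up to half a bin width of quantization error.
trajectory = [(0.10, -0.05, 0.00), (0.12, 0.00, -0.30)]
tokens = [encode_delta(*d) for d in trajectory]
roundtrip = [decode_token(t) for t in tokens]
```

A learned tokenizer (e.g. a clustered codebook over real robot trajectories) would replace the uniform bins here, but the interface is the same: continuous actions in, reusable discrete "words" out.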