Fast-Slow Dual-System Fusion

Reasoning and Manipulation Both Improve! A New Breakthrough in Dual-System VLA Models for Embodied Robots
量子位 (QbitAI) · 2025-07-10 03:19
Core Viewpoint - The article discusses the Fast-in-Slow (FiS-VLA) model, which integrates fast and slow systems for robotic control and improves both execution speed and reasoning capability [1][7][29].

Group 1: Model Innovation
- FiS-VLA is described as the first unified dual-system VLA (vision-language-action) model in which slow reasoning and fast execution cooperate within a single pre-trained model, overcoming the limitations of traditional designs that keep the two systems separate [2][8].
- The model achieves success rates of 68% and 74% on real-world tasks with the AgileX and AlphaBot platforms, respectively, surpassing the Pi0 model by more than 10 percentage points [2][10].

Group 2: System Design
- The model employs a dual-system architecture inspired by Daniel Kahneman's theory of fast and slow thinking: System 2 handles high-level reasoning, while System 1 executes actions in real time [6][12].
- FiS-VLA uses heterogeneous inputs and asynchronous operating frequencies for the two systems, allowing rapid responses while maintaining precise control [7][13].

Group 3: Training Methodology
- Training uses a dual-aware co-training strategy: System 1 learns action generation while System 2 retains its contextual reasoning capabilities, which prevents catastrophic forgetting [20][22].
- The model is pre-trained on more than 860,000 robot task trajectories and builds on a 7-billion-parameter LLaMA2 language model, with visual encoders providing semantic and spatial representations [22][23].

Group 4: Performance Metrics
- In RLBench simulation tasks, FiS-VLA achieved a 69% average success rate, outperforming CogACT (61%) and Pi0 (55%) [23].
- The model's control frequency reached 21.9 Hz, more than double that of CogACT and significantly faster than Pi0 [23][24].

Group 5: Generalization Capability
- FiS-VLA remains robust on generalization tasks, maintaining success rates above 50% under varying conditions, whereas other models show significant performance drops [4][27].
- The integration of fast and slow systems enhances the model's ability to understand semantics and react quickly, contributing to its strong generalization and robustness [28][29].
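
To make the asynchronous-frequency idea from Group 2 concrete, here is a minimal toy sketch of a dual-rate control loop. It is not FiS-VLA's implementation: the class name DualRateController, the slow_every ratio, and both placeholder "systems" are hypothetical stand-ins chosen for illustration. The only idea taken from the summary is that a slow reasoning step runs less often than a fast action step, and that the fast step combines the current observation with the most recent slow output.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class DualRateController:
    """Toy dual-rate loop (hypothetical, not the FiS-VLA code).

    A slow planner refreshes a cached context every `slow_every` ticks;
    a fast policy produces an action on every tick using that cache.
    """
    slow_every: int = 4      # assumed ratio; the article gives no value
    context: float = 0.0     # cached output of the slow system
    slow_calls: int = 0      # how many times the slow system ran
    log: List[float] = field(default_factory=list)

    def slow_system(self, observation: float) -> float:
        # Stand-in for high-level reasoning (System 2): here it simply
        # stores the observation as the new context.
        self.slow_calls += 1
        return observation

    def fast_system(self, observation: float) -> float:
        # Stand-in for action generation (System 1): blends the fresh
        # observation with the possibly stale slow context.
        return 0.5 * observation + 0.5 * self.context

    def step(self, tick: int, observation: float) -> float:
        # The slow system runs only on every `slow_every`-th tick.
        if tick % self.slow_every == 0:
            self.context = self.slow_system(observation)
        action = self.fast_system(observation)
        self.log.append(action)
        return action


def run(observations: Sequence[float], slow_every: int = 4) -> Tuple[List[float], int]:
    """Run the toy loop and return (actions, number of slow-system calls)."""
    ctrl = DualRateController(slow_every=slow_every)
    actions = [ctrl.step(t, obs) for t, obs in enumerate(observations)]
    return actions, ctrl.slow_calls


if __name__ == "__main__":
    actions, slow_calls = run([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    print(actions, slow_calls)
```

In this sketch the fast step runs on all eight ticks while the slow step runs only twice, which is the general shape of the asynchronous design the summary describes. For scale, the reported control frequency of 21.9 Hz corresponds to roughly 1 / 21.9 ≈ 45.7 ms per action step.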