A New Paradigm for VLA Inference! Consistency Model CEED-VLA Achieves 4× Acceleration!
机器之心 (Machine Heart) · 2025-07-13 04:58

Core Viewpoint
- The article discusses advancements in Vision-Language-Action (VLA) models, focusing on the CEED-VLA model, which significantly improves inference speed while maintaining high task success rates in robotic applications [2][8][24].

Group 1: VLA Model Overview
- VLA models have become a crucial research direction in robotics due to their strong multimodal understanding and generalization capabilities [2].
- Despite these advances, VLA models face significant inference-speed bottlenecks, especially in high-frequency, precision-demanding tasks [2].

Group 2: Proposed Solutions
- The article introduces a consistency distillation training strategy that lets the model predict multiple correct action tokens simultaneously, speeding up decoding [4].
- A mixed-label supervision mechanism is designed to mitigate potential error accumulation during the distillation process [4][9].
- An early-exit decoding strategy addresses the inefficiency of Jacobi decoding by relaxing its convergence conditions, improving average inference efficiency [5][10].

Group 3: Experimental Results
- The proposed methods achieve more than 4× inference acceleration across multiple baseline models while maintaining high task success rates in both simulated and real-world robotic tasks [8][18].
- Thanks to the higher inference speed and control frequency, the CEED-VLA model raises manipulation task success rates above 70% [24].
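The mixed-label supervision idea from Group 2 can be illustrated with a small loss function: the student is trained partly against the teacher's converged (fixed-point) tokens, and partly against ground-truth action tokens so that distillation errors do not compound. This is only a hedged sketch under assumed shapes and an assumed blending weight `alpha`; the function names and weighting scheme are illustrative, not the actual CEED-VLA implementation.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean token-level cross-entropy; logits: (T, V), labels: (T,).
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def mixed_label_loss(student_logits, teacher_labels, gt_labels, alpha=0.5):
    """Blend two supervision signals (illustrative weighting):
    - consistency term: match the teacher's fixed-point tokens, so the
      student learns to jump to the converged answer in fewer steps;
    - ground-truth term: match the dataset action tokens, limiting the
      error accumulation a pure self-distillation loop can cause.
    """
    l_consist = cross_entropy(student_logits, teacher_labels)
    l_gt = cross_entropy(student_logits, gt_labels)
    return alpha * l_consist + (1 - alpha) * l_gt
```

Setting `alpha=1.0` recovers pure consistency distillation; lowering it pulls the student back toward the ground-truth labels.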
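The early-exit idea in Group 2 builds on Jacobi decoding: instead of generating action tokens one at a time, a draft of k tokens is refined in parallel until it reaches a fixed point, and early exit accepts the draft once the leading positions stop changing rather than waiting for full convergence. The sketch below uses a toy deterministic `next_token` model; all names and the specific exit criterion are illustrative assumptions, not CEED-VLA code.

```python
def next_token(prefix):
    # Toy deterministic "model": next token = (sum of prefix) mod 7.
    return sum(prefix) % 7

def ar_decode(prompt, k):
    # Baseline: standard autoregressive decoding, one token per step.
    out = []
    for _ in range(k):
        out.append(next_token(prompt + out))
    return out

def jacobi_decode(prompt, k=8, max_iters=50, exit_after=None):
    """Decode k tokens in parallel by fixed-point iteration.

    Standard Jacobi decoding stops only when the whole draft is a fixed
    point; early exit (exit_after=j) accepts the draft once the first j
    positions have stopped changing, trading strict convergence for
    fewer iterations on average.
    """
    draft = [0] * k  # arbitrary initial guess
    for it in range(max_iters):
        # Refresh every position from the current draft, in parallel.
        new = [next_token(prompt + draft[:i]) for i in range(k)]
        stable = [a == b for a, b in zip(draft, new)]
        draft = new
        if all(stable):                           # full fixed point
            return draft, it + 1
        if exit_after and all(stable[:exit_after]):
            return draft, it + 1                  # relaxed convergence
    return draft, max_iters
```

With this toy model, full Jacobi decoding reproduces the autoregressive output exactly, while the early-exit variant returns in fewer iterations with the leading tokens already correct.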