Spec-VLA: The First Speculative Decoding Framework Designed for VLA Inference Acceleration
具身智能之心· 2025-08-02 16:02
Core Viewpoint
- The article discusses Spec-VLA, a speculative decoding framework designed to accelerate Vision-Language-Action (VLA) models by addressing their heavy computational demands and decoding latency [3][4][16].

Research Background and Motivation
- VLA models have made significant progress in generating robot action sequences from language instructions, but they face two challenges: the large parameter count of their backbone Vision-Language Models (VLMs) and the added decoding latency of autoregressive decoding [3].
- Existing acceleration methods have limitations, so VLA models need a tailored approach [3].

Core Framework: Spec-VLA
- Spec-VLA pairs a draft model with a verification model to speed up inference: the draft model predicts action tokens, and the verification model checks them to ensure output quality [4][5].

Key Mechanism: Relaxed Acceptance
- Relaxed acceptance accepts a draft token when its distance from the verification model's prediction falls within a defined threshold, which makes decoding more efficient without significant computational overhead [7][10].

Experimental Validation
- The framework was evaluated on the LIBERO simulation benchmark across four task sets, showing significant gains in speed and acceptance length while maintaining success rates [9][10].
- Relaxed acceptance yielded a speedup of 1.22× to 1.42×, with acceptance length increasing by 25%-44% [10][11].

Key Results
- As the relaxation threshold increases, acceptance length improves significantly while success rates remain stable across the datasets [10][11].
- Case studies show that relaxed conditions reduce the number of iterations needed to complete action sequences, supporting the effectiveness of relaxed acceptance [13].
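
The relaxed acceptance rule summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that action tokens are integer bin indices, that "distance" means the absolute difference between bin indices, and that the verification model's prediction at every drafted position is already available from one parallel pass. The function name `relaxed_accept` and all values are hypothetical.

```python
from typing import List, Sequence, Tuple


def relaxed_accept(
    draft_tokens: Sequence[int],
    verify_tokens: Sequence[int],
    threshold: int,
) -> Tuple[List[int], int]:
    """Verify one block of drafted action tokens (illustrative only).

    draft_tokens:  action-bin indices proposed by the draft model.
    verify_tokens: the verification model's prediction at each drafted
                   position (assumed to come from one parallel pass).
    threshold:     largest bin distance still accepted; 0 means exact match.

    Returns the tokens to commit and the number of draft tokens accepted.
    """
    if len(draft_tokens) != len(verify_tokens):
        raise ValueError("draft and verification sequences must align")

    committed: List[int] = []
    accepted = 0
    for drafted, verified in zip(draft_tokens, verify_tokens):
        if abs(drafted - verified) <= threshold:
            committed.append(drafted)   # close enough: keep the draft token
            accepted += 1
        else:
            committed.append(verified)  # first rejection: use the verifier's token
            break                       # the remaining draft tokens are discarded
    return committed, accepted


# Made-up bin indices, chosen only to show the effect of the threshold.
draft = [120, 87, 201, 45]
verify = [120, 88, 198, 45]
for r in (0, 1, 3):
    tokens, n = relaxed_accept(draft, verify, r)
    print(f"threshold={r}: accepted {n} draft tokens -> {tokens}")
```

With these made-up values, the number of accepted draft tokens rises from 1 to 2 to 4 as the threshold goes from 0 to 1 to 3. This is the qualitative trend the summary reports: a looser threshold gives a longer acceptance length, so fewer draft-and-verify iterations are needed per action sequence. The sketch leaves out the extra token a verifier can contribute when every draft token is accepted.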

Conclusion and Limitations
- Spec-VLA demonstrates the potential of speculative execution in VLA prediction tasks, achieving a speedup of up to 1.42× and a 44% increase in acceptance length without compromising success rates [16].
- Limitations: the framework has not been tested in real-world robot scenarios, and action chunking strategies remain unexplored [16].