ActDistill Framework
ActDistill: Tongji University Proposes an Action-Guided Distillation Framework, Boosting Robot Inference Speed by 1.67x
具身智能之心· 2025-11-26 00:05
Group 1
- The article discusses the challenges of deploying Vision-Language-Action (VLA) models in real-time or resource-constrained robotic systems due to high computational costs and inference delays [2][3].
- Existing efficient VLA strategies often prioritize visual-language model optimizations, leading to key information loss and incoherent action semantics [2][3].

Group 2
- The proposed ActDistill framework aims to address these issues by providing an action-prediction-oriented distillation framework that balances efficiency and fidelity while preserving action prediction accuracy [3][4].
- ActDistill consists of two core modules: Graph-Structured Encapsulation and Action-Guided Self-Derived Distillation, which work together to model action semantics and guide knowledge distillation [4][8].

Group 3
- The Graph-Structured Encapsulation module explicitly models the hierarchical evolution of action semantics and separates task-related interactions from redundant background signals [6] (see the encapsulation sketch after this summary).
- The Action-Guided Self-Derived Distillation module uses a lightweight student model that mirrors the teacher model's structure at reduced depth, incorporating dynamic routing to adaptively predict layer gating scores [8][11] (see the routing sketch after this summary).

Group 4
- Experimental results show that ActDistill achieves a success rate of 73.95% with a 1.59x speed-up and a 50.5% reduction in computational load compared to full models [9][12].
- The framework demonstrates significant improvements in efficiency and performance across various benchmarks, including LIBERO and SIMPLER [12][13].

Group 5
- The article highlights the importance of the Graph-Structured Encapsulation module, noting that replacing it with a simpler architecture led to a significant drop in performance [13].
- The framework's ability to maintain trajectory stability and focus attention on action-relevant areas is emphasized, showcasing its effectiveness in practical applications [16][17].

Group 6
- ActDistill represents a novel approach to action-centered compression of VLA models, achieving over 50% reduction in computational load while maintaining task success rates [24].
- Future directions include exploring teacher-free or reinforcement learning-guided variants and integrating long-horizon temporal reasoning into the routing mechanism for enhanced adaptability [24].
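
To make the Graph-Structured Encapsulation idea concrete, below is a minimal, hedged sketch in PyTorch. It assumes per-layer hidden states from a VLA backbone and illustrates one plausible reading of the summary: score tokens to separate task-related signal from background, then propagate per-layer node features over a learned layer graph to model how action semantics evolve with depth. The class name, scoring head, and adjacency form are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch of graph-structured encapsulation over layer-wise features.
# Assumed inputs: hidden_states of shape (num_layers, num_tokens, dim) taken
# from a teacher VLA backbone; all module names here are hypothetical.
import torch
import torch.nn as nn


class GraphEncapsulation(nn.Module):
    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        # Token relevance scorer: soft separation of task-related tokens
        # from redundant background signal (assumed mechanism).
        self.relevance = nn.Linear(dim, 1)
        # Learnable layer-to-layer adjacency modeling the hierarchical
        # evolution of action semantics across depth (assumed form).
        self.layer_adj = nn.Parameter(torch.eye(num_layers))
        self.proj = nn.Linear(dim, dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_layers, num_tokens, dim)
        scores = torch.sigmoid(self.relevance(hidden_states))  # (L, T, 1)
        task_feats = (scores * hidden_states).mean(dim=1)      # (L, D) per-layer node
        adj = torch.softmax(self.layer_adj, dim=-1)            # row-normalized graph
        node_feats = adj @ task_feats                          # propagate across layers
        return self.proj(node_feats)                           # encapsulated targets


if __name__ == "__main__":
    # Toy usage: 12 teacher layers, 64 tokens, 256-dim features.
    h = torch.randn(12, 64, 256)
    enc = GraphEncapsulation(dim=256, num_layers=12)
    print(enc(h).shape)  # torch.Size([12, 256])
```

The encapsulated per-layer nodes can then serve as distillation targets for a shallower student, which is where the routing sketch below picks up.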
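
The second sketch illustrates the dynamic-routing idea described for Action-Guided Self-Derived Distillation: a shallower student whose layers are softly gated by scores predicted from the input, trained to align its layer-wise features with teacher targets alongside the action loss. The router, gating form, and loss mix are assumptions for illustration only, not ActDistill's published formulation.

```python
# Hedged sketch: gated shallow student with a dynamic layer router and a
# simple feature-alignment distillation loss. All names are hypothetical.
import torch
import torch.nn as nn


class GatedStudent(nn.Module):
    def __init__(self, dim: int, num_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(num_layers)]
        )
        # Dynamic router: predicts one gating score per student layer from a
        # pooled summary of the current observation tokens (assumed design).
        self.router = nn.Linear(dim, num_layers)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, dim)
        gates = torch.sigmoid(self.router(tokens.mean(dim=1)))  # (B, L)
        feats, x = [], tokens
        for i, layer in enumerate(self.layers):
            g = gates[:, i].view(-1, 1, 1)
            x = g * layer(x) + (1.0 - g) * x   # soft skip of low-utility layers
            feats.append(x.mean(dim=1))        # per-layer summary for distillation
        return x, torch.stack(feats, dim=1), gates  # (B,T,D), (B,L,D), (B,L)


def distill_loss(student_feats, teacher_targets, gates, action_loss):
    # Align gated student layer summaries with action-guided teacher targets,
    # then add the downstream action-prediction loss (assumed combination).
    align = (gates.unsqueeze(-1) * (student_feats - teacher_targets) ** 2).mean()
    return action_loss + align


if __name__ == "__main__":
    student = GatedStudent(dim=256, num_layers=6)
    obs = torch.randn(2, 64, 256)
    out, feats, gates = student(obs)
    teacher_targets = torch.randn(2, 6, 256)  # e.g. encapsulated teacher nodes
    loss = distill_loss(feats, teacher_targets, gates, torch.tensor(0.0))
    print(out.shape, loss.item())
```

Soft gating keeps the student differentiable end-to-end; a deployed variant could threshold the gates to skip layers outright, which is one way the reported speed-up and compute reduction could be realized.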