NanoVLA
Runs Efficiently on Edge Devices! NanoVLA: Preserving VLA Model Accuracy and Generalization with a 52× Inference Speedup
具身智能之心 · 2025-11-01 16:03
Core Insights
- The article discusses the development of NanoVLA, a new model that addresses the long-standing conflict between "generalization" and "lightweight" in robotic control, achieving a 52-fold increase in inference speed and a 98% reduction in parameter count while maintaining high accuracy [2][19][32].

Group 1: Challenges in Current VLA Models
- Current VLA models face a "performance vs. efficiency" dilemma, relying on billions of parameters that hinder deployment on resource-constrained edge devices [3].
- Three major design bottlenecks are identified: redundant modal fusion, rigid action execution, and mismatched model capacity, which collectively lead to high inference latency and inefficiency [3][6].

Group 2: Innovations of NanoVLA
- NanoVLA introduces a novel architecture that decouples visual-language interaction, optimizes action planning, and allocates resources dynamically, allowing for efficient edge deployment without sacrificing accuracy [6][30].
- The model employs a caching mechanism to reuse static features and a long-short action chunking strategy to ensure smooth execution of long tasks while adapting to environmental changes [8][9][13].

Group 3: Experimental Validation
- In experiments, NanoVLA outperformed existing models across various benchmarks, reducing the parameter count to 161 million while improving accuracy by 7.6% over a 7.5-billion-parameter model [20][21].
- The model demonstrated a 52-fold increase in inference speed compared to OpenVLA, meeting real-time requirements for edge devices [22][29].

Group 4: Real-World Applications
- NanoVLA was tested on a low-cost robot, LeRobot, showing high success rates in various tasks, including rigid and flexible object manipulation, with performance exceeding that of larger models [26][27].
- The model's ability to generalize to unseen objects and commands was validated, achieving an 84% success rate in novel scenarios [27].
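The caching mechanism mentioned above can be illustrated with a minimal sketch. The idea, as described, is that features for inputs that stay static during a task (e.g. the language instruction) are encoded once and reused across control steps instead of being recomputed. The `FeatureCache` class and `encoder` callable here are hypothetical illustrations, not NanoVLA's actual API:

```python
class FeatureCache:
    """Minimal sketch of static-feature caching (illustrative names only).

    The instruction does not change during a task, so its encoding is
    computed once on the first control step and reused afterwards.
    """

    def __init__(self, encoder):
        self.encoder = encoder  # any callable: text -> feature
        self._cache = {}

    def get(self, instruction):
        # Cache hit: skip the expensive encoder call entirely.
        if instruction not in self._cache:
            self._cache[instruction] = self.encoder(instruction)
        return self._cache[instruction]
```

In a control loop this turns a per-step encoder invocation into a one-time cost, which is one plausible source of the reported latency savings.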
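The long-short action chunking strategy can likewise be sketched: the policy predicts a long chunk of actions (for smooth long-horizon execution) but commits only a short prefix before re-observing and replanning (for reactivity to environmental changes). The function below, with placeholder `policy` and `get_obs` callables, is an assumption about the general technique, not NanoVLA's real interface:

```python
def execute_with_chunking(policy, get_obs, long_len=8, short_len=2, total_steps=16):
    """Sketch of long-short action chunking (hypothetical interface).

    policy:   callable mapping an observation to a sequence of actions.
    get_obs:  callable returning the current observation.
    The policy plans `long_len` actions at a time, but only the first
    `short_len` are executed before replanning from a fresh observation.
    """
    executed = []
    while len(executed) < total_steps:
        obs = get_obs()
        chunk = policy(obs)[:long_len]      # plan a long chunk for smoothness
        executed.extend(chunk[:short_len])  # commit only a short prefix, then replan
    return executed[:total_steps]
```

The `long_len`/`short_len` split is the knob: a longer plan gives smoother trajectories, while a shorter committed prefix lets the controller react sooner to changes in the scene.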
Group 5: Future Directions
- Future enhancements may include multi-modal integration, end-to-end lightweight deployment, and adaptation for multi-robot systems, expanding the model's applicability in diverse robotic environments [31][32].