vivo Cracks the Mobile AI Deployment Challenge: Bypassing MoE Architecture Limits to Run Smoothly on the Snapdragon 8 Elite | ICCV 2025
量子位 · 2025-07-03 09:00
Core Viewpoint
- The article emphasizes the importance of deploying large models on mobile devices, focusing on preserving pure language capabilities while adding multimodal functionality.

Group 1: Challenges in Current MLLM Deployment
- Existing multimodal large language models (MLLMs) deployed on phones suffer a drop of over 10% in pure-language task accuracy when multimodal functions are enabled [3][4][6].
- Current mobile NPU platforms do not support the Mixture of Experts (MoE) architecture commonly used to preserve language capabilities during multimodal training [7][8].

Group 2: GenieBlue Contributions and Technical Highlights
- GenieBlue retains the original language capabilities during multimodal training by freezing the original LLM parameters and introducing replicated Transformer layers along with lightweight LoRA modules [3][19].
- Through extensive fine-tuning, GenieBlue achieves multimodal capabilities comparable to mainstream MLLMs while fully preserving the original pure-language performance [3][19].
- GenieBlue sidesteps the MoE limitation by employing a non-shared-base inference strategy, enabling smooth operation on devices with the Qualcomm Snapdragon 8 Elite (4th generation) chip [3][19].

Group 3: Training Data and Model Structure Analysis
- Simply adding pure-text data to maintain language capabilities has clear limits: high-quality text data is hard to collect, and training time increases [9][12].
- Adding pure-text data has limited impact on multimodal capabilities; it helps on objective NLP tasks but does not significantly aid subjective tasks [11][12].

Group 4: GenieBlue Design and Deployment
- GenieBlue's design is based on the CogVLM structure, separating text and multimodal information processing while avoiding the MoE architecture [19][21].
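The frozen-base design above can be illustrated with a toy sketch (this is not vivo's actual code; all names, sizes, and the single-layer setup are hypothetical). The original weight is never updated, a duplicated copy plus a zero-initialized LoRA update handles multimodal tokens, and pure-text tokens take the untouched original path, which is why pure-language accuracy cannot degrade:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values)

W_frozen = rng.normal(size=(d, d))   # original LLM weight, never trained further
W_mm = W_frozen.copy()               # duplicated layer for the multimodal branch
A = rng.normal(size=(r, d)) * 0.01   # LoRA down-projection (trainable)
B = np.zeros((d, r))                 # LoRA up-projection, zero-initialized

def forward(x, is_multimodal):
    """Route tokens: multimodal tokens use the duplicated branch + LoRA,
    text tokens use the frozen original weights only."""
    if is_multimodal:
        return x @ (W_mm + B @ A).T
    return x @ W_frozen.T

x = rng.normal(size=(1, d))
# With B zero-initialized, both branches start out numerically identical,
# and the text path stays identical forever since W_frozen is never updated.
assert np.allclose(forward(x, True), forward(x, False))
```

Training would update only `W_mm`, `A`, and `B`, leaving the text path bit-for-bit unchanged.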
- The deployment strategy freezes the original LLM during training and uses a non-shared-base approach, which effectively maintains the original language model's performance [24][26].
- GenieBlue has been validated for both multimodal and pure-language accuracy, demonstrating competitive performance while remaining efficient for mobile NPU deployment [30][31][35].

Group 5: Performance and Efficiency
- GenieBlue's multimodal accuracy is slightly lower than Qwen2.5-VL-3B's but retains approximately 97% of BlueLM-V-3B's performance [31].
- On pure-language accuracy, GenieBlue shows no decline, in contrast to Qwen2.5-VL-3B, which degrades on subjective tasks [33].
- Deployed on the Snapdragon 8 Elite, GenieBlue incurs a slight increase in loading time and memory requirements but meets daily mobile-usage needs at a decode speed of 30 tokens per second [35].
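For a sense of scale, the reported 30 tokens-per-second decode speed maps to response latency as follows (the reply length is a hypothetical example, not a figure from the article):

```python
# Back-of-envelope latency from the reported on-device decode speed.
tokens_per_second = 30   # GenieBlue on Snapdragon 8 Elite, per the article
reply_tokens = 150       # hypothetical length of a typical assistant reply
latency_seconds = reply_tokens / tokens_per_second
print(f"~{latency_seconds:.1f} s to generate a {reply_tokens}-token reply")  # ~5.0 s
```

At this rate, short conversational replies finish in a few seconds, which is consistent with the article's claim that the speed meets daily mobile-usage needs.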