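Hundred-Yuan-Class Hardware Smoothly Runs Models with Tens of Billions of Parameters! SJTU & Zenergize AI Open-Source Edge-Native Large Models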
QbitAI (量子位) · 2025-07-27 09:01

Core Viewpoint
- The next battleground for AI is shifting from the cloud to mobile devices, where local computation keeps user data private and secure [2][3].

Group 1: Industry Trends
- Major smartphone manufacturers such as Apple, Huawei, Samsung, Xiaomi, and OPPO are integrating large models into mobile devices, making edge AI a competitive front [2].
- Running AI smoothly on local devices remains difficult, as evidenced by Apple's delayed launch of its core AI features [2][3].

Group 2: Technological Innovations
- Shanghai Jiao Tong University and the startup Zenergize AI have jointly developed and open-sourced the SmallThinker series, which is designed specifically for edge computing [4].
- The SmallThinker models, SmallThinker-4B-A0.6B and SmallThinker-21B-A3B, are optimized for local CPU inference without relying on high-end GPUs, reaching roughly 20 tokens/s in the tests listed under Group 4 [5][23].

Group 3: Model Architecture
- SmallThinker uses an architecture built for efficient inference on devices with limited computational resources, rather than relying on traditional model compression techniques [6][8].
- The model has three core technical characteristics: expert knowledge activation, preemptive expert routing to minimize I/O overhead, and a hybrid sparse attention mechanism that reduces memory usage by 76% [9][12][17].

Group 4: Performance Metrics
- In an extremely memory-constrained scenario (1GB RAM), SmallThinker-4B-A0.6B reaches 19.91 tokens/s, significantly outperforming competitors such as Qwen3-1.7B [26][27].
- On a standard PC configuration (8GB RAM), SmallThinker-21B-A3B reaches 20.30 tokens/s, double the speed of Qwen3-30B-A3B [29].
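The expert activation and preemptive routing described under Group 3 can be illustrated with a toy sketch. It is not the SmallThinker implementation: the expert count, the top-k value, and every function name below are hypothetical, the "experts" are stand-ins that only scale their input, and a thread pool stands in for reading weights from slow storage. The sketch also reads the model names as total versus activated parameter counts (4B total with 0.6B activated), which is the usual naming convention for mixture-of-experts models but is not spelled out in this summary.

```python
from concurrent.futures import ThreadPoolExecutor
import math

TOP_K = 2  # hypothetical number of experts activated per token


def top_k_experts(router_logits, k=TOP_K):
    """Return (expert_id, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in ranked)
    exps = [math.exp(router_logits[i] - peak) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


def load_expert(expert_id):
    """Stand-in for reading one expert's weights from slow storage."""
    scale = expert_id + 1  # toy "weights": expert i multiplies by i + 1
    return lambda x: [scale * v for v in x]


def attention(x):
    """Stand-in for the attention block (identity in this toy)."""
    return list(x)


def layer_forward(x, router_logits, pool):
    # 1. Route first, so the needed experts are known before attention runs.
    selected = top_k_experts(router_logits)
    # 2. Start loading only the selected experts in the background.
    futures = {i: pool.submit(load_expert, i) for i, _ in selected}
    # 3. Attention runs while those loads are in flight.
    h = attention(x)
    # 4. Mix the selected experts' outputs; the others are never loaded.
    out = [0.0] * len(h)
    for i, w in selected:
        y = futures[i].result()(h)
        out = [o + w * v for o, v in zip(out, y)]
    return out, [i for i, _ in selected]


def active_fraction(active_billions, total_billions):
    """Share of parameters used per token under the naming assumption above."""
    return active_billions / total_billions


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=TOP_K) as pool:
        out, used = layer_forward([1.0, 2.0], [0.1, 2.0, -1.0, 0.5], pool)
    print("experts used:", used, "output:", out)
    print("4B-A0.6B active share:", active_fraction(0.6, 4))
    print("21B-A3B active share:", active_fraction(3, 21))
```

Under that naming assumption, each token touches about 15% of the parameters in the 4B model and about 14% in the 21B model. Deciding which experts are needed before attention starts is what would let the weight reads overlap with computation instead of stalling it.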

Group 5: Future Directions
- The development team plans to enhance the model's capabilities by scaling up with more high-quality data and aims to create a personal AI assistant that operates entirely on individual devices [32][33].
- The vision is to integrate AI seamlessly into daily life, providing a secure, private, and intelligent experience for users [34].