ModelBest "Little Steel Cannon" MiniCPM 4.0
ModelBest "Little Steel Cannon" 4.0 released: performance on par with Qwen-3-8B, up to 220x speedup in extreme cases
Xin Lang Ke Ji (Sina Tech) · 2025-06-10 09:37
Core Insights
- MiniCPM 4.0, the fourth generation of the "MiniCPM" model, has been released in two parameter scales, 8B and 0.5B, and is reported to achieve the best performance in its class [2][3]
- The MiniCPM 4.0-8B model uses a sparse attention mechanism and delivers performance comparable to Qwen-3-8B at only 22% of the training cost [2][4]
- The model reaches an inference speed of 600 tokens/s, with a 220x speedup in extreme scenarios, which significantly improves long-text processing [2][3]

Performance and Architecture
- MiniCPM 4.0 offers a 5x speedup in long-text inference over similar models such as Qwen-3-8B and Llama-3-8B, and up to 220x under memory-constrained conditions [3][4]
- The model's architecture, InfLLMv2, lowers the sparsity (the share of the context that takes part in attention) from the industry-standard 40%-50% to just 5%, so long-text computation needs only 1/10 of the computational load [4]
- For 128K long-text scenarios, MiniCPM 4.0-8B needs only 1/4 of the cache storage space that Qwen3-8B requires, indicating significant model compression and efficiency [4]

Applications and Market Impact
- Building on the 8B version, the company has fine-tuned two task-specific models: one for use as an MCP Client and one as a research tool, MiniCPM4-Surve, which competes with Deep Research [5]
- The MiniCPM series has passed 10 million downloads across all platforms, indicating strong market interest and adoption [5]
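To make the sparsity figures in the summary above more concrete, here is a minimal arithmetic sketch. It only restates the reported numbers (a 128K context, 5% sparsity versus the 40%-50% industry range); it is not ModelBest's implementation and says nothing about how InfLLMv2 selects tokens. The function names are hypothetical, "128K" is assumed to mean 131,072 tokens, and "sparsity" is assumed to mean the fraction of context tokens each query token attends to.

```python
# Illustrative arithmetic only. The 128K context and the 5% / 40%-50%
# figures come from the summary; all names and the reading of "sparsity"
# as "fraction of context attended to" are assumptions.

CONTEXT_TOKENS = 128 * 1024  # assumes "128K" means 131,072 tokens


def attended_tokens(context_tokens: int, attended_fraction: float) -> int:
    """Approximate number of context tokens one query token attends to."""
    if not 0.0 < attended_fraction <= 1.0:
        raise ValueError("attended_fraction must be in (0, 1]")
    return round(context_tokens * attended_fraction)


def relative_attention_work(new_fraction: float, old_fraction: float) -> float:
    """Ratio of attended tokens (a rough proxy for attention compute)."""
    return new_fraction / old_fraction


if __name__ == "__main__":
    for fraction in (0.05, 0.40, 0.50):
        count = attended_tokens(CONTEXT_TOKENS, fraction)
        print(f"fraction {fraction:.0%}: about {count} tokens per query")
    ratio = relative_attention_work(0.05, 0.50)
    print(f"5% vs 50% baseline: {ratio:.2f}x the attended tokens")
```

Under these assumptions, going from a 50% baseline to 5% cuts the attended tokens to one tenth, which is in line with the "1/10 of the computational load" figure. The summary does not state how that figure was derived, and the sketch ignores any overhead from choosing which tokens to attend to.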
ModelBest "Little Steel Cannon" 4.0 native sparse model released: up to 220x speedup, opening the era of on-device long text
IPO早知道· 2025-06-10 02:39
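The first efficient, innovative model with system-level context sparsification.
This is an original article by IPO早知道. Author | Stone Jin. WeChat official account | ipozaozhidao
According to IPO早知道, ModelBest (面壁智能) released the new generation of its "Little Steel Cannon" on-device models, MiniCPM4.0, at the recently held 2025 BAAI Conference (智源大会). One is the 8B "Lightning Sparse" edition, which brings an innovation-driven leap in on-device performance; the other is a 0.5B model that punches above its weight and fits a wide range of device scenarios.
Notably, the fourth-generation Little Steel Cannon introduces its first native sparse model. Its extreme 5% sparsity, combined with a burst of system-level innovations, lets long-text processing and deep thinking actually run on-device, announcing the arrival of the on-device long-text era. A 220x peak speedup, and double the performance with half the parameters, continue to deliver the most extreme performance among on-device base models.
Specifically, to tackle the industry-wide problem of "snail-paced" long-text inference in on-device models, the MiniCPM 4-8B "Lightning Sparse" edition adopts a new-generation efficient context-sparse architecture. Compared with on-device models of the same parameter scale, it achieves a typical 5x speedup in long-text inference and up to 220x (measured in an extreme, VRAM-constrained scenario), a qualitative change that makes on-device long-text inference "fast as lightning". In addition, the attention mechanism supports efficient "dual-frequency gear shifting": sparse attention for long text, dense attention for short text, with fast, fluid switching between the two. Meanwhile, MiniCPM 4.0 brings on-device performance ...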