Model Quantization
ModelBest Releases MiniCPM4 Edge-Side Models: 5x Faster Long-Text Inference, 0.5B Model Sets a New SOTA
AI科技大本营· 2025-06-10 09:31
Core Viewpoint - The release of MiniCPM4.0 marks a significant advancement in edge-side models, showcasing innovations in performance, speed, and storage efficiency, particularly for long text processing [1][4][32]

Group 1: Model Performance and Efficiency
- MiniCPM4.0-8B is the first open-source native sparse model, with 5% sparsity, achieving performance comparable to Qwen-3-8B while using only 22% of the training resources [2][5][6]
- MiniCPM4.0-0.5B demonstrates impressive performance with a training cost of just 2.7%, outperforming comparable models such as Qwen-3-0.6B and Llama 3.2 and reaching an inference speed of 600 tokens/s [2][5][9]
- The model's architecture allows for a 5x speed increase in long text inference and up to 220x in extreme scenarios, addressing the industry's challenge of slow long text processing [4][9][16]

Group 2: Technological Innovations
- The introduction of the InfLLM sparse attention architecture significantly reduces computational costs, allowing for efficient long text processing by lowering the sparsity from 40%-50% to 5% [18][19][20]
- MiniCPM4.0 employs a three-tiered self-developed inference framework, CPM.cu, which optimizes performance for edge devices, achieving a 5x speed enhancement [21][22]
- The model utilizes advanced quantization techniques, including P-GPTQ and BitCPM, to minimize computational and memory demands, ensuring efficient deployment [23][24]

Group 3: Data and Training Efficiency
- The company emphasizes the importance of high-quality data, utilizing innovative methods to construct datasets, which reduces validation costs by 90% [29][30]
- The training strategy incorporates the upgraded Model Wind Tunnel v2, optimizing hyperparameter configurations and enhancing GPU resource utilization [30][32]
- MiniCPM4.0's development reflects a commitment to maximizing research investment returns through systematic improvements across data, training, and inference processes [28][32]

Group 4: Market Position and Future Directions
- MiniCPM4.0 has achieved over 10 million downloads across all platforms, indicating strong market acceptance and recognition [32]
- The company plans to continue enhancing model knowledge density and intelligence levels, driving efficient development and large-scale applications in edge-side AI [32]
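The sparse-attention idea summarized under Group 2 can be illustrated with a small sketch: for one query, attend to only a small fraction (here 5%) of the context instead of every position. This is a simplified top-k selection written for illustration; the function name `sparse_attention` and the parameter `keep_ratio` are assumptions, and it is not claimed to be how InfLLM selects what to attend to. It also scores every position before selecting, so it shows the selection step only, not the compute savings a real sparse kernel would deliver.

```python
import math


def sparse_attention(query, keys, values, keep_ratio=0.05):
    """Attend to only the top `keep_ratio` fraction of keys for one query.

    Illustrative only: selects individual positions by dot-product score,
    which is an assumption and not the mechanism of any specific model.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

    # Keep at least one position so the softmax below is well defined.
    n_keep = max(1, int(len(keys) * keep_ratio))
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:n_keep]

    # Numerically stable softmax over the kept positions only.
    top = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - top) for i in kept]
    total = sum(weights)

    # Weighted sum of the value vectors at the kept positions.
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, kept)) / total
        for d in range(dim)
    ]
    return output, sorted(kept)
```

With 40 context positions and `keep_ratio=0.05`, only two positions contribute to the output; the remaining 38 are ignored entirely.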
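0.5B Model Punches Above Its Weight to Set a New Edge-Side SOTA: Runs on a 4090, 5x Typical Speedup for Long-Text Processing | Open-Sourced by Tsinghua & ModelBest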
量子位 · 2025-06-10 07:35
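Contributed by Tsinghua University & ModelBest
量子位 | WeChat official account QbitAI

The new price-performance champion for edge devices: the Tsinghua University and ModelBest teams have open-sourced a new model, MiniCPM4, in two parameter sizes, 8B and 0.5B. Using only 22% of the training cost of open-source models in the same class, it achieves the best performance in its class.

MiniCPM4-8B is the first open-source native sparse model. Its very high sparsity level of 5% makes long-text and deep-thinking workloads practical on edge devices. On benchmarks such as MMLU, CEval, MATH500, and HumanEval, with only 22% of the training cost, it matches Qwen-3-8B and surpasses Gemma-3-12B.

MiniCPM4-0.5B likewise punches above its weight. On benchmarks such as MMLU, CEval, BBH, and HumanEval, MiniCPM4.0-0.5B outperforms same-class models Qwen-3-0.6B, Llama 3.2, and Gemma3, and through native QAT it achieves int4 quantization with almost no accuracy loss and an inference speed of 600 tokens/s.

On common edge chips such as Jetson AGX Orin and RTX 4090, MiniCPM4 achieves a 5x typical speedup for long-text processing and a hundredfold speedup in extreme scenarios.

The team has publicly released the technical report, and the model ...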
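To make the int4 quantization mentioned above concrete, here is a minimal sketch of symmetric per-tensor int4 quantization, together with the quantize-then-dequantize ("fake quantization") step that QAT methods commonly run in the forward pass. The function names and the per-tensor scaling scheme are assumptions chosen for illustration; the excerpt does not describe MiniCPM4's actual QAT recipe.

```python
def quantize_int4(weights):
    """Symmetric per-tensor int4 quantization: returns (codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Signed 4-bit integers cover [-8, 7]; map the largest magnitude to 7.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp into the int4 range.
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def fake_quantize(weights):
    """Quantize then dequantize, so the values carry int4 rounding error."""
    codes, scale = quantize_int4(weights)
    return dequantize(codes, scale)
```

Each weight is stored as one of 16 integer codes plus a shared scale, and the rounding error per weight is at most half the scale. Training against `fake_quantize` output is what lets a model adapt to that error, which is the general idea behind "almost no accuracy loss" claims for QAT.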
High-Performance Models at Low Cost: Paradox or Possibility?
机器之心· 2025-05-31 17:15
Core Viewpoint - The article discusses the paradox of achieving high performance in AI models at low costs, questioning whether the decline in perceived model performance is intentional by AI companies and exploring the implications of cost-saving measures on model quality [2][3].

Group 1: Low-Cost High-Performance Models
- The performance and cost dilemma of large language models (LLMs) has been a focal point of public and industry concern, with ongoing discussions about whether top model companies sacrifice precision or service stability to save on inference costs [2][3].
- Following the popularity of ChatGPT, users have expressed dissatisfaction with perceived declines in performance, citing issues such as weakened logic, increased errors, and difficulties in following instructions [2][3].
- The public's concern about companies sacrificing model performance for cost savings is supported by technical and market evidence, particularly highlighted in the controversy surrounding the DeepSeek-R1 model [3][4].
- The true "full version" of DeepSeek-R1 requires significant hardware investment, with initial costs reaching hundreds of thousands of yuan, leading some platforms to potentially use distilled versions that compromise inference capability and stability [3][4].

Group 2: Cost Management Strategies
- To balance costs and performance, high-end "full version" models are not widely available, especially in a market flooded with free or low-cost services that often lack sufficient performance [6].
- AI companies are increasingly adopting model distillation or simplified models to reduce inference costs and manage financial investments [6].
- Common strategies to address cost pressures include lowering model precision through techniques such as model quantization, pruning, and knowledge distillation, which have become standard practices in the industry [6].
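Of the cost-reduction techniques listed above, knowledge distillation can be sketched in a few lines: a smaller student model is trained to match the temperature-softened output distribution of a larger teacher. The sketch below shows a common textbook form of the loss; the function names and the default temperature are assumptions, and nothing here is claimed to reflect how any particular vendor distilled its models.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is why a distilled model can approximate, but not fully replace, the larger model it was trained from.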