Edge-Side Models
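A 0.5B model punches above its weight to set a new edge-model SOTA: runs on a 4090, with a routine 5x speedup on long-text processing | Open-sourced by Tsinghua and Mianbi
QbitAI (量子位) · 2025-06-10 07:35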
Core Insights
- MiniCPM4, developed by Tsinghua University and Mianbi Intelligent (面壁智能, ModelBest), is an open-source model family in 8B and 0.5B parameter sizes that reaches best-in-class performance at only 22% of the training cost of comparable models [1][3][4]
- It uses a new sparse attention mechanism, InfLLM v2, which reaches a 5% sparsity rate for efficient long-context processing [2][8][16]
- In benchmark tests, MiniCPM4 matches or outperforms models such as Qwen3 and Gemma-3 while using significantly less training data [3][11][116]

Model Performance
- MiniCPM4-8B matches Qwen3-8B and surpasses Gemma-3-12B while using only 22% of the training data used by Qwen3 [3][116]
- MiniCPM4-0.5B outperforms Qwen3-0.6B and Llama 3.2 across benchmark tests, showing that the efficiency gains hold at small parameter counts [3][11]
- The model decodes at 600 tokens per second, with minimal performance loss from quantization [3][10]

Technical Innovations
- InfLLM v2 dynamically selects the relevant context tokens for long-context processing, cutting computational cost by 60% compared with previous methods [8][11][16]
- A lightweight CUDA inference framework (CPM.cu) and a cross-platform deployment framework (ArkInfer) optimize performance on edge devices [19][20][40]
- The FR-Spec algorithm makes speculative sampling more efficient, reducing computational overhead by 75% while maintaining output accuracy [28][30]

Data Efficiency
- MiniCPM4 reaches high capability density with only 8 trillion training tokens, versus 36 trillion for Qwen3 (8/36 ≈ 22%), which the summary attributes to effective data filtering [56][116]
- The UltraClean data selection method raises the quality of pre-training data, significantly improving model performance [57][61]

Application and Use Cases
- MiniCPM4 is designed for long-document understanding and generation, and proves effective in tasks such as automated literature review generation and complex tool interactions [120][130]
- Its ability to handle long sequences and stay accurate under context extrapolation makes it suitable for a range of AI-driven applications [118][119]
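
To make the idea of dynamically selecting relevant context tokens (noted under Technical Innovations) more concrete, here is a minimal sketch of generic block-sparse attention for a single query. It is not the InfLLM v2 algorithm, whose details this summary does not give. The function name `block_sparse_attention`, the mean-key block scoring, and the `block_size` and `top_k` parameters are all assumptions made for illustration.

```python
import math


def block_sparse_attention(query, keys, values, block_size=4, top_k=2):
    """Attend only to the top_k key blocks that look most relevant to the query.

    Hypothetical illustration: blocks are scored by the dot product between
    the query and each block's mean key, which is one simple choice among many.
    """
    n = len(keys)
    dim = len(query)
    # Split token positions into contiguous blocks.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(query, mean_key)

    # Keep the top_k highest-scoring blocks and flatten them to token indices.
    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(i for b in ranked[:top_k] for i in blocks[b])

    # Ordinary softmax attention, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(dim)
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(len(values[0]))
    ]
    return output, selected


# Toy usage: six tokens in three blocks, only one block is attended to.
keys = [[0, 1], [0, 1], [5, 0], [5, 0], [-1, 0], [-1, 0]]
values = [[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]
output, selected = block_sparse_attention([1, 0], keys, values, block_size=2, top_k=1)
print(selected, output)
```

In this sketch the fraction of context actually attended to is roughly `top_k * block_size / len(keys)`. A 5% figure like the one quoted above would correspond to keeping about one token in twenty, though how MiniCPM4 counts sparsity is not stated here.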
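Opening the era of edge-side long-text processing! Mianbi's all-new architecture speeds up its "little powerhouse" by up to 220x
机器之心 (Synced) · 2025-06-09 08:03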
Core Viewpoint
- The article covers recent advances in edge language models, centered on the launch of MiniCPM 4.0 by the AI startup Mianbi Intelligent, which it presents as a major innovation in the field [2][3].

Group 1: Model Performance and Innovations
- MiniCPM 4.0 is described as the industry's first system-level context-sparse language model, with sparsity as high as 5%, which enables long-text reasoning on edge devices [4][5].
- The model comes in two versions, 8B and 0.5B parameters, both of which set new performance benchmarks for edge models [5].
- MiniCPM 4.0-8B delivers a stable 5x speedup in long-text reasoning compared with similar models, and up to 220x in extreme scenarios [5][10].
- In 128K long-text scenarios, MiniCPM 4.0-8B needs only 1/4 of the cache storage space that Qwen3-8B requires [16].

Group 2: Technical Architecture and Efficiency
- The model uses an efficient dual-frequency shifting mechanism that automatically switches attention modes based on task characteristics, optimizing performance for both long and short texts [13].
- MiniCPM 4.0 integrates a self-developed inference framework, CPM.cu, which combines sparsity, speculation, and quantization for efficient edge inference and achieves a 5x speedup [31].
- The BitCPM quantization algorithm achieves state-of-the-art 4-bit quantization and maintains excellent performance even after a 90% reduction in model size [32].

Group 3: Market Implications and Future Directions
- The advances in MiniCPM 4.0 are expected to trigger a wave of updates to the edge AI models built into smartphones and automotive systems, which could mean an overhaul of many applications [19].
- Mianbi Intelligent emphasizes its application-oriented strengths, having adapted the model for major chip platforms such as Intel, Qualcomm, and Huawei Ascend [18].
- The company plans to keep releasing foundational models in the MiniCPM series and to explore multimodal models [51].
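
As a rough illustration of the 4-bit quantization mentioned under Group 2, the sketch below shows plain symmetric round-to-nearest quantization of a weight list to signed 4-bit codes. This is a generic textbook scheme, not the BitCPM algorithm, whose details the summary does not describe. The names `quantize_int4` and `dequantize` are hypothetical.

```python
def quantize_int4(weights):
    """Map float weights to signed 4-bit integer codes plus one shared scale.

    Generic symmetric round-to-nearest scheme, used here only as an example.
    """
    max_abs = max(abs(w) for w in weights)
    # Signed 4-bit integers span [-8, 7]; dividing by 7 keeps the mapping symmetric.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]


# Toy usage: the reconstruction error per weight stays within half a step.
weights = [0.7, -0.32, 0.0, 0.11]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
print(codes, restored)
```

Production schemes typically quantize in groups with per-group scales and may calibrate or retrain to limit accuracy loss; the sketch leaves all of that out.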
Guotai Haitong | Electronics: DeepSeek R1 update accelerates expansion into commercial scenarios
Guotai Haitong Securities Research · 2025-06-02 12:31
Core Viewpoint
- The DeepSeek R1 update strengthens the model's deep thinking capabilities and places it alongside top international models such as OpenAI o3 and Gemini-2.5-Pro-0506. This is expected to accelerate growth in domestic computing power demand and the rollout of edge models [1][2].

Performance Improvement
- DeepSeek R1-0528 improves on its predecessor through better training methods, with significantly stronger deep thinking across benchmarks and performance close to that of leading international models [3].
- The distilled model, DeepSeek-R1-0528-Qwen3-8B, performs strongly in mathematical testing, ranking just below DeepSeek-R1-0528 and comparable to Qwen3-235B [3].
- The updated model cuts hallucination rates by approximately 45-50% in tasks such as rewriting and reading comprehension. It is also optimized for different writing styles and can generate more structured long-form content [3].

Commercialization Potential
- Better deep thinking and writing, together with lower hallucination rates, are expected to improve user experience, which could raise user penetration and daily usage frequency and in turn drive growth in the domestic computing power industry [4].
- The strong results of the distilled model are expected to speed up deployment of large models on edge devices such as smartphones, PCs, and smart glasses, making those devices more intelligent [4].

Catalyst
- Continued performance upgrades to DeepSeek's large models serve as a catalyst for further advances in the field [5].
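"AI New Generation": A Moutai fund joins the round! Mianbi Intelligent closes a new financing round worth hundreds of millions of yuan, as large-model fundraising leaves some rejoicing and others worried
Hua Xia Shi Bao · 2025-05-22 14:46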
Group 1
- The article highlights a significant shift in investment logic within the AI industry, from backing models to prioritizing application-focused investments [1][7][9]
- The "AI Six Tigers" have largely gone quiet on financing, with only a few companies such as Zhipu and Mianbi Intelligent successfully securing funding [1][5]
- Mianbi Intelligent has raised substantial funding, including a recent round worth hundreds of millions of yuan backed by multiple investors, which points to strong market interest in application-oriented AI solutions [2][5]

Group 2
- Mianbi Intelligent focuses on edge models rather than general-purpose foundational models and has released several iterations of its flagship product, MiniCPM [3][5]
- The company has positioned itself in several sectors, particularly the automotive industry, and has formed partnerships with major tech firms such as Intel [5][6]
- Investment in AI applications shows new characteristics: the number of financing cases is stable, but individual investment amounts are smaller than in previous years [7][8]