Edge-Side Models
Mianbi's Li Dahai on edge-model competition: the inaugural year has begun, and the influx of giants confirms the vast prospects ahead
Huan Qiu Wang· 2025-08-15 07:48
Core Insights
- The CEO of Mianbi Intelligent, Li Dahai, announced that 2025 will mark the "Year of Edge Intelligence," indicating a significant opportunity in the market as it is still in its formative stages [1]
- The industry consensus is shifting towards the advantages of edge models and "edge-cloud collaboration," with major players increasingly focusing on edge technology [1]
- Mianbi Intelligent aims to establish commercial advantages quickly while maintaining a balance between technology and user value, emphasizing the need for differentiated user experiences that cloud models cannot replicate [1]

Company Strategy
- Mianbi Intelligent's core competitive advantage lies in efficiency, striving for the best performance with minimal resources, which leads to faster and more cost-effective edge model solutions [1]
- The company introduced the MiniCPM edge model in early 2024, which has 2.4 billion parameters, surpassing the Mistral 7B model, and has achieved over 13 million downloads [2]
- The MiniCPM model has been successfully integrated with major chip manufacturers like Qualcomm, NVIDIA, MTK, Intel, Huawei, and Rockchip, and is particularly noted for its application in smart automotive human-machine interaction [2]

Market Dynamics
- The influx of new entrants into the market is seen as validation of Mianbi Intelligent's strategic choices and the potential for accelerated market growth [1]
- The company has established a dedicated automotive business line to promote the widespread adoption of the MiniCPM model in vehicles [2]
Mianbi Intelligent establishes an automotive business line; first MiniCPM-equipped vehicle launches at the end of the month
Mei Ri Jing Ji Xin Wen· 2025-08-15 07:45
Core Viewpoint
- The company, Mianbi Intelligent, has undergone an organizational upgrade to establish a dedicated automotive business line, indicating a strategic focus on the automotive sector [1]

Group 1: Organizational Changes
- In late July, Mianbi Intelligent initiated a new round of organizational upgrades, creating a primary organization specifically for the automotive business line [1]
- The CEO, Li Dahai, communicated these changes through a company-wide letter [1]

Group 2: Partnerships and Collaborations
- Mianbi Intelligent has formed partnerships with major automotive manufacturers including Geely, Volkswagen, Changan, Great Wall, and GAC [1]

Group 3: Product Launch
- The first mass-produced vehicle equipped with Mianbi's MiniCPM edge model, the Changan Mazda strategic new energy vehicle MAZDA EZ-60, is expected to be launched by the end of this month [1]
Hot on OpenAI's heels, Qwen open-sources 4B edge-side models; AIME25 score surpasses Claude 4 Opus
量子位· 2025-08-07 00:56
Core Insights
- The Qwen team has released two new models, Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507, which are designed to enhance performance on various tasks, particularly in reasoning and general capabilities [2][3][5]

Model Performance
- Qwen3-4B-Thinking-2507 achieved a score of 81.3 in the AIME25 assessment, outperforming competitors like Gemini 2.5 Pro and Claude 4 Opus [4][5][23]
- The new models support a context length of 256k, significantly improving context awareness and understanding [3][17]

Model Specifications
- Qwen3-4B-Instruct-2507 is a non-reasoning model that enhances general capabilities and multi-language support, while Qwen3-4B-Thinking-2507 is a reasoning model tailored for expert-level tasks [7][16]
- The 4B parameter size is particularly friendly for edge devices, allowing for deployment on small hardware like Raspberry Pi [2][8][26]

Comparative Analysis
- In various tests, Qwen3-4B-Instruct-2507 outperformed smaller closed-source models like GPT-4.1-nano and showed comparable performance to larger models like Qwen3-30B-A3B [13][15]
- The models exhibit significant improvements in areas such as instruction following, logical reasoning, and text generation, with enhanced alignment to user preferences [17][24]

Deployment Recommendations
- The Qwen team has provided deployment suggestions for local use, including applications like Ollama and MLX-LM, and recommended using a quantized version for very small devices (a minimal loading sketch follows this summary) [27][28]
- For optimal performance, especially in reasoning tasks, it is advised to use a context length greater than 131,072 tokens [29]

Community Engagement
- The Qwen team has encouraged community feedback and interaction, with links provided for accessing the new models on platforms like Hugging Face and ModelScope [26][36]
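The deployment advice above translates into a few lines of standard tooling. Below is a minimal sketch of loading the instruct variant with Hugging Face transformers; the repo id Qwen/Qwen3-4B-Instruct-2507, the chat-template call, and the generation settings are assumptions based on how recent Qwen checkpoints are typically published, not an official recipe.

```python
# Minimal sketch: running Qwen3-4B-Instruct-2507 locally via transformers.
# Assumes the Hugging Face repo id "Qwen/Qwen3-4B-Instruct-2507" and a
# transformers build recent enough to include the Qwen3 architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Why are 4B-scale models a good fit for edge devices?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Short budget for a quick smoke test; reasoning workloads would use far
# larger context and output budgets, as the recommendations above note.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same flow applies in spirit to a quantized build served through Ollama or MLX-LM on smaller devices, per the team's local-deployment suggestions.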
Huatai Securities | Robotics Industry Tracking
2025-06-30 01:02
Summary of Key Points from the Conference Call

Industry Overview
- The conference call primarily discusses the **robotics industry** and **Xpeng Motors**' advancements in this sector, particularly in AI robotics and related technologies [1][2][15]

Core Insights and Arguments
- **Xpeng Motors** is advancing rapidly in robotics, with self-developed software and leading autonomous driving technology. The company is expected to mass-produce ToB (business-to-business) robots by 2026, using innovative hardware such as screw drives, high-degree-of-freedom hands, and axial flux motors [1][2]
- The **2025 Shanghai Auto Show** saw lower foot traffic and fewer vehicle models than in previous years. Domestic brands are iterating significantly in new energy and intelligence, surpassing joint-venture brands, and traditional domestic automakers are outperforming new entrants in both the quantity and quality of new models, with a recovery expected in the mid-to-large SUV and MPV markets [1][5]
- Market attention is increasingly turning to the software-oriented segments of the robotics industry, including operating systems, SoC chips, and large-model advancements; progress has been noted in edge-side models based on the DeepSeek open-source model [1][6][7]
- **SoC companies** in the robotics sector reported impressive Q1 2025 results, with revenue and net profit increasing significantly on AI-driven demand for system-level chips. Companies like Rockchip are launching new products and planning next-generation releases, indicating substantial profit elasticity [1][8]
- The **MCU and analog chip market** is showing signs of recovery, with increased demand from industrial sectors and potential growth driven by robotics. The domestic market is accelerating its localization-replacement cycle, which is expected to enhance traditional demand growth [1][9]

Additional Important Insights
- **Tesla** has made significant moves, including new product releases and a visit to domestic suppliers, indicating a commitment to advancing its localization-replacement chain, which could positively impact related companies [1][11]
- The **T-chain industry** is witnessing notable changes, with companies like Rongtai showing advantages in lightweight structural components and micro-screw technology; this sector is becoming clearer as demand for micro-screw products increases [1][12]
- **Demand for humanoid-robot screw equipment** is robust, with domestic machine-tool companies receiving substantial orders, although supply is currently insufficient to meet demand [1][17]
- There are significant differences in pricing and technology between domestic and international humanoid-robot machining equipment, with domestic prices generally lower, leading to a preference for local machines for rapid prototyping [1][18]
- The **production efficiency** of specialized machining methods is improving, with new techniques reducing production time significantly compared with traditional grinding [1][19][20]
- Future development trends for humanoid-robot screw equipment indicate a strong commitment to improving machining processes, although challenges remain in fully replacing traditional methods [1][21]
5x faster long-text inference! Mianbi releases the MiniCPM4 edge model, with the 0.5B model crushing its class
AI前线· 2025-06-12 06:07
Core Viewpoint
- The newly released MiniCPM4.0 model series, featuring 8B and 0.5B parameter scales, significantly enhances edge-side performance and adaptability across terminal scenarios [1][6]

Model Performance
- MiniCPM4.0-8B is the first natively sparse model in its class, with an extreme sparsity of 5% (only about 5% of the context is attended to), achieving performance comparable to Qwen-3-8B while using only 22% of the training cost [2][4]
- In benchmark tests like MMLU, CEval, and HumanEval, MiniCPM4.0-0.5B outperforms similar models such as Qwen-3-0.6B and Llama 3.2, reaching a rapid inference speed of 600 tokens/s [4][6]

Technological Innovations
- A new context-sparse architecture enables a 5x speedup in long-text inference and up to 220x in memory-constrained scenarios [6][8]
- MiniCPM4.0 reduces long-text cache requirements to just 1/4 of that needed by Qwen3-8B, and maintains robust performance after a 90% reduction in model size [8][10]

Model Architecture
- The InfLLMv2 sparse attention architecture efficiently "samples" the relevant text segments for each query, reducing attention computation by 90% compared with traditional dense models (a toy sketch of this block-selection idea follows this summary) [14][15]
- A dual-frequency switching mechanism selects the appropriate attention mode for long versus short texts, enhancing efficiency and accuracy [17]

Deployment and Adaptation
- MiniCPM4.0 has been adapted for major chip platforms including Intel, Qualcomm, and Huawei Ascend, and supports various open-source frameworks [10][24]
- The ArkInfer cross-platform deployment framework addresses the challenges of chip fragmentation, providing a versatile solution for model deployment [25]

Data and Training Innovations
- The company utilizes a high-density data selection mechanism to construct high-quality datasets, achieving a 90% reduction in validation costs [28][29]
- The training strategy incorporates advanced techniques like FP8 training and chunk-wise rollout to optimize GPU resource utilization [30]
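To make the "sampling of relevant text segments" idea concrete, the toy sketch below scores coarse key blocks for a single query and attends only to the top ~5% of them. It illustrates the block-selection intuition behind a context-sparse attention layer; the block size, scoring rule, and tensor shapes are placeholders for intuition, not Mianbi's actual InfLLMv2 kernels.

```python
# Toy block-sparse attention for a single query step: group keys into blocks,
# score each block cheaply, keep ~5% of blocks, and attend only to the tokens
# inside the kept blocks. Illustrative only; not the real InfLLMv2 kernel.
import torch

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.05):
    # q: (d,), k/v: (seq_len, d)
    seq_len, d = k.shape
    n_blocks = seq_len // block_size
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)

    # Score each block by its mean key, then keep roughly `keep_ratio` of them.
    block_scores = (k_blocks.mean(dim=1) @ q) / d**0.5
    n_keep = max(1, int(n_blocks * keep_ratio))
    keep = torch.topk(block_scores, n_keep).indices

    # Gather the selected tokens and run ordinary softmax attention over them.
    idx = (keep[:, None] * block_size + torch.arange(block_size)).reshape(-1)
    k_sel, v_sel = k[idx], v[idx]
    attn = torch.softmax((k_sel @ q) / d**0.5, dim=0)
    return attn @ v_sel

q = torch.randn(128)
k = torch.randn(8192, 128)
v = torch.randn(8192, 128)
out = block_sparse_attention(q, k, v)  # attends to ~5% of the 8K-token context
print(out.shape)
```

The real system layers kernel-level optimization, the dual-frequency switch for short inputs, and cache management on top of this selection step.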
Mianbi releases the MiniCPM4 edge model: 5x faster long-text inference, and the 0.5B model sets a new SOTA
AI科技大本营· 2025-06-10 09:31
Core Viewpoint
- The release of MiniCPM4.0 marks a significant advancement in edge-side models, showcasing innovations in performance, speed, and storage efficiency, particularly for long-text processing [1][4][32]

Group 1: Model Performance and Efficiency
- MiniCPM4.0-8B is the first natively sparse model in its class, with 5% sparsity, achieving performance comparable to Qwen-3-8B while using only 22% of the training resources [2][5][6]
- MiniCPM4.0-0.5B demonstrates impressive performance with a training cost of just 2.7%, outperforming larger models like Qwen-3-0.6B and Llama 3.2 and achieving a speed of 600 tokens/s [2][5][9]
- The model's architecture allows for a 5x speedup in long-text inference and up to 220x in extreme scenarios, addressing the industry's challenge of slow long-text processing [4][9][16]

Group 2: Technological Innovations
- The InfLLM sparse attention architecture significantly reduces computational costs by shrinking the attended fraction of the context from the 40-50% typical of earlier sparse methods to 5%, enabling efficient long-text processing [18][19][20]
- MiniCPM4.0 employs a three-tiered self-developed inference framework, CPM.cu, which optimizes performance for edge devices, contributing a 5x speed enhancement [21][22]
- The model utilizes advanced quantization techniques, including P-GPTQ and BitCPM, to minimize computational and memory demands for efficient deployment (a rough weight-size estimate follows this summary) [23][24]

Group 3: Data and Training Efficiency
- The company emphasizes the importance of high-quality data, utilizing innovative methods to construct datasets that reduce validation costs by 90% [29][30]
- The training strategy incorporates the upgraded Model Wind Tunnel v2, optimizing hyperparameter configurations and enhancing GPU resource utilization [30][32]
- MiniCPM4.0's development reflects a commitment to maximizing research investment returns through systematic improvements across data, training, and inference processes [28][32]

Group 4: Market Position and Future Directions
- MiniCPM4.0 has achieved over 10 million downloads across all platforms, indicating strong market acceptance and recognition [32]
- The company plans to continue enhancing model knowledge density and intelligence levels, driving efficient development and large-scale applications of edge-side AI [32]
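As a rough sense of why the quantization work matters on edge hardware, the arithmetic below estimates weight storage at different bit widths. The parameter counts are nominal and the bit widths illustrative; they are not the exact formats produced by P-GPTQ or BitCPM.

```python
# Back-of-the-envelope weight-storage estimates for edge deployment.
# Nominal parameter counts and illustrative bit widths only; not the exact
# output formats of P-GPTQ or BitCPM.
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1024**3

for name, n_params in [("8B model", 8e9), ("0.5B model", 0.5e9)]:
    for bits in (16, 8, 4):
        gib = weight_footprint_gib(n_params, bits)
        print(f"{name}: {bits:>2}-bit weights ≈ {gib:.2f} GiB")
```

Dropping an 8B model from 16-bit to 4-bit weights takes it from roughly 15 GiB to under 4 GiB of weights, the difference between a workstation GPU and a phone- or vehicle-class memory budget.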
A 0.5B model punches above its weight to take a new edge-model SOTA: runs on a 4090, 5x routine speedup in long-text processing | open-sourced by Tsinghua & Mianbi
量子位· 2025-06-10 07:35
Contributed by Tsinghua University & Mianbi Intelligent to 量子位 | QbitAI official account

The king of edge-side price-performance: the Tsinghua University and Mianbi Intelligent teams have open-sourced a new model, MiniCPM 4, in 8B and 0.5B parameter sizes, reaching the best performance in its class with only 22% of the training cost of comparable open-source models.

MiniCPM4-8B is the first open-source natively sparse model; backed by an extreme sparsity of 5%, it lets long-text and deep-reasoning workloads truly run on edge devices. On benchmarks such as MMLU, CEval, MATH500, and HumanEval, it matches Qwen-3-8B and surpasses Gemma-3-12B with only 22% of the training cost.

MiniCPM4-0.5B likewise punches above its weight: on MMLU, CEval, BBH, HumanEval, and other benchmarks it outperforms the same-class Qwen-3-0.6B, Llama 3.2, and Gemma 3, and native QAT enables int4 quantization with almost no accuracy loss together with inference speeds of up to 600 tokens/s.

On common edge chips such as the Jetson AGX Orin and RTX 4090, MiniCPM 4 achieves a routine 5x speedup in long-text processing and over 100x in extreme scenarios.

The team has publicly released a technical report; the model ...
Core Insights
- MiniCPM4, developed by Tsinghua University and Mianbi Intelligent, is an open-source model that achieves optimal performance with only 22% of the training cost of similar models, offered in 8B and 0.5B parameter sizes [1][3][4]
- The model utilizes a novel sparse attention mechanism, InfLLM v2, which allows for efficient long-context processing at a 5% sparsity rate [2][8][16]
- MiniCPM4 demonstrates superior performance in benchmark tests, outperforming models like Qwen-3 and Gemma-3 while using significantly less training data [3][11][116]

Model Performance
- MiniCPM4-8B matches the performance of Qwen-3-8B and surpasses Gemma-3-12B with only 22% of the training data used by Qwen-3 [3][116]
- MiniCPM4-0.5B outperforms Qwen-3-0.6B and Llama 3.2 in various benchmark tests, showcasing its efficiency at smaller parameter sizes [3][11]
- The model achieves a decoding speed of 600 tokens per second with minimal performance loss during quantization [3][10]

Technical Innovations
- The InfLLM v2 architecture allows for efficient long-context processing by dynamically selecting relevant context tokens, reducing computational costs by 60% compared to previous methods [8][11][16]
- The model incorporates a lightweight CUDA inference framework (CPM.cu) and a cross-platform deployment framework (ArkInfer) to optimize performance on edge devices [19][20][40]
- The FR-Spec algorithm enhances speculative sampling efficiency, reducing computational overhead by 75% while maintaining output accuracy (a toy illustration follows this summary) [28][30]

Data Efficiency
- MiniCPM4 achieves high capability density by training on only 8 trillion tokens, compared with the 36 trillion tokens used by Qwen-3, demonstrating effective data filtering strategies [56][116]
- The UltraClean data selection method enhances the quality of pre-training data, significantly improving model performance [57][61]

Application and Use Cases
- MiniCPM4 is designed for long-document understanding and generation, proving effective in tasks such as automated literature-review generation and complex tool interactions [120][130]
- The model's ability to handle long sequences and maintain high accuracy in context extrapolation makes it suitable for various AI-driven applications [118][119]
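The FR-Spec point lends itself to a small illustration: during speculative drafting, the draft step scores only a frequency-ranked subset of the vocabulary, while verification still covers the full vocabulary. The sketch below is a toy version of that idea; the sizes, the 25% subset, and the random stand-in for corpus frequency order are made up for illustration and are not the published implementation.

```python
# Toy sketch of frequency-ranked speculative drafting (FR-Spec-style idea):
# the draft step scores only the most frequent tokens, shrinking the LM-head
# matmul and softmax, while verification still uses the full vocabulary.
import torch

vocab_size, hidden = 32_000, 512
lm_head = torch.randn(vocab_size, hidden)   # output projection (one row per token)
freq_order = torch.randperm(vocab_size)     # stand-in for a corpus frequency ranking
subset = freq_order[: vocab_size // 4]      # keep the "most frequent" 25% of tokens

def draft_token(h: torch.Tensor) -> int:
    # Draft side: score only the high-frequency rows (~75% less head compute).
    logits = lm_head[subset] @ h
    return int(subset[int(torch.argmax(logits))])

def verify_token(h: torch.Tensor) -> int:
    # Target side: full-vocabulary scoring, so final outputs are unrestricted.
    return int(torch.argmax(lm_head @ h))

h = torch.randn(hidden)
print("draft:", draft_token(h), "target:", verify_token(h))
```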
Opening the era of edge-side long text! Mianbi's all-new architecture makes its "little steel cannon" MiniCPM up to 220x faster
机器之心· 2025-06-09 08:03
Core Viewpoint
- The article discusses significant advancements in edge language models, particularly highlighting the launch of MiniCPM 4.0 by the AI startup Mianbi Intelligent, which it presents as a transformative innovation in the field [2][3]

Group 1: Model Performance and Innovations
- MiniCPM 4.0 features the industry's first system-level context-sparse language model innovation, achieving an extreme sparsity of 5% that enables long-text reasoning on edge devices [4][5]
- The model comes in two versions, 8B and 0.5B parameters, both of which set new performance benchmarks for edge models [5]
- MiniCPM 4.0-8B demonstrates a stable 5x speedup in long-text reasoning compared to similar models, with a maximum acceleration of 220x in extreme scenarios [5][10]
- In 128K long-text scenarios, MiniCPM 4.0-8B requires only 1/4 of the cache storage space needed by Qwen3-8B (a rough cache-size estimate follows this summary) [16]

Group 2: Technical Architecture and Efficiency
- An efficient dual-frequency switching mechanism automatically selects the attention mode based on task characteristics, optimizing performance for both long and short texts [13]
- MiniCPM 4.0 integrates the self-developed CPM.cu inference framework, which combines sparsity, speculative decoding, and quantization for efficient edge inference, achieving a 5x speedup [31]
- The BitCPM quantization algorithm achieves state-of-the-art 4-bit quantization, maintaining excellent performance even after a 90% reduction in model size [32]

Group 3: Market Implications and Future Directions
- The advancements in MiniCPM 4.0 are expected to trigger a wave of updates to AI edge models integrated into smartphones and automotive systems, suggesting a potential overhaul of many applications [19]
- Mianbi Intelligent emphasizes its application-oriented advantages, having adapted the model for major chip platforms like Intel, Qualcomm, and Huawei Ascend [18]
- The company plans to continue releasing more foundational models in the MiniCPM series and to explore multimodal models, indicating a commitment to ongoing innovation in AI capabilities [51]
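To put the 128K cache comparison in perspective, the sketch below estimates KV-cache size from a model configuration. The layer, head, and precision values are hypothetical placeholders rather than the published MiniCPM 4.0-8B or Qwen3-8B specs, and the 1/4 factor simply mirrors the claim above.

```python
# Back-of-the-envelope KV-cache sizing at long context lengths.
# Hypothetical config values; not the published MiniCPM 4.0-8B or Qwen3-8B specs.
def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2 covers both keys and values.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1024**3

dense = kv_cache_gib(seq_len=128_000, n_layers=36, n_kv_heads=8, head_dim=128)
reduced = dense / 4  # mirrors the "1/4 of the cache storage" claim above
print(f"dense KV cache ≈ {dense:.1f} GiB, reduced ≈ {reduced:.1f} GiB")
```

At these placeholder settings the dense cache alone runs to roughly 18 GiB at 128K tokens, which is why cache reduction, not just weight compression, decides whether long-context inference fits on edge hardware.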
Guotai Haitong | Electronics: DeepSeek R1 update accelerates expansion into commercial scenarios
Core Viewpoint
- The update of DeepSeek R1 enhances its deep-thinking capabilities, positioning it alongside top international models like OpenAI o3 and Gemini-2.5-Pro-0506, and is expected to accelerate the growth of domestic computing power demand and the deployment of edge models [1][2]

Summary by Sections

Performance Improvement
- DeepSeek R1-0528 achieves its performance iteration through improved training methods, showing significant enhancements in deep-thinking capability across various benchmarks and closely matching the performance of leading international models [3]
- The distilled model, DeepSeek-R1-0528-Qwen3-8B, demonstrates strong performance in mathematical testing, ranking just below DeepSeek-R1-0528 and comparable to Qwen3-235B [3]
- The updated model reduces hallucination rates by approximately 45-50% in tasks such as rewriting and reading comprehension, while also optimizing for different writing styles, enabling the generation of more structured long-form content [3]

Commercialization Potential
- The improvements in deep thinking and writing, along with reduced hallucination rates, are expected to enhance user experience, potentially increasing user penetration and daily usage frequency and thereby driving growth in the domestic computing power industry [4]
- The outstanding performance of the distilled model is anticipated to accelerate the deployment of large models on edge devices such as smartphones, PCs, and smart glasses, improving the intelligence level of these devices and enabling AI empowerment [4]

Catalyst
- The iterative upgrade of DeepSeek's large-model performance serves as a catalyst for further advancements in the field [5]