Model Quantization
OpenAI releases gpt-oss, its first open-source model of the ChatGPT era; even an RTX 4060 Ti can run it.
数字生命卡兹克· 2025-08-05 22:08
Core Viewpoint
- The article discusses the recent advancements in AI models, particularly focusing on OpenAI's release of the open-source model GPT-oss, which is seen as a significant move in the AI landscape, potentially reshaping the open-source community and lowering barriers for developers [9][80]

Group 1: Model Releases
- Google released a new world model, Genie 3, which has generated excitement in the gaming and VR community [3]
- Anthropic announced Claude Opus 4.1, showcasing advancements in programming capabilities [5]
- OpenAI launched GPT-oss, its first open-source model since GPT-2, which includes two models: GPT-oss-120B and GPT-oss-20B [9][14]

Group 2: Model Specifications
- GPT-oss-120B has 117 billion parameters with 5.1 billion active parameters per token, while GPT-oss-20B has 21 billion parameters with 3.6 billion active parameters [15][16]
- Both models support a context length of 128K and are designed to run on consumer-grade hardware, with the 20B model requiring only 16GB of memory [17][20]

Group 3: Performance Metrics
- In various benchmarks, GPT-oss-120B and GPT-oss-20B scored 90.0 and 85.3 on MMLU, respectively, indicating strong reasoning and knowledge capabilities [32]
- The models performed well in competitive programming tests, scoring 2622 and 2516 points, respectively, although they were outperformed by OpenAI's previous models [32]

Group 4: Community Impact
- The release of GPT-oss is expected to lower the entry barriers for developers and enrich the AI ecosystem, allowing more users to experiment with advanced AI capabilities [80]
- OpenAI's move is seen as a response to competitive pressure from other AI companies, indicating a shift towards more open and accessible AI technologies [78][80]
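The 16GB memory claim for the 20B model can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming roughly 4-bit weights (our assumption, consistent with the article's memory figure); it counts only weight storage, not activations or KV cache:

```python
def moe_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Rough weight-memory estimate for a model.

    total_params_b: total parameter count in billions (all experts count
    toward memory, even though only the active ones run per token).
    """
    bytes_total = total_params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# GPT-oss-20B: 21B total parameters at ~4-bit weights is about 10.5 GB
# of weight storage, leaving headroom within a 16 GB budget.
print(round(moe_memory_gb(21, 4), 1))   # -> 10.5
print(round(moe_memory_gb(117, 4), 1))  # -> 58.5 (the 120B model)
```

Note that in an MoE model all experts occupy memory even though only a fraction of parameters (5.1B or 3.6B here) are active per token, which is why memory tracks the total count while per-token compute tracks the active count.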
Edge-Side Large Models 20250801
2025-08-05 03:18
Summary of Conference Call Records

Industry Overview
- The discussion primarily revolves around the advancements in **edge AI models** and their comparison with **cloud-based large models**. The focus is on hardware improvements, particularly in **NPU (Neural Processing Unit)** technology, which enhances the efficiency of edge devices like smartphones and PCs [1][2][3]

Key Points and Arguments
1. **Hardware Advancements**: The improvement in edge AI is significantly driven by advancements in hardware, particularly chips like Apple's A18 and Qualcomm's Snapdragon 8 Gen 2, which integrate more efficient NPUs alongside traditional CPUs and GPUs [1][3]
2. **Model Development**: There is a notable shift towards **multi-modal AI models** that incorporate functionalities such as programming and mathematical reasoning, indicating a broader application of AI technologies [2][3]
3. **Performance Metrics**: Current edge AI chips can run models with up to **100 billion parameters**, showcasing their capability to handle complex computations efficiently [3][4]
4. **Architectural Optimization**: The development of edge models relies heavily on architectural optimizations, such as **Mixture of Experts (MoE)** and **grouped attention mechanisms**, which enhance model efficiency and reduce memory consumption [4][5][6]
5. **Knowledge Density Improvement**: Techniques like **model quantization** reduce computational load by converting high-precision floating-point numbers into lower-precision formats, allowing for more efficient processing [8][9]
6. **Dynamic Pruning**: Parts of the model that do not contribute to performance are removed during training, enhancing flexibility and efficiency [11][12][13]
7. **Competitive Landscape**: The call highlights the competitive dynamics between domestic and international players in the edge AI space, with companies like **Meta**, **Microsoft**, and **Google** leading in model development, while domestic firms are catching up by focusing on specific application scenarios [14][15][16][17]
8. **Market Positioning**: Major companies are integrating their edge models into devices such as smartphones and PCs to enhance user experience and drive commercial viability [17][18]
9. **Domestic Developments**: Domestic companies like **Tencent**, **Alibaba**, and **ByteDance** are developing their own edge models, with some achieving competitive performance in niche areas, indicating a growing capability in the local market [22][26][27]

Other Important Insights
- The call emphasizes the importance of **data privacy** and the need for edge models to address these concerns while maintaining performance [14]
- The discussion also touches on the **commercialization** of AI technologies, with companies exploring various monetization strategies for their edge AI solutions [17][18]
- The potential for edge AI to surpass human performance in specific tasks is noted, particularly in generating content and automating processes [26][27]

This summary encapsulates the key discussions and insights from the conference call, highlighting the advancements and competitive landscape in the edge AI industry.
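The quantization technique named in point 5 can be sketched concretely. Below is a minimal symmetric per-tensor int8 example in NumPy; it is a generic illustration of the float-to-low-precision conversion, not the specific scheme used by any company discussed in the call:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> (int8 tensor, scale).

    The scale maps the largest-magnitude weight onto the int8 range,
    so each weight is stored in 1 byte instead of 4.
    """
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor for computation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())

# int8 storage is 1/4 the size of float32; the worst-case rounding
# error is bounded by half the quantization step (scale / 2).
print(w.nbytes // q.nbytes)  # -> 4
```

Real deployments typically quantize per-channel or per-group rather than per-tensor to shrink the error further, but the memory arithmetic is the same.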
面壁 (ModelBest) releases the MiniCPM4 edge-side model: 5x faster long-text inference, and the 0.5B model takes a new SOTA
AI科技大本营· 2025-06-10 09:31
Core Viewpoint
- The release of MiniCPM4.0 marks a significant advancement in edge-side models, showcasing innovations in performance, speed, and storage efficiency, particularly for long text processing [1][4][32]

Group 1: Model Performance and Efficiency
- MiniCPM4.0-8B is the first native sparse model, with 5% sparsity, achieving performance comparable to Qwen-3-8B while using only 22% of the training resources [2][5][6]
- MiniCPM4.0-0.5B demonstrates impressive performance at a training cost of just 2.7%, outperforming larger models like Qwen-3-0.6B and Llama 3.2 and achieving a speed of 600 Token/s [2][5][9]
- The model's architecture allows a 5x speed increase in long text inference and up to 220x in extreme scenarios, addressing the industry's challenge of slow long text processing [4][9][16]

Group 2: Technological Innovations
- The InfLLM sparse attention architecture significantly reduces computational cost, enabling efficient long text processing by cutting the fraction of attended context from 40%-50% down to 5% [18][19][20]
- MiniCPM4.0 employs a three-tiered self-developed inference framework, CPM.cu, which optimizes performance for edge devices, achieving a 5x speed enhancement [21][22]
- The model utilizes advanced quantization techniques, including P-GPTQ and BitCPM, to minimize computational and memory demands, ensuring efficient deployment [23][24]

Group 3: Data and Training Efficiency
- The company emphasizes the importance of high-quality data, utilizing innovative methods to construct datasets, which reduces validation costs by 90% [29][30]
- The training strategy incorporates the upgraded Model Wind Tunnel v2, optimizing hyperparameter configurations and enhancing GPU resource utilization [30][32]
- MiniCPM4.0's development reflects a commitment to maximizing research investment returns through systematic improvements across data, training, and inference processes [28][32]

Group 4: Market Position and Future Directions
- MiniCPM4.0 has achieved over 10 million downloads across all platforms, indicating strong market acceptance and recognition [32]
- The company plans to continue enhancing model knowledge density and intelligence levels, driving efficient development and large-scale applications in edge-side AI [32]
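The 5% sparse-attention figure can be illustrated with a toy block-selection scheme: instead of every query attending to the full context, key blocks are scored cheaply and only the top-scoring fraction is attended densely. This is our simplified sketch of block-sparse attention in general, not InfLLM's actual algorithm:

```python
import numpy as np

def block_sparse_attention(q, k, v, block=16, keep_frac=0.05):
    """Toy block-sparse attention for a single query vector.

    Each block of `block` consecutive keys is summarized by its mean key;
    only the top `keep_frac` fraction of blocks by summary score is kept,
    and attention is computed densely within the kept blocks.
    """
    n = k.shape[0]
    nblocks = n // block
    reps = k[: nblocks * block].reshape(nblocks, block, -1).mean(axis=1)
    scores = reps @ q                      # cheap per-block relevance
    nkeep = max(1, int(nblocks * keep_frac))
    top = np.argsort(scores)[-nkeep:]      # indices of kept blocks
    idx = np.concatenate(
        [np.arange(b * block, (b + 1) * block) for b in top]
    )
    att = np.exp(k[idx] @ q)               # dense softmax over kept keys only
    att /= att.sum()
    return att @ v[idx], nkeep / nblocks

rng = np.random.default_rng(0)
n, d = 640, 8
q = rng.standard_normal(d)
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
out, density = block_sparse_attention(q, k, v)
print(density)  # -> 0.05: only 5% of key blocks participate
```

The cost saving is direct: attention cost scales with the number of attended keys, so keeping 5% of blocks cuts the per-query attention work to roughly 5% of the dense baseline.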
A 0.5B model punches above its weight to take the new edge-side SOTA: runs on a 4090, with a 5x baseline speedup on long-text processing | open-sourced by Tsinghua & 面壁 (ModelBest)
量子位 · 2025-06-10 07:35
Contributed by Tsinghua University & 面壁智能 (ModelBest) to 量子位 | WeChat account QbitAI. The king of edge-side cost-performance: the Tsinghua University and ModelBest team has open-sourced a new model, MiniCPM 4, in two parameter sizes, 8B and 0.5B, reaching the best performance in its class while using only 22% of the training cost of comparable open-source models. MiniCPM4-8B is the first open-source native sparse model; its extreme 5% sparsity lets long-text and deep-reasoning workloads truly run on edge devices. On benchmarks such as MMLU, CEval, MATH500, and HumanEval, it matches Qwen-3-8B and surpasses Gemma-3-12B at only 22% of the training cost. MiniCPM4-0.5B also punches above its weight: on MMLU, CEval, BBH, HumanEval, and other benchmarks it outperforms the same-class Qwen-3-0.6B, Llama 3.2, and Gemma 3, and through native QAT it achieves int4 quantization with almost no accuracy loss and an inference speed of 600 Token/s. On common edge chips such as the Jetson AGX Orin and RTX 4090, MiniCPM 4 delivers a 5x baseline speedup on long-text processing and up to 100x in extreme scenarios. The team has publicly released a technical report; the model ...
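The "native QAT" mentioned above centers on fake quantization during training: weights are snapped to the low-precision grid in the forward pass but kept in float so training can continue, with gradients passed through as identity (the straight-through estimator). A minimal sketch of the int4 fake-quantization forward step; the scale choice and grid here are illustrative assumptions, not MiniCPM's published recipe:

```python
import numpy as np

def fake_quant_int4(w: np.ndarray, scale: float) -> np.ndarray:
    """Fake int4 quantization as used in QAT forward passes.

    Weights are rounded onto the 16-level signed int4 grid [-8, 7]
    but returned in float, so the rest of the network (and a real
    training loop) keeps operating in floating point.
    """
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

w = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
scale = 2.0 / 15.0                  # spread [-1, 1] across the int4 levels
wq = fake_quant_int4(w, scale)
# every value in wq now sits exactly on a multiple of `scale`
print(np.round(wq / scale))
```

Because the model sees quantization error throughout training, it learns weights that are robust to the int4 grid, which is why QAT can reach "almost no accuracy loss" where post-training quantization alone often cannot.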
High-performance models at low cost: paradox or possibility?
机器之心· 2025-05-31 17:15
Core Viewpoint
- The article discusses the paradox of achieving high performance in AI models at low cost, questioning whether the perceived decline in model performance is intentional on the part of AI companies and exploring the implications of cost-saving measures for model quality [2][3]

Group 1: Low-Cost High-Performance Models
- The performance and cost dilemma of large language models (LLMs) has been a focal point of public and industry concern, with ongoing discussion about whether top model companies sacrifice precision or service stability to save on inference costs [2][3]
- Following the popularity of ChatGPT, users have expressed dissatisfaction with perceived declines in performance, citing issues such as weakened logic, increased errors, and difficulties in following instructions [2][3]
- The public's concern that companies sacrifice model performance for cost savings is supported by technical and market evidence, particularly highlighted in the controversy surrounding the DeepSeek-R1 model [3][4]
- The true "full version" of DeepSeek-R1 requires significant hardware investment, with initial costs reaching hundreds of thousands of yuan, leading some platforms to potentially use distilled versions that compromise inference capability and stability [3][4]

Group 2: Cost Management Strategies
- To balance costs and performance, high-end "full version" models are not widely available, especially in a market flooded with free or low-cost services that often lack sufficient performance [6]
- AI companies are increasingly adopting model distillation or simplified models to reduce inference costs and manage financial investments [6]
- Common strategies to address cost pressures include lowering model precision through techniques such as model quantization, pruning, and knowledge distillation, which have become standard practices in the industry [6]
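Of the three cost-reduction techniques listed last, knowledge distillation is the one behind the "distilled versions" controversy above. It can be sketched as a temperature-softened KL loss between teacher and student logits; this is the generic Hinton-style formulation, and the logit values below are made up for illustration:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions.

    A higher T exposes the teacher's 'dark knowledge' (relative
    probabilities of wrong classes); the T^2 factor keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(np.asarray(teacher_logits, dtype=np.float64), T)
    q = softmax(np.asarray(student_logits, dtype=np.float64), T)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
    return float(kl * T * T)

teacher = np.array([[4.0, 1.0, -2.0]])
student_good = np.array([[3.8, 0.9, -1.9]])   # close to the teacher
student_bad = np.array([[-2.0, 1.0, 4.0]])    # reversed preferences
print(distill_loss(student_good, teacher) < distill_loss(student_bad, teacher))  # -> True
```

The cost story follows directly: the small student model is what gets served, so inference cost drops, while how faithfully the student tracks the teacher determines how much capability is lost, which is exactly the trade-off the article questions.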