Mixture of Experts (MoE)
From GPT-2 to gpt-oss: A Deep Dive into the Evolution of OpenAI's Open Models
机器之心· 2025-08-18 05:15
Core Insights
- OpenAI has released gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2 in 2019, which can run locally thanks to a series of optimizations [4][5]
- The article provides a detailed analysis of the architectural advances from GPT-2 to gpt-oss and compares the design with Qwen3 [4][5]

Model Architecture Overview
- gpt-oss-20b can run on consumer-grade GPUs with 16 GB of memory, while gpt-oss-120b requires a single H100 GPU with 80 GB of memory or more [10]
- The gpt-oss architecture appears conventional, as leading LLM developers often start from similar foundational architectures and make only minor adjustments [10][11]

Changes Since GPT-2
- Significant changes since GPT-2 include the removal of Dropout, the adoption of RoPE for positional encoding, and the replacement of GELU with Swish/SwiGLU [20][22][29]
- Mixture of Experts (MoE) layers increase parameter capacity while maintaining efficiency by activating only a subset of experts for each token (a minimal routing sketch follows this summary) [39]
- Grouped Query Attention (GQA) is introduced as a more efficient alternative to Multi-Head Attention (MHA) [41]
- Sliding-window attention is applied in gpt-oss to reduce memory usage and computational cost [47]
- RMSNorm replaces LayerNorm for better efficiency in large-scale LLMs [52]

Comparison with Qwen3
- gpt-oss-20b has a wider architecture with more attention heads, while Qwen3 has a deeper architecture with more transformer blocks [69][70]
- gpt-oss uses fewer but larger experts, whereas Qwen3 uses more, smaller experts [72]
- Both models use grouped query attention, but gpt-oss additionally applies sliding-window attention to limit the attended context [82]

Additional Insights
- gpt-oss models are designed as reasoning models, letting users easily control the reasoning effort [93]
- Training compute for gpt-oss is estimated at 2.1 million H100 GPU hours, comparable to other large models [92]
- MXFP4 quantization allows gpt-oss models to run on a single GPU, improving accessibility [98]
- Benchmark results indicate that gpt-oss performs comparably to proprietary models, although it has not yet been extensively tested [101][106]
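The per-token activation of only a few experts noted above is the core MoE mechanism this topic page tracks. The following is a minimal NumPy sketch of top-k expert routing for a single MoE layer; the layer sizes, the softmax router, the ReLU expert MLPs, and `top_k = 2` are illustrative assumptions rather than gpt-oss's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 64, 256      # illustrative sizes, not gpt-oss's real dimensions
num_experts, top_k = 8, 2        # each token activates only top_k of num_experts

# Router and per-expert MLP weights (ReLU stands in for SwiGLU to keep the sketch short).
W_router = rng.normal(scale=0.02, size=(d_model, num_experts))
W_in = rng.normal(scale=0.02, size=(num_experts, d_model, d_hidden))
W_out = rng.normal(scale=0.02, size=(num_experts, d_hidden, d_model))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model), using top_k experts per token."""
    probs = softmax(tokens @ W_router)                 # (n_tokens, num_experts)
    top = np.argsort(-probs, axis=-1)[:, :top_k]       # indices of the chosen experts
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        chosen = top[i]
        gates = probs[i, chosen] / probs[i, chosen].sum()   # renormalize over chosen experts
        for e, g in zip(chosen, gates):
            h = np.maximum(tok @ W_in[e], 0.0)
            out[i] += g * (h @ W_out[e])
    return out

tokens = rng.normal(size=(5, d_model))
print(moe_layer(tokens).shape)   # (5, 64): full parameter pool, but only 2 of 8 experts run per token
```

The point of the sketch is the cost profile the article emphasizes: total parameters grow with `num_experts`, while each token only pays the compute of `top_k` expert MLPs.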
DeepSeek Strikes Again! The Upgraded R1 Gets a Major Performance Boost. Are Its US Rivals Rattled?
Jin Shi Shu Ju· 2025-05-30 03:52
Core Insights
- DeepSeek's R1 model has undergone a minor version upgrade, enhancing semantic understanding, complex logical reasoning, and long-text processing stability [1]
- The upgraded model shows significant improvements in understanding capabilities and programming skills, and is capable of generating over 1,000 lines of error-free code [1]
- The R1 model's cost-effectiveness is highlighted: it is priced at 1/11 of Claude-3.7-Sonnet and 1/277 of GPT-4.5, while being open source for commercial use [1]

Group 1
- The R1 model has gained global attention since its January release, outperforming Western competitors and causing a drop in tech stocks [2]
- Following the release of the V3 model, interest in DeepSeek has shifted toward the anticipated R2 model, which is expected to use a mixture-of-experts architecture with 1.2 trillion parameters [2]
- The latest version, R1-0528, has sparked renewed media interest, showcasing competitive performance against OpenAI's models in code generation [2]

Group 2
- DeepSeek's low-cost, high-performance R1 model has positively influenced the Chinese tech stock market and reflects optimistic market expectations regarding China's AI capabilities [2]
- The upgrade has also shown improvements in reducing hallucinations, indicating that DeepSeek is not only catching up but competing with top models [1]
CICC Joint Research | AI Ten-Year Outlook (23): AI + Companionship: Technology-Driven Cost Reduction and Scenario Upgrades Deliver Deep Emotional Value
中金点睛· 2025-05-29 23:39
Core Viewpoint
- AI companionship applications are rapidly emerging and gaining popularity, with significant market potential and user demand, particularly among younger demographics [2][7][8]

Group 1: Market Overview
- The global AI companionship market is projected to reach approximately $30 million in 2023, with potential growth to $70 billion and $150 billion by 2030 under baseline and optimistic scenarios, respectively, implying a 2024-2030 CAGR of 200% and 236% [7]
- Monthly active users (MAU) of AI companionship products grew nearly 30-fold, from under 500,000 to about 15 million, between 2018 and 2023, outpacing the growth of social media and online gaming [7][8]

Group 2: User Demographics and Needs
- The primary user base for AI companionship applications consists of younger individuals seeking emotional support, entertainment, and efficiency gains [2][8]
- Users tolerate AI imperfections more readily in companionship scenarios than in productivity applications, where accuracy is paramount [8]

Group 3: Technological Innovations
- Mixture of Experts (MoE) models have significantly reduced costs and improved efficiency in AI dialogue scenarios, enabling better user experiences [16][18]
- Advances in long-text capabilities and linear attention mechanisms are expected to make conversations more coherent and contextually relevant (a linear-attention sketch follows this summary) [23][24]
- Multimodal capabilities, including image, audio, and video generation, are becoming essential for enriching user experiences and increasing engagement [27][30]

Group 4: Application Landscape
- Notable AI companionship applications include Replika, Character.AI, MiniMax's Talkie, and others, each focusing on different aspects such as emotional support, interactive content, and user-generated content [3][41][44]
- Character.AI has emerged as a market leader, reaching a peak MAU of 22 million by August 2024 on the strength of its technical foundation and user-engagement strategies [36][37]

Group 5: Future Directions
- The industry is expected to explore hardware integration to enhance user experiences, particularly in educational and gaming contexts, targeting broader demographics including children and the elderly [64][65]
- AI companionship applications may evolve into comprehensive content platforms, akin to TikTok or Xiaohongshu, with a focus on user engagement and emotional connection [59][60]
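As a rough illustration of the linear-attention point in Group 3, here is a minimal causal linear-attention sketch in NumPy: per-token cost stays constant with dialogue length because attention is accumulated in running sums instead of a full attention matrix. The elu(x)+1 feature map and single-head setup are common textbook choices and are assumptions here, not the mechanism of any particular companionship product.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1 keeps features positive, a common choice in linear-attention papers.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    """Q, K, V: (seq_len, d). Per-token cost is O(d * d_v), independent of sequence length."""
    d, d_v = Q.shape[1], V.shape[1]
    phi_q, phi_k = feature_map(Q), feature_map(K)
    S = np.zeros((d, d_v))          # running sum of phi(k) v^T over the prefix
    z = np.zeros(d)                 # running sum of phi(k) for normalization
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(phi_k[t], V[t])
        z += phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z + 1e-6)
    return out

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(10, 16)) for _ in range(3))
print(causal_linear_attention(Q, K, V).shape)   # (10, 16)
```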
DeepSeek's R1 Model Completes a "Minor Version Trial Upgrade": Programming and Logical Understanding Reach a New Level!
华尔街见闻· 2025-05-29 00:57
Core Viewpoint
- DeepSeek has released an updated version of its R1 model, enhancing its capabilities in semantic understanding, complex logical reasoning, and long-text processing stability, amid escalating competition in the AI sector [1][2]

Group 1: Model Enhancements
- The R1 model has significantly improved its understanding capabilities, with user feedback indicating a notable increase in performance, particularly in how it activates parameters and presents key information logically [3]
- Programming capabilities have also seen a substantial upgrade, with users reporting the ability to generate over 1,000 lines of code without bugs [4]
- The R1 model is now considered competitive with Claude 4, a leading coding model [5]

Group 2: Previous Model Performance
- Earlier this year, DeepSeek released the DeepSeek-V3-0324 model, which outperformed Claude-3.7-Sonnet in various assessments, particularly mathematics and coding tasks, and performed strongly on reasoning tasks despite being a non-reasoning model [6]
- The R1 model's cost-effectiveness is highlighted: it is priced at only 1/11 of Claude-3.7-Sonnet and 1/277 of GPT-4.5, while being open source and free for commercial use [7]

Group 3: Market Impact
- The emergence of the R1 model contributed to a decline in global tech stocks, as investors questioned the necessity of the large investments made by companies such as Microsoft in developing advanced AI models and services [8]

Group 4: Future Developments
- Speculation continues over the release of the R2 model, which is expected to enhance code generation and reasoning across multiple languages; initial plans targeted a release in early May [9]
- The R2 model is anticipated to use a more advanced mixture-of-experts architecture, with a total parameter count projected to reach 1.2 trillion, significantly reducing inference costs compared with GPT-4 [10]
- Despite the speculation, DeepSeek has not officially confirmed any details regarding the R2 model's release timeline [11]
Huawei Pangu Makes Its First Public Appearance: An Ascend-Native 72B MoE Architecture, Tied for First in China Among Sub-100B-Parameter Models on SuperCLUE
华尔街见闻· 2025-05-29 00:57
Core Insights
- Huawei's Pangu team has introduced the Mixture of Grouped Experts (MoGE) model, which addresses the inefficiencies of traditional Mixture of Experts (MoE) models by keeping the computational load balanced across devices while maintaining high performance [1][7][27]
- The Pangu Pro MoE model, with 72 billion total parameters and 16 billion active parameters, delivers competitive performance, ranking first in China among models with fewer than 100 billion parameters [2][22]

Group 1: Model Architecture and Efficiency
- The MoGE architecture introduces a grouping mechanism that ensures balanced expert activation, significantly improving computational efficiency and reducing system bottlenecks (a grouped-routing sketch follows this summary) [1][6][12]
- The model demonstrates superior throughput, achieving 321 tokens/s on the Ascend 300I Duo platform and 1,528 tokens/s on the Ascend 800I A2 platform, outperforming dense models of similar size [18][26]

Group 2: Performance Metrics
- In the latest SuperCLUE ranking, Pangu Pro MoE scored 58.75, showcasing strong capabilities across reasoning tasks and outperforming other models in complex reasoning scenarios [3][22]
- The model performs well across multiple benchmarks, including English and Chinese language tasks, demonstrating its versatility and adaptability on complex cognitive tasks [22][23][24]

Group 3: Industry Impact
- The introduction of Pangu Pro MoE signals a shift in the AI industry from a focus on parameter counts to practical application, enabling efficient cloud inference and supporting high-concurrency, real-time scenarios [27]
- Huawei's innovations in the MoE architecture redefine the value of large models, providing a robust foundation for AI applications across industries [27]
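The grouping mechanism mentioned in Group 1 can be shown with a small sketch: experts are partitioned into equal groups and each token selects the same number of experts from every group, so every group (and hence every device hosting one group) receives an identical activation count by construction. The 16 experts, 4 groups, and `k_per_group = 1` below are illustrative assumptions, not Pangu Pro MoE's actual configuration.

```python
import numpy as np

def grouped_topk_routing(router_logits, num_groups, k_per_group):
    """router_logits: (n_tokens, n_experts). Experts are split into equal groups and
    every token activates exactly k_per_group experts inside each group, which makes
    the per-group activation count identical by construction (the MoGE idea)."""
    n_tokens, n_experts = router_logits.shape
    group_size = n_experts // num_groups
    chosen = []
    for g in range(num_groups):
        block = router_logits[:, g * group_size:(g + 1) * group_size]
        local_top = np.argsort(-block, axis=-1)[:, :k_per_group]   # per-group top-k
        chosen.append(local_top + g * group_size)                  # map back to global expert ids
    return np.concatenate(chosen, axis=-1)      # (n_tokens, num_groups * k_per_group)

rng = np.random.default_rng(2)
logits = rng.normal(size=(1000, 16))            # 16 experts in 4 groups of 4 (illustrative)
selected = grouped_topk_routing(logits, num_groups=4, k_per_group=1)
per_group = np.bincount(selected.ravel(), minlength=16).reshape(4, 4).sum(axis=1)
print(per_group)                                # [1000 1000 1000 1000]: perfectly balanced groups
```

Plain top-k routing over all 16 experts would give no such guarantee, which is exactly the imbalance the MoGE grouping is meant to remove.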
Huawei Pangu Makes Its First Public Appearance: An Ascend-Native 72B MoE Architecture, Tied for First in China Among Sub-100B-Parameter Models on SuperCLUE
机器之心· 2025-05-28 08:09
Core Insights
- The article discusses the Mixture of Grouped Experts (MoGE) model from Huawei's Pangu team, which addresses the inefficiencies of traditional Mixture of Experts (MoE) models by ensuring a balanced computational load across devices [2][6][31]
- Pangu Pro MoE, built on the MoGE architecture, performs strongly on industry benchmarks, scoring 59 on the SuperCLUE leaderboard with only 72 billion parameters, making it competitive with larger models [3][26]

Technical Innovations
- The MoGE model introduces a grouping mechanism during the expert-selection phase that has each token activate an equal number of experts within predefined groups, achieving load balance across devices [2][12]
- The architecture uses a batch-level auxiliary loss function to keep expert activation balanced, enhancing overall model efficiency (a sketch of such a loss follows this summary) [16][18]

Performance Metrics
- Pangu Pro MoE achieves a throughput of 321 tokens/s on the Ascend 300I Duo platform and 1,528 tokens/s on the Ascend 800I A2 platform, significantly outperforming other models of similar scale [24]
- The model exhibits a nearly uniform expert load distribution, with each expert handling approximately 12.5% of total tokens, indicating efficient resource utilization [29]

Industry Impact
- The introduction of Pangu Pro MoE signals a shift from a "parameter arms race" toward practical applications, reducing cloud inference costs and supporting high-concurrency, real-time scenarios [31]
- Huawei's innovations in the AI field aim to redefine the value of large models, giving enterprises a robust foundation for deploying billion-parameter models effectively [31]
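The batch-level auxiliary loss mentioned under Technical Innovations is typically a penalty on the mismatch between each expert's share of routed tokens and its mean routing probability. The sketch below follows the widely used Switch-Transformer-style load-balancing loss and is an assumption for illustration; the exact loss used by Pangu Pro MoE may differ.

```python
import numpy as np

def load_balance_loss(router_probs, expert_assignments, num_experts):
    """Switch-style auxiliary loss computed over a whole batch of tokens.

    router_probs:       (n_tokens, num_experts) softmax outputs of the router
    expert_assignments: (n_tokens,) expert index each token was dispatched to
    Returns num_experts * sum_e (token fraction of e) * (mean router prob of e),
    which reaches its minimum of 1.0 when the load is perfectly uniform."""
    f = np.bincount(expert_assignments, minlength=num_experts) / len(expert_assignments)
    p = router_probs.mean(axis=0)
    return num_experts * np.sum(f * p)

rng = np.random.default_rng(3)
n_tokens, num_experts = 4096, 8
logits = rng.normal(size=(n_tokens, num_experts))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
assignments = probs.argmax(axis=-1)
print(round(load_balance_loss(probs, assignments, num_experts), 3))  # near 1.0 for near-uniform routing
```

Adding a term of this kind to the training objective pushes the router toward even expert usage, which is what keeps the per-expert token share near the 12.5% reported above.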
Huawei + DeepSeek: Finally an End to "Server Busy"?
虎嗅APP· 2025-05-20 14:00
Core Viewpoint
- The article discusses the challenges and advances in serving large language models, focusing on the MoE (Mixture of Experts) architecture and how Huawei has innovated to enhance its performance and efficiency [1][4]

Group 1: Challenges of MoE Models
- The MoE architecture faces significant challenges, particularly the "cold and hot expert" phenomenon, which leads to uneven load distribution and degrades system performance [3][4]
- The uneven load results in increased inference latency and limited throughput due to underutilized resources [3][4]

Group 2: Huawei's Innovations
- Huawei has introduced an efficient load-balancing strategy called OmniPlacement, which significantly improves MoE inference performance through expert reallocation, inter-layer redundancy deployment, and near-real-time dynamic scheduling (a placement sketch follows this summary) [6][7]
- The OmniPlacement algorithm optimizes the deployment order of experts based on expert activation data, reducing load imbalance and enhancing system performance [6][7]

Group 3: Key Features of OmniPlacement
- The framework supports dynamic priority adjustment and communication-domain optimization, which reduces communication overhead compared with traditional static allocation methods [7][9]
- It includes a near-real-time scheduling and dynamic monitoring mechanism that enables efficient expert allocation and minimizes inference delays [9][10]

Group 4: Experimental Results
- Testing on the DeepSeek-V3 model showed that OmniPlacement reduced inference latency by approximately 10% and increased system throughput by about 10%, demonstrating significant improvements in resource utilization [14]
- The system remained stable under dynamic input and high-load conditions, with no performance fluctuations or service interruptions [14]

Group 5: Future Directions
- Future research will focus on optimizing scheduling algorithms, developing adaptive expert-selection mechanisms, and expanding the OmniPlacement framework to support more types of MoE models [15]
- The release of OmniPlacement marks a significant advance in MoE inference performance and highlights Huawei's competitive edge in AI computing [15]
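The expert-reallocation idea in Group 2 is easy to illustrate: given measured activation counts per expert, assign experts to devices so that the expected load per device is as even as possible. The greedy longest-processing-time heuristic below is a minimal sketch under that assumption; OmniPlacement's actual joint optimization is more sophisticated and is not described in detail in this summary, and the activation numbers are invented for illustration.

```python
import heapq
import numpy as np

def balanced_placement(activation_counts, num_devices):
    """Greedy LPT heuristic: place the hottest remaining expert on the currently
    least-loaded device. Returns the device id per expert and the per-device load."""
    order = np.argsort(-np.asarray(activation_counts))      # hottest experts first
    heap = [(0.0, d) for d in range(num_devices)]           # (load, device id)
    heapq.heapify(heap)
    placement = [None] * len(activation_counts)
    for e in order:
        load, dev = heapq.heappop(heap)
        placement[e] = dev
        heapq.heappush(heap, (load + activation_counts[e], dev))
    loads = [sum(c for c, p in zip(activation_counts, placement) if p == d)
             for d in range(num_devices)]
    return placement, loads

counts = [900, 850, 120, 100, 90, 80, 40, 20]   # skewed "hot/cold" activation profile (illustrative)
placement, loads = balanced_placement(counts, num_devices=4)
print(placement)   # device assignment per expert
print(loads)       # flatter than naive contiguous placement (which would give [1750, 220, 170, 60])
```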
Huawei Releases OmniPlacement, Achieving Optimal Dynamic Deployment of Experts in Ultra-Large-Scale MoE Models and Raising Ascend Inference System Throughput by 10%
雷峰网· 2025-05-20 13:01
Core Viewpoint
- The article discusses the challenges and advances in Mixture of Experts (MoE) technology, focusing in particular on load-balancing issues and Huawei's OmniPlacement strategy for improving inference performance [2][4][12]

Group 1: Challenges in MoE Models
- MoE models face significant challenges, particularly the "cold and hot expert" phenomenon, where some experts are frequently called (hot experts) while others are rarely used (cold experts), leading to uneven load distribution [2][4]
- This imbalance results in increased inference latency and limited throughput, as underutilized resources restrict overall system performance [3][14]

Group 2: OmniPlacement Strategy
- Huawei's OmniPlacement strategy addresses these challenges through expert reallocation, inter-layer redundancy deployment, and near-real-time dynamic scheduling, significantly improving MoE inference performance [4][12]
- The strategy includes a joint optimization algorithm that reduces load imbalance by analyzing expert activation data and ordering deployments by call frequency and computational need [5][14]

Group 3: Key Features of OmniPlacement
- OmniPlacement employs inter-layer redundancy deployment to alleviate the pressure on hot experts by allocating additional redundant instances, thus enhancing system throughput (a redundancy sketch follows this summary) [5][12]
- The framework supports dynamic resource allocation based on real-time resource usage and expert call frequency, allowing predictive resource distribution that narrows the performance gap between hot and cold experts [6][9]

Group 4: Testing and Results
- Comprehensive testing on the DeepSeek-V3 model demonstrated that OmniPlacement reduces average inference latency by approximately 10% compared with baseline methods, primarily through dynamic expert allocation and communication-domain optimization [12][14]
- System throughput improved by about 10%, reflecting a significant increase in resource utilization, especially in high-concurrency scenarios [14]

Group 5: Future Directions
- Future research will focus on developing smarter scheduling algorithms and adaptive expert-selection mechanisms to further enhance the system's adaptability to complex inputs [15][16]
- The OmniPlacement framework aims to expand its functionality to support more types of MoE models, increasing its versatility and applicability in various industrial settings [16]
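The inter-layer redundancy idea in Group 3 can be sketched as giving the hottest experts extra replicas so their traffic is split across instances. The sketch below assumes a fixed replica budget and simply shows how the maximum per-instance load drops; which layers receive redundancy, how many replicas are granted, and how requests are actually routed are Huawei's design decisions and are not reproduced here, and the numbers are illustrative.

```python
import numpy as np

def add_redundancy(activation_counts, replica_budget):
    """Grant one extra replica at a time to whichever expert currently has the
    highest per-instance load, then report the effective per-instance loads."""
    counts = np.asarray(activation_counts, dtype=float)
    replicas = np.ones(len(counts), dtype=int)
    for _ in range(replica_budget):
        hottest = np.argmax(counts / replicas)   # expert whose instances are most loaded
        replicas[hottest] += 1
    return replicas, counts / replicas

counts = [900, 850, 120, 100, 90, 80, 40, 20]    # same skewed hot/cold profile (illustrative)
replicas, per_instance = add_redundancy(counts, replica_budget=4)
print(replicas)                 # [3 3 1 1 1 1 1 1]: the two hot experts absorb the extra instances
print(per_instance.round(1))    # max per-instance load falls from 900 to 300
```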
Huawei: Get DeepSeek's "Experts" Moving and Cut Inference Latency by 10%!
量子位· 2025-05-20 05:12
Core Viewpoint
- The article discusses Huawei's approach to optimizing Mixture of Experts (MoE) inference with a technique called OmniPlacement, which addresses the load imbalance between "hot" and "cold" experts and yields significant improvements in inference latency and throughput.

Group 1: MoE Model and Its Challenges
- The MoE model allocates tasks to specialized expert networks, enhancing overall system performance [2]
- Load-balancing issues arise because expert networks are called at very different frequencies, limiting performance [3][5]
- The disparity in call frequency can exceed an order of magnitude, delaying inference and wasting resources [4][5]

Group 2: Huawei's Solution - OmniPlacement
- Huawei's OmniPlacement technique optimizes the deployment of experts to improve MoE model performance [8]
- The approach involves three main steps: joint optimization based on computational balance, inter-layer redundant deployment of high-frequency experts, and near-real-time scheduling with dynamic monitoring (a monitoring-loop sketch follows this summary) [9][14][18]

Group 3: Key Features of OmniPlacement
- The OmniPlacement algorithm dynamically adjusts expert priorities and node assignments based on real-time statistics, reducing communication overhead [12]
- The inter-layer redundant deployment strategy assigns additional instances to frequently called experts, alleviating their load and enhancing system throughput [15]
- The near-real-time scheduling mechanism allows for dynamic resource allocation and predictive distribution based on historical data, improving system responsiveness [19][21]

Group 4: Performance Improvements
- Implemented in the DeepSeek-V3 system, OmniPlacement theoretically reduces inference latency by approximately 10% and increases throughput by about 10% [6][31]
- The system adapts well across MoE model scales and input data distributions, ensuring efficient resource utilization and stable operation [25][26]
- The dynamic monitoring mechanism responds rapidly to sudden load changes, maintaining system stability under high-demand scenarios [32]

Group 5: Open Source Initiative
- Huawei plans to open-source the OmniPlacement optimization method, promoting wider adoption and collaboration within the AI community [28]
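The near-real-time scheduling step can be caricatured as a monitoring loop: keep per-window activation counters, compute an imbalance metric, and trigger a re-placement only when the imbalance crosses a threshold, so routing itself is never blocked. Everything below (window size, threshold, the max-over-mean metric, and the simulated traffic) is an illustrative assumption, not Huawei's actual scheduler.

```python
import numpy as np

NUM_EXPERTS, WINDOW, THRESHOLD = 8, 2000, 2.0    # illustrative monitoring parameters

def imbalance(counts):
    """Max-over-mean load ratio; 1.0 means a perfectly even expert load."""
    return counts.max() / max(counts.mean(), 1e-9)

def monitor(token_expert_stream):
    counts = np.zeros(NUM_EXPERTS)
    for step, expert in enumerate(token_expert_stream, start=1):
        counts[expert] += 1
        if step % WINDOW == 0:                    # end of a monitoring window
            ratio = imbalance(counts)
            if ratio > THRESHOLD:
                print(f"step {step}: imbalance {ratio:.2f} -> trigger expert re-placement")
            counts[:] = 0                         # start a fresh window

rng = np.random.default_rng(4)
# Simulated routing stream whose distribution drifts from uniform to heavily skewed.
uniform = rng.integers(0, NUM_EXPERTS, size=4000)
skewed = rng.choice(NUM_EXPERTS, size=4000, p=[0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03])
monitor(np.concatenate([uniform, skewed]))        # only the skewed windows trip the threshold
```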
DeepSeek-R1 vs. Grok-3: Lessons from Two Technical Routes to AI Scaling
Counterpoint Research· 2025-04-09 13:01
Since February this year, DeepSeek has drawn global attention with its open-source flagship reasoning model, DeepSeek-R1, whose performance rivals the world's frontier reasoning models. Its distinctive value lies not only in excellent performance but also in the fact that training was completed with only about 2,000 NVIDIA H800 GPUs (the H800 is a scaled-down, export-compliant substitute for the H100), an achievement that stands as a model of efficiency optimization.

A few days later, Elon Musk's xAI released Grok-3, its most advanced model to date, whose performance is slightly better than DeepSeek-R1, OpenAI's GPT-o1, and Google's Gemini 2. Unlike DeepSeek-R1, Grok-3 is a closed-source model, and its training used a staggering roughly 200,000 H100 GPUs, relying on xAI's "Colossus" supercomputer, marking a huge leap in compute scale.

[Image: xAI's "Colossus" data center]

Grok-3 embodies scale expansion without compromise: roughly 200,000 NVIDIA H100 GPUs in pursuit of frontier performance gains. DeepSeek-R1, by contrast, achieved comparable performance with only a small fraction of the compute, showing that innovative architecture design and data curation can stand up to brute-force compute.

Efficiency is becoming a strategic trend rather than a constraint. DeepSeek's success has redefined the discussion of how AI should scale. We ...