Mixture of Grouped Experts (MoGE)
Huawei Pangu makes its first public appearance: an Ascend-native 72B MoE architecture, tied for first place in China on SuperCLUE among models under 100 billion parameters
华尔街见闻· 2025-05-29 00:57
Core Insights
- The Mixture of Grouped Experts (MoGE) model from Huawei's Pangu team addresses the load-imbalance inefficiencies of traditional Mixture of Experts (MoE) models, ensuring a balanced computational load across devices while maintaining high performance [1][7][27]
- The Pangu Pro MoE model, with 72 billion total parameters and 16 billion active parameters, delivers competitive performance, ranking first in China among models with fewer than 100 billion parameters [2][22]

Group 1: Model Architecture and Efficiency
- The MoGE architecture introduces a grouping mechanism that enforces balanced expert activation across groups, significantly improving computational efficiency and reducing system bottlenecks [1][6][12] (a minimal routing sketch follows this summary)
- The model delivers high inference throughput, reaching 321 tokens/s on the Ascend 300I Duo platform and 1528 tokens/s on the Ascend 800I A2 platform, outperforming dense models of similar size [18][26]

Group 2: Performance Metrics
- In the latest SuperCLUE ranking, Pangu Pro MoE scored 58.75, demonstrating strong capabilities across reasoning tasks and outperforming other models in complex reasoning scenarios [3][22]
- The model performs well across multiple benchmarks, including English and Chinese language tasks, demonstrating versatility and adaptability in complex cognitive tasks [22][23][24]

Group 3: Industry Impact
- The introduction of Pangu Pro MoE marks a shift in the AI industry from an emphasis on parameter count to practical application, enabling efficient cloud inference and supporting high-concurrency, real-time scenarios [27]
- Huawei's innovations in the MoE architecture redefine the value of large models, providing a robust foundation for AI applications across industries [27]
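To make the grouping mechanism concrete, below is a minimal routing sketch. It assumes experts are partitioned into equal-sized groups (for example, one group per device) and that each token activates the same number of top-scoring experts within every group, which yields balanced load by construction. The function name `grouped_topk_routing`, the group and expert counts, and the softmax weighting of the selected scores are illustrative assumptions for this sketch, not Huawei's published implementation.

```python
import torch
import torch.nn.functional as F


def grouped_topk_routing(logits: torch.Tensor, num_groups: int, k_per_group: int):
    """Sketch of grouped top-k expert routing (illustrative, not the Pangu code).

    logits: [num_tokens, num_experts] router scores.
    Experts are split into `num_groups` equal groups; each token activates
    exactly `k_per_group` experts from every group, so each group (e.g. the
    device hosting it) receives the same number of activations per token.
    Returns (expert_ids, weights), both [num_tokens, num_groups * k_per_group].
    """
    num_tokens, num_experts = logits.shape
    assert num_experts % num_groups == 0, "experts must divide evenly into groups"
    group_size = num_experts // num_groups

    # View scores per group: [num_tokens, num_groups, group_size]
    grouped = logits.view(num_tokens, num_groups, group_size)

    # Top-k within each group; indices are local to the group
    topk_scores, topk_local = grouped.topk(k_per_group, dim=-1)

    # Convert group-local indices back to global expert ids
    offsets = torch.arange(num_groups, device=logits.device).view(1, num_groups, 1) * group_size
    expert_ids = (topk_local + offsets).reshape(num_tokens, -1)

    # Normalize the selected scores into per-token mixture weights
    weights = F.softmax(topk_scores.reshape(num_tokens, -1), dim=-1)
    return expert_ids, weights


if __name__ == "__main__":
    # Illustrative sizes only: 16 experts in 4 groups, 2 active per group.
    torch.manual_seed(0)
    router_logits = torch.randn(5, 16)      # 5 tokens
    ids, w = grouped_topk_routing(router_logits, num_groups=4, k_per_group=2)
    print(ids)                              # global expert ids, 8 per token
    print(w.sum(dim=-1))                    # weights sum to 1 per token
```

Because every token contributes exactly `k_per_group` activations to each group, each group receives an identical share of the routing load, which is the balance property the summary attributes to MoGE.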