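Huawei Pangu makes its first public appearance: Ascend-native 72B MoE architecture ties for first place in China among sub-100-billion-parameter models on SuperCLUE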

Core Insights
- The article describes the shift of large models from a "parameter arms race" toward practical effectiveness, illustrated by the Pangu Pro MoE model, which takes a new architectural approach to improving efficiency and performance in large-scale applications [29].

Group 1: Model Architecture and Innovations
- The Pangu Pro MoE model has 72 billion total parameters and 16 billion active parameters, and scores 59 on the SuperCLUE benchmark, placing it among the top models in its category [2][3].
- The model uses a Mixture of Grouped Experts (MoGE) architecture, which balances computational load across devices by dividing experts into groups and activating the same number of experts in each group for every token [8][14].
- MoGE addresses an inefficiency of traditional MoE models, where uneven expert activation can turn the most heavily loaded devices into bottlenecks for overall system efficiency [7][13].

Group 2: Performance Metrics
- On the Ascend 300I Duo platform, Pangu Pro MoE reaches a throughput of 321 tokens per second; on the Ascend 800I A2 platform, it reaches up to 1528 tokens per second under high concurrency [21][22].
- In complex reasoning tasks, Pangu Pro MoE outperforms models such as Qwen3-32B and GLM4-Z1-32B on various benchmarks [25][26].

Group 3: Industry Impact and Applications
- Pangu Pro MoE is expected to lower cloud inference costs and support high-concurrency, real-time scenarios, making it suitable for enterprise-level applications [29].
- The model's lightweight inference engine is designed to be compatible with Huawei's Ascend series chips, enabling large-scale models to be deployed in a variety of AI applications [29].
- The advances in Pangu Pro MoE signal a shift in the AI industry toward practical applications, with an emphasis on efficiency and accessibility for businesses [29].
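
The grouped routing described under Group 1 can be illustrated with a minimal sketch. It is not Huawei's implementation: the function names, the 8-expert/4-group layout, the router scores, and the assumption that each group sits on its own device are all hypothetical, chosen only to show why picking a fixed number of experts per group evens out the load compared with a global top-k choice.

```python
def moge_route(scores, num_groups, k_per_group):
    """Select the top-k experts inside each group for one token.

    Assumption of this sketch: experts are split into contiguous,
    equal-sized groups, and each group lives on one device.
    """
    num_experts = len(scores)
    if num_experts % num_groups != 0:
        raise ValueError("experts must divide evenly into groups")
    group_size = num_experts // num_groups
    if not 0 < k_per_group <= group_size:
        raise ValueError("k_per_group must be between 1 and the group size")

    selected = []
    for g in range(num_groups):
        members = range(g * group_size, (g + 1) * group_size)
        # Rank only the experts of this group, then keep the best k.
        best = sorted(members, key=lambda i: scores[i], reverse=True)
        selected.extend(best[:k_per_group])
    return sorted(selected)


def global_topk_route(scores, k):
    """Conventional MoE routing: top-k over all experts, ignoring groups."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def group_load(selected, num_experts, num_groups):
    """Count how many selected experts fall into each group (device)."""
    group_size = num_experts // num_groups
    load = [0] * num_groups
    for i in selected:
        load[i // group_size] += 1
    return load


# Hypothetical router scores for one token over 8 experts in 4 groups.
scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.4, 0.7]

grouped = moge_route(scores, num_groups=4, k_per_group=1)
flat = global_topk_route(scores, k=4)

print(grouped, group_load(grouped, 8, 4))  # [0, 3, 4, 7] [1, 1, 1, 1]
print(flat, group_load(flat, 8, 4))        # [0, 1, 6, 7] [2, 0, 0, 2]
```

Both strategies activate four experts for the token, but the global top-k choice places all of the work on two of the four hypothetical devices, while the grouped choice gives every device exactly one expert to run.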