Breaking: Huawei Releases a Near-Trillion-Parameter Large Model

Core Insights

- Huawei has launched Pangu Ultra MoE, a 718-billion-parameter model that marks a significant advance in MoE (mixture-of-experts) model training on the Ascend AI computing platform [1][3][6]
- Together with the Pangu Pro MoE series, the release demonstrates that Huawei can run a fully controllable training process on domestic computing power and domestic models, validating the innovation capacity of China's AI infrastructure [3][6]

Model Architecture and Training Innovations

- To address the challenges of training ultra-large-scale, highly sparse MoE models, the Pangu team introduced new designs in model architecture and training methods and achieved stable training on the Ascend platform [1][4]
- Key innovations include the Depth-Scaled Sandwich-Norm (DSSN) architecture and the TinyInit initialization method, which together enabled stable, long-running training on more than 18 TB of data [4]
- The EP loss load optimization method improves load balancing among experts and strengthens their specialization [4]

Performance and Efficiency Improvements

- The training methods disclosed by Huawei enable efficient integration of reinforcement learning (RL) post-training frameworks for large sparse MoE models on Ascend CloudMatrix 384 supernodes [5]
- Recent upgrades to the pre-training system raised Model FLOPs Utilization (MFU) from 30% to 41% [5]
- The Pangu Pro MoE model, with 72 billion total parameters and 16 billion active parameters, performs comparably to larger models and ranks first among domestic models under 100 billion parameters on the SuperCLUE leaderboard [5]

Industry Implications

- The successful training and optimization of ultra-large-scale sparse models on domestic AI platforms closes the loop on "full-stack domestication" and "fully controllable processes", from hardware to software and from research to engineering [6]
- This advance provides a strong foundation for the development of China's AI industry and reinforces confidence in domestic AI capabilities [3][6]
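
To put the efficiency figures reported above in perspective, the short sketch below works through the arithmetic behind two of them: the relative gain implied by MFU rising from 30% to 41%, and the share of Pangu Pro MoE's parameters that are active per token (16 billion of 72 billion). The function names are illustrative only and do not come from Huawei's materials. Reading the MFU gain as a throughput gain assumes the same hardware and the same model FLOPs per token before and after the upgrade, which the source does not state explicitly.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement when a metric moves from `before` to `after`."""
    return (after - before) / before


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a sparse model's parameters that are active per token."""
    return active_params / total_params


# Figures taken from the article: MFU 30% -> 41%; Pangu Pro MoE 16B active of 72B.
mfu_gain = relative_gain(0.30, 0.41)
pro_moe_active = active_fraction(16e9, 72e9)

print(f"Relative MFU gain: {mfu_gain:.1%}")                   # 36.7%
print(f"Pangu Pro MoE active share: {pro_moe_active:.1%}")    # 22.2%
```

Under those assumptions, the MFU upgrade corresponds to roughly a 37% increase in useful compute per unit time, and Pangu Pro MoE activates a little over a fifth of its parameters for each token.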