Major Breakthrough! Huawei Just Announced It!
券商中国·2025-05-30 10:43

Core Viewpoint - Huawei's launch of the Pangu Ultra MoE model, with 718 billion parameters, marks a major advance for China's AI industry, demonstrating that ultra-large models can be trained through an independent and controllable process on domestic computing platforms [1][4].

Group 1: Breakthroughs in Domestic Computing and Models
- Training ultra-large-scale, highly sparse MoE models is difficult, but Huawei's Pangu team redesigned the model architecture and training methods to achieve stable training on the Ascend platform [2].
- The Pangu team introduced the Depth-Scaled Sandwich-Norm (DSSN) architecture and the TinyInit initialization method, enabling long-term stable training on more than 18TB of data [2].
- The EP loss optimization method keeps the load balanced across experts and strengthens their specialization, while Pangu Ultra MoE adopts the Multi-head Latent Attention (MLA) and Multi-Token Prediction (MTP) architectures to balance model performance and efficiency [2][3].

Group 2: Training Method Innovations
- Huawei's team has disclosed key technologies that enable efficient training of large sparse MoE models on Ascend CloudMatrix 384 supernodes, moving reinforcement learning (RL) post-training frameworks into the era of supernode clusters [3].
- Recent upgrades to the pre-training system have raised the model FLOPs utilization (MFU) on large clusters from 30% to 41% [3].
- The Pangu Pro MoE model, with 72 billion total parameters and 16 billion active parameters, dynamically activates its expert networks and delivers performance that rivals larger models [3].

Group 3: Industry Developments
- DeepSeek's R1 model has completed a minor version upgrade, outperforming Western competitors on several standardized metrics while keeping costs to only a few million dollars [5].
- Tencent's AI model strategy has been fully unveiled, with its Hunyuan model ranking among the top eight globally on the Chatbot Arena platform, a sign of its continuing technical progress [6].
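
To put the figures reported for Pangu Pro MoE and the pre-training system in perspective, the sketch below works through the arithmetic and shows what sparse expert activation and load balancing generally look like. It is a minimal illustration under stated assumptions, not Huawei's implementation: the article does not define the EP loss or the Pangu router, so the code substitutes generic top-k routing and a Switch-Transformer-style auxiliary loss, and every function name is hypothetical.

```python
import math


def active_fraction(total_params, active_params):
    """Share of a sparse MoE model's parameters used for each token."""
    return active_params / total_params


def relative_gain(old_mfu, new_mfu):
    """Relative throughput gain on the same hardware and model when MFU rises."""
    return new_mfu / old_mfu - 1.0


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k):
    """Generic top-k routing (an assumption, not the Pangu router)."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    weights = [probs[i] / norm for i in chosen]  # renormalize over chosen experts
    return chosen, weights, probs


def load_balance_loss(token_logits, k):
    """Switch-Transformer-style auxiliary loss: num_experts * sum_i f_i * P_i.

    f_i is the share of routing slots sent to expert i and P_i is the mean
    router probability of expert i. This stands in for the EP loss, whose
    exact form the article does not give.
    """
    num_tokens = len(token_logits)
    num_experts = len(token_logits[0])
    dispatch = [0.0] * num_experts
    mean_prob = [0.0] * num_experts
    for logits in token_logits:
        chosen, _, probs = route_top_k(logits, k)
        for i in chosen:
            dispatch[i] += 1.0 / (k * num_tokens)
        for i, p in enumerate(probs):
            mean_prob[i] += p / num_tokens
    return num_experts * sum(f * p for f, p in zip(dispatch, mean_prob))


if __name__ == "__main__":
    # Figures reported in the article: 72B total / 16B active, MFU 30% -> 41%.
    print(f"active share per token: {active_fraction(72e9, 16e9):.1%}")
    print(f"relative throughput gain: {relative_gain(0.30, 0.41):.1%}")
```

On these numbers, each token touches about 22% of Pangu Pro MoE's parameters, and moving MFU from 30% to 41% corresponds to roughly 37% more useful compute from the same cluster. In this formulation the auxiliary loss equals 1 when the router is uniform and approaches the number of experts when routing with k = 1 collapses onto a single expert, which is why minimizing it pushes toward balanced expert load.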