Core Viewpoint
- Huawei's launch of the Pangu Ultra MoE model, a 718-billion-parameter model, marks a major advance for China's AI industry, demonstrating that independent, fully controllable training of frontier-scale models is practical on domestic AI infrastructure [1][5].

Group 1: Breakthroughs in Domestic Computing Power and Models
- Huawei has achieved a significant breakthrough in training ultra-large-scale, highly sparse MoE models, overcoming the training-stability problems that typically plague models of this kind [3].
- The Pangu team introduced the Depth-Scaled Sandwich-Norm (DSSN) architecture and the TinyInit initialization method, enabling stable long-horizon training on more than 18TB of data on the Ascend platform [3] (a hedged sketch of both ideas appears after Group 3 below).
- The EP loss load-balancing method keeps computational load evenly distributed across experts while strengthening each expert's specialization [3] (see the balancing-loss sketch below).

Group 2: Training Method Innovations
- Huawei's team has disclosed the key technologies behind an efficient reinforcement learning (RL) post-training framework for large sparse MoE models on Ascend CloudMatrix 384 supernodes, marking the shift of RL post-training into the supernode-cluster era [4].
- Recent upgrades to the pre-training system raised the Ascend platform's model FLOPs utilization (MFU) from 30% to 41% [4] (the MFU arithmetic is sketched below).
- The Pangu Pro MoE model, with 72 billion total parameters and 16 billion active parameters, rivals much larger models by dynamically activating only a subset of its expert networks per token [4] (a minimal top-k routing sketch appears below).

Group 3: Industry Implications
- These results validate that ultra-large-scale sparse models can be trained efficiently and stably on domestic AI computing platforms, closing the loop on a "full-stack domestic" and "fully controllable" pipeline [5].
- The advances in both models and infrastructure are expected to strengthen the growth of China's AI industry and lay a solid foundation for future innovation [1][5].
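The article names DSSN and TinyInit without giving formulas. As a rough illustration of the general ideas (sandwich normalization around each residual sublayer, with depth-dependent scaling of the post-norm gain and of weight initialization), here is a minimal PyTorch sketch; the class names, the 1/sqrt(2L) gain, and the TinyInit scaling constants are our assumptions for illustration, not Huawei's published implementation.

```python
import math
import torch
import torch.nn as nn

class SandwichNormBlock(nn.Module):
    """Illustrative sandwich-norm residual block: normalize both before and
    after the sublayer, and initialize the post-norm gain at a depth-dependent
    scale so a deep stack starts close to the identity map."""
    def __init__(self, d_model: int, n_layers: int, sublayer: nn.Module):
        super().__init__()
        self.pre_norm = nn.LayerNorm(d_model)
        self.post_norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer
        # Assumed depth scaling: shrink each block's initial contribution.
        nn.init.constant_(self.post_norm.weight, 1.0 / math.sqrt(2.0 * n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.post_norm(self.sublayer(self.pre_norm(x)))

def tiny_init_(linear: nn.Linear, d_model: int, n_layers: int) -> None:
    """Illustrative TinyInit-style init: weight std shrinks with both model
    width and depth, keeping early residual updates small and stable."""
    std = math.sqrt(2.0 / (5.0 * d_model * n_layers))  # assumed scaling law
    nn.init.normal_(linear.weight, mean=0.0, std=std)
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)
```

The design intent in both cases is the same: keep each layer's early contribution small enough that a very deep network trains without loss spikes over a long data horizon.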
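The EP loss is likewise only named, not specified. A widely used baseline in this family is the Switch-Transformer-style auxiliary balancing loss, which penalizes uneven expert load; a minimal sketch follows, assuming generic top-k routing (the function name and the normalization by `top_k` are ours).

```python
import torch

def balancing_loss(router_logits: torch.Tensor, top_k: int) -> torch.Tensor:
    """Switch-style auxiliary load-balancing loss.

    router_logits: (num_tokens, num_experts). The loss multiplies the fraction
    of tokens dispatched to each expert (f_i) by the mean router probability on
    that expert (p_i); it is minimized when both are uniform, i.e. when the
    experts share the load equally.
    """
    num_experts = router_logits.size(-1)
    probs = torch.softmax(router_logits, dim=-1)    # (T, E)
    top_idx = probs.topk(top_k, dim=-1).indices     # (T, k) chosen experts
    dispatch = torch.zeros_like(probs).scatter_(-1, top_idx, 1.0)
    f = dispatch.mean(dim=0) / top_k                # fraction routed to each expert
    p = probs.mean(dim=0)                           # mean router probability
    return num_experts * torch.sum(f * p)
```

Judging by its name, the EP loss computes such balancing statistics at expert-parallel (EP) group granularity rather than per token batch; the exact formulation is not given in the article.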
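MFU is the ratio of FLOPs actually achieved during training to the hardware's theoretical peak, commonly estimated at ~6 FLOPs per parameter per trained token for a combined forward and backward pass. The reported 30% to 41% improvement therefore translates to roughly a third more training throughput on the same hardware. A minimal sketch with toy numbers (the article gives no Ascend peak figures, so these values are illustrative only):

```python
def mfu(active_params: float, tokens_per_second: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: achieved training FLOP/s over hardware peak.
    Uses the common ~6 FLOPs per parameter per token estimate; for an MoE
    model, only the *active* parameters count toward the numerator."""
    return (6.0 * active_params * tokens_per_second) / peak_flops

# Toy numbers, not Ascend specifications: a model with 16e9 active parameters
# training at 1e5 tokens/s on a cluster with 2.4e16 FLOP/s of peak compute.
print(mfu(16e9, 1e5, 2.4e16))  # 0.40, i.e. 40% MFU
```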
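"Dynamic activation of expert networks" refers to top-k routing: each token is processed by only k of the E experts, so per-token compute scales with the active parameter count rather than the total. A toy sketch under that assumption (the class, sizes, and loop-based dispatch are illustrative, not Pangu Pro MoE's actual architecture):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy top-k MoE layer: each token runs only k of E experts, so per-token
    compute tracks the *active* parameter count rather than the total."""
    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 num_experts: int = 16, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = torch.softmax(self.router(x), dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                # for each routing slot...
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

At Pangu Pro MoE's reported scale, 16 billion of 72 billion parameters per token means roughly 22% of the model participates in any single forward pass, which is how it can rival denser, larger models at a fraction of the inference cost.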
Huawei, a major breakthrough!
Securities Times (证券时报) · 2025-05-30 13:21