Huawei Releases Architecture and Training Details of Its Near-Trillion-Parameter Pangu Ultra MoE Model
news flash·2025-05-30 07:33

Core Insights
- Huawei has made a significant advance in MoE model training with the launch of Pangu Ultra MoE, a new model with 718 billion parameters [1]
- The near-trillion-parameter MoE model is trained on the Ascend AI computing platform, demonstrating a performance leap in ultra-large-scale MoE training [1]
- Huawei has released a technical report detailing the Pangu Ultra MoE architecture and training methods, including designs aimed at the training-stability challenges of ultra-large-scale, highly sparse MoE models [1]
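The report's actual designs are not reproduced in this flash. As a rough illustration of what "highly sparse" means in an MoE model, the PyTorch sketch below routes each token to only its top-k experts, so most expert parameters stay inactive per token. All names, layer sizes, and expert counts here are illustrative assumptions, not Pangu Ultra MoE's real configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Generic top-k routed MoE layer (illustrative, not Pangu Ultra MoE's design)."""
    def __init__(self, d_model=512, n_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen k
        out = torch.zeros_like(x)
        for k in range(self.top_k):                     # dispatch tokens to their experts
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

# With 64 experts and top-2 routing, only ~3% of expert parameters are
# active for any given token -- this is the sparsity that makes
# near-trillion-parameter MoE models tractable to train and serve.
x = torch.randn(8, 512)
y = SparseMoELayer()(x)
print(y.shape)  # torch.Size([8, 512])
```

The design trade-off this sketch hints at: sparse routing decouples total parameter count from per-token compute, but uneven expert selection can destabilize training at scale, which is the class of problem the report's stability-oriented designs address.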