Pangu Ultra MoE

Communications ETF (515880) rises over 5.6%; hardware-software co-design innovation may become a new growth driver for the industry
Mei Ri Jing Ji Xin Wen· 2025-08-13 03:17
Guotai Haitong notes that Huawei is building full-stack AI competitiveness through hardware-software co-design, bringing a wave of technological innovation to the communications equipment industry. Huawei's AI strategy has shifted from benchmarking against SOTA models to tailoring architectures to Ascend hardware, with two innovation tracks, Pangu Pro MoE and Pangu Ultra MoE: the former tackles load imbalance with the Mixture of Grouped Experts (MoGE) architecture, the latter through system-level optimization, both raising hardware efficiency. The new-generation AI infrastructure CloudMatrix adopts a unified bus (UB) network to build a distributed high-speed memory pool, reducing cross-node communication disparities and supporting a PDC-disaggregated architecture and large-scale expert parallelism (LEP). As large models shift from dense to sparse MoE architectures, Huawei is focusing on distributed-system efficiency, extending hardware-software co-design into AI systems engineering; the communications equipment industry's technology stack is evolving deeply toward full-stack co-design.
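The load-balancing idea behind MoGE mentioned above can be sketched simply: experts are partitioned into equal-sized groups (e.g. one group per device), and each token selects its top-k experts *within every group*, so every group receives the same number of expert activations by construction. This is a minimal illustrative sketch, not Huawei's implementation; all names and sizes are invented.

```python
# Minimal sketch of grouped top-k routing in the spirit of MoGE
# (Mixture of Grouped Experts). Experts are split into equal groups;
# each token takes exactly k experts from every group, so per-group
# (and hence per-device) load is balanced by construction.

def grouped_topk(scores, num_groups, k_per_group):
    """scores: one token's router scores over all experts (flat list).
    Returns the selected expert indices, k_per_group from each group."""
    n = len(scores)
    assert n % num_groups == 0, "experts must split evenly into groups"
    group_size = n // num_groups
    chosen = []
    for g in range(num_groups):
        group = list(range(g * group_size, (g + 1) * group_size))
        group.sort(key=lambda e: scores[e], reverse=True)
        chosen.extend(group[:k_per_group])  # same count from every group
    return sorted(chosen)

# One token, 8 experts in 4 groups, top-1 per group:
scores = [0.1, 0.9, 0.3, 0.2, 0.8, 0.05, 0.4, 0.7]
print(grouped_topk(scores, num_groups=4, k_per_group=1))  # → [1, 2, 4, 7]
```

Contrast with plain global top-k routing, where all k winners can land on one device and create the load imbalance the article describes.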
Computer Industry "Weekly Decode": Huawei's Pangu Team Launches the New Pangu Ultra MoE Model
Bank of China Securities· 2025-06-06 01:17
Investment Rating
- The report rates the computer industry as "Outperforming the Market" [32]

Core Insights
- Nvidia reported strong Q1 earnings with revenue of $44.1 billion, up 12% quarter-over-quarter and 69% year-over-year, despite being affected by export controls [11][12]
- The DeepSeek R1 model has completed a minor version upgrade, achieving top performance among domestic models and nearing international leaders [13][14]
- Huawei's Pangu team launched the Pangu Ultra MoE model, addressing stability issues in training large-scale models, a successful practice of autonomous training on domestic computing power [15][16]

Company Dynamics
- Zhongke Chuangda announced a special loan commitment of up to 70 million yuan for stock repurchases [3]
- Kingsoft Office disclosed the results of its restricted stock incentive plan, adding 505,289 new shares and bringing the total share capital to 463,179,293 shares [23]
- The report highlights the importance of companies in the Huawei supply chain and the EDA software sector, suggesting a focus on firms like Softcom Power, Tuo Wei Information, and others [4]
Ascend + Kunpeng double punch! Huawei clears MoE training's bottlenecks for another 20% speedup and 70% memory savings
雷峰网· 2025-06-04 09:31
The striking result: on top of the earlier gains, MoE training throughput improved by another 20% while memory usage dropped by 70%. This is more than a one-off technical breakthrough; it sets the direction for MoE training. "Every one of Pangu Ultra MoE's breakthroughs reflects Huawei's leading strength in foundational AI technology and in engineering it into practice."

Author: Li Xi

Recently, Huawei released new operator and memory optimization schemes for its MoE training system: all three core operators were accelerated, system throughput rose another 20%, and Selective R/S cut memory usage by 70%.

On the road to more powerful AI, MoE has become another path of choice for the tech giants. As long as the Scaling Law has not broken down, large-model parameter counts will keep growing, and with them the level of AI capability. With its distinctive architecture, MoE is reaching unprecedented parameter scales and has become one of the key routes past the compute bottleneck of large-scale model training. Yet turning MoE's potential into efficient training practice has remained a hard problem for the industry.

Previously, Huawei's Adaptive Pipe & EDPB framework delivered efficient cluster-level distributed computing, letting communication and computation overlap cleanly and raising training-cluster efficiency. This time, through deep co-design of Ascend and Kunpeng compute, Huawei has further achieved training operator computation ...
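The memory-saving idea behind a selective recompute/swap scheme like the Selective R/S mentioned above can be sketched as a planning problem: for each activation, either keep it in device memory, recompute it during the backward pass, or swap it out to host memory, evicting the cheapest-to-restore activations first until the memory budget is met. The cost numbers and the greedy policy below are invented for illustration and are not Huawei's actual algorithm.

```python
# Illustrative sketch of a selective recompute-or-swap decision.
# acts: list of (name, bytes, recompute_cost, swap_cost) tuples.
# Strategy (assumed, not from the article): keep everything, then evict
# activations with the lowest restore-cost-per-byte until under budget,
# choosing recompute vs swap by whichever restore path is cheaper.

def plan_activations(acts, mem_budget):
    """Returns {activation_name: 'keep' | 'recompute' | 'swap'}."""
    plan = {name: "keep" for name, *_ in acts}
    used = sum(size for _, size, _, _ in acts)
    order = sorted(acts, key=lambda a: min(a[2], a[3]) / a[1])
    for name, size, recompute_cost, swap_cost in order:
        if used <= mem_budget:
            break
        plan[name] = "recompute" if recompute_cost <= swap_cost else "swap"
        used -= size  # evicted activations no longer occupy device memory
    return plan

acts = [
    ("attn_softmax", 4.0, 0.5, 2.0),  # cheap to recompute
    ("ffn_hidden",   8.0, 6.0, 1.0),  # cheap to swap to host memory
    ("layernorm",    1.0, 0.1, 1.5),
]
print(plan_activations(acts, mem_budget=6.0))
```

The design point is the same trade the article describes: spending a little extra compute (recompute) or host bandwidth (swap) to free the bulk of activation memory on the device.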
No GPUs needed: a large model digests one advanced-math problem every 2 seconds! This is Huawei's strength
雷峰网· 2025-05-30 09:48
Core Viewpoint
- Huawei defines the benchmark for domestic large-model training through technological innovation, achieving breakthroughs in compute utilization and post-training throughput [1][4]

Group 1: Technological Innovations
- Huawei's "Ascend + Pangu Ultra MoE" combination has unlocked a fully self-controlled training loop for domestic computing power and models, with industry-leading cluster training system performance [4][5]
- In the pre-training phase, model flops utilization (MFU) on the Ascend Atlas 800T A2 cluster rose to 41%; in the post-training phase, a single CloudMatrix 384 super node reached a throughput of 35K tokens/s [5][36]
- Huawei disclosed the key technologies in its technical report, highlighting the efficient integration of its sparse MoE reinforcement-learning post-training framework [6][7]

Group 2: Challenges in Current Training Processes
- Six main challenges were identified in current MoE pre-training and reinforcement-learning post-training: difficult parallel-strategy configuration, communication bottlenecks, uneven system load distribution, excessive operator-scheduling overhead, complex training-process management, and limits to large-scale expansion [10][11]

Group 3: Solutions to Enhance Training Efficiency
- Huawei proposed a complete end-to-end solution to these challenges, raising training-cluster utilization through intelligent parallel-strategy selection, deep integration of computation and communication, and global dynamic load balancing [12][14]
- The first strategy optimized the parallel configuration, arriving at a deployment with 16-way pipeline parallelism, 8-way tensor parallelism, and 32-way expert parallelism [15][16]
- The second strategy released compute at the single-node level, doubling the micro-batch size (MBS) and optimizing operator scheduling to fully exploit Ascend node capability [20][21]
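A parallel layout like the one cited above (16-way pipeline, 8-way tensor, 32-way expert parallelism) has to satisfy basic divisibility constraints before it can be deployed. The sketch below checks these under the common Megatron-style convention, which is an assumption here rather than something the report states: world size = PP × TP × DP, with expert parallelism (EP) splitting experts across ranks drawn from the data-parallel dimension, so EP must divide DP. The 4096-device example is likewise hypothetical.

```python
# Hedged sketch: validate a PP/TP/EP parallel layout under the assumed
# convention world = PP * TP * DP, with EP dividing DP.

def check_layout(world, pp, tp, ep):
    """Returns the derived data-parallel size if the layout is valid."""
    assert world % (pp * tp) == 0, "PP * TP must divide the world size"
    dp = world // (pp * tp)
    assert dp % ep == 0, "EP must divide the data-parallel size"
    return {"pp": pp, "tp": tp, "dp": dp, "ep": ep}

# e.g. a hypothetical 4096-device job with the cited strategy:
print(check_layout(4096, pp=16, tp=8, ep=32))
```

A layout that fails either assertion cannot be mapped onto the cluster at all, which is why the article calls parallel-strategy configuration one of the six core challenges.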
Group 4: Reinforcement Learning Innovations
- Huawei introduced RL Fusion, a training-inference co-location technology that supports flexible deployment modes and doubles cluster utilization in post-training [28][29]
- A semi-asynchronous mechanism, StaleSync, lets different tasks execute in parallel while maintaining model accuracy, yielding a 50% increase in overall training throughput [30]

Group 5: Performance Metrics and Future Prospects
- The Pangu Ultra MoE model, with 718 billion parameters, demonstrated high performance during training, reaching 41% model utilization in pre-training and a throughput of 35K tokens/s in post-training [35][36]
- The system is designed to support ultra-large-scale clusters and models, with future iterations expected to reach even higher utilization rates [35][36]
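The "semi-asynchronous" idea attributed to StaleSync above can be illustrated with a bounded-staleness acceptance rule: rollout generators are allowed to run ahead in parallel and produce samples from a slightly older policy version, and the trainer accepts a sample only if it is at most a fixed number of versions stale. The details below (the constant, the tuple format) are invented for illustration, not taken from Huawei's report.

```python
# Toy sketch of a bounded-staleness acceptance rule in the spirit of a
# semi-asynchronous scheme like StaleSync: generation and training
# overlap, and only samples within the staleness bound are used, which
# is what preserves accuracy while removing the synchronization stall.

MAX_STALE = 2  # illustrative bound, not from the article

def accept(sample_version, trainer_version, max_stale=MAX_STALE):
    """Accept a rollout sample iff its policy version is fresh enough."""
    return trainer_version - sample_version <= max_stale

# (policy version the sample was generated with, current trainer version)
samples = [(0, 0), (1, 2), (1, 4), (3, 4)]
print([accept(s, t) for s, t in samples])  # → [True, True, False, True]
```

With `max_stale=0` this degenerates to fully synchronous on-policy training; raising the bound trades a little policy freshness for the parallelism that the article credits with a 50% throughput gain.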