Huawei's First Open-Source Large Models Are Here! Pangu Pro MoE with 72 Billion Parameters, Trained on 4,000 Ascend Chips
Hua Er Jie Jian Wen· 2025-06-30 07:27
Core Insights
- Huawei has announced the open-sourcing of its Pangu models, including a 7-billion-parameter dense model and a 72-billion-parameter mixture-of-experts (MoE) model, marking a significant step in the domestic large-model open-source competition [1][3][20]

Model Performance
- The Pangu Pro MoE model achieves a single-card inference throughput of 1,148 tokens/s on the Ascend 800I A2, which can be raised further to 1,528 tokens/s with speculative-acceleration techniques, outperforming dense models of similar size [3][11]
- Pangu Pro MoE is built on the MoGE (Mixture of Grouped Experts) architecture, with 72 billion total parameters and 16 billion active parameters, and is optimized specifically for Ascend hardware [4][11]

Training and Evaluation
- Huawei used 4,000 Ascend NPUs to pre-train the model on a high-quality corpus of 13 trillion tokens, split into general, reasoning, and annealing phases that progressively strengthen the model's capabilities [11]
- Pangu Pro MoE posts superior results across benchmarks, including a score of 91.2 on DROP, closely matching the best current models [12][14]

Competitive Landscape
- The open-sourcing of the Pangu models coincides with a wave of domestic AI model releases: leading companies such as MiniMax and Alibaba have also upgraded their open-source models, and large-model prices have fallen by 60%-80% [3][20]
- Pangu Pro MoE ranks fifth on the SuperCLUE Chinese large-model benchmark, surpassing several existing models and indicating a competitive market position [17][18]

Technological Integration
- Huawei's integrated ecosystem of chips (Ascend NPU), frameworks (MindSpore), and models (Pangu) represents a significant technological achievement, offering a viable high-performance alternative to Nvidia's dominance in the industry [20]
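The distinguishing idea of MoGE, as summarized above, is that experts are partitioned into groups and the router activates the same number of experts within every group, so the load across groups (and hence across the devices hosting them) is balanced by construction. The article includes no code, so the following is only a minimal Python/PyTorch sketch of grouped top-k routing under that assumption; the function name, group count, and tensor shapes are illustrative, not Huawei's implementation.

```python
import torch
import torch.nn.functional as F

def grouped_topk_routing(logits: torch.Tensor, n_groups: int, k_per_group: int):
    """Grouped top-k routing sketch: experts are split into equal groups and
    the router picks k experts *within each group*, so every group receives
    the same number of activated experts per token.

    logits: [batch, n_experts] router scores; n_experts % n_groups == 0.
    Returns (weights, indices), each of shape [batch, n_groups * k_per_group].
    """
    batch, n_experts = logits.shape
    assert n_experts % n_groups == 0
    group_size = n_experts // n_groups

    # View scores per group: [batch, n_groups, group_size].
    grouped = logits.view(batch, n_groups, group_size)

    # Top-k inside each group, not globally -- this is the balancing trick.
    topk_vals, topk_idx = grouped.topk(k_per_group, dim=-1)

    # Convert within-group indices back to global expert indices.
    offsets = torch.arange(n_groups, device=logits.device).unsqueeze(-1) * group_size
    global_idx = (topk_idx + offsets).reshape(batch, -1)

    # Normalize the selected scores into mixing weights (one simple choice).
    weights = F.softmax(topk_vals.reshape(batch, -1), dim=-1)
    return weights, global_idx

# Toy usage: 64 experts in 8 groups, 1 expert activated per group.
scores = torch.randn(4, 64)
w, idx = grouped_topk_routing(scores, n_groups=8, k_per_group=1)
print(w.shape, idx.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

Because every token activates exactly k experts per group, no single group can be oversubscribed, which is the imbalance that plain global top-k routing permits.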
Training Large Models: Finally You Can Have It All
虎嗅APP· 2025-05-29 10:34
HUAWEI X HUXIU

A third of a century ago, Canadian researchers proposed the classic MoE neural-network architecture, leaving a spark of change for later generations in the "stone age" of humanity's exploration of AI.

Nearly a decade ago, Silicon Valley's internet giants broke through the original MoE architecture in both theory and engineering, turning an idea once shelved in academia into the fuse of the AI competition that followed.

Today the latecomer's advantage has crossed the ocean again: Chinese technology companies, with Huawei at the forefront, are proposing their own optimizations and restructurings of the MoE architecture. Huawei's MoGE architecture in particular not only overcomes MoE's load-imbalance and efficiency bottlenecks, but also cuts costs and improves efficiency, making models easier to train and deploy.

The AI battle is far from over, but just as Chinese industry has delivered "more, faster, better, cheaper" in other fields, the large model, a technology tree born and raised in the West, will likewise be refined by Eastern hands into a more universal and accessible tool.

Huxiu is launching the "Huawei Technology Disclosure Series," a sequence of technical reports that discloses the relevant technical details in full for the first time. We hope this series serves as a reference for the industry, and that more people will join Huawei in building a lasting, open, and collaborative ecosystem so that the Ascend ecosystem can flourish in China.

"Huawei Technology Disclosure Series" VOL.7: Model Training. Pangu Ultra MoE is a near-trillion-parameter MoE model trained end to end on Ascend NPUs ...
Bye, Nvidia! Huawei NPUs Train a Near-Trillion-Parameter Large Model
量子位· 2025-05-08 04:04
Jin Lei, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)

Running near-trillion-parameter large models can now say goodbye to Nvidia for good. And the one that pulled this off is none other than Huawei.

Technical report: arxiv.org/abs/2505.04519

Bear in mind that, until now, training trillion-parameter large models has faced plenty of roadblocks, such as hard-to-achieve load balancing, heavy communication overhead, and low training efficiency. Huawei's Pangu team (including Noah's Ark Lab, Huawei Cloud, and others), building on the domestic Ascend compute platform, broke through all of these challenges in one stroke: it completed long-term, stable training of a 718-billion-parameter (718B) MoE model on a cluster of 6,000+ Ascend NPUs, and achieved significant performance gains through a series of breakthrough system-level optimizations. These innovations substantially raised training efficiency and supported the development of a model at the industry's top level.

It has to be said: the weight of the words "domestically made" in large-model hardware keeps climbing.

Purely domestic NPUs smoothly train a near-trillion-parameter large model

Before dissecting Huawei's string of "black tech," we first need a deeper understanding of the difficulties behind training MoE models at ultra-large parameter scales. Broadly speaking, four major hurdles stand guard on this road.

First is the architecture and hyperparameter optimization problem: the optimal configuration must be found within a vast space of parameter combinations, and a large-scale MoE architecture adapted to Ascend NPUs must be designed so that compute resources are used efficiently.

Second is dynamic load balancing ...
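The "dynamic load balancing" hurdle named above is commonly attacked during training with an auxiliary loss that penalizes uneven token-to-expert assignment. The article truncates before detailing Pangu's approach, so the following is only a minimal sketch of the standard Switch-Transformer-style balance loss; all names and shapes are illustrative, not the Pangu team's formulation.

```python
import torch
import torch.nn.functional as F

def load_balance_loss(router_logits: torch.Tensor, expert_idx: torch.Tensor,
                      n_experts: int) -> torch.Tensor:
    """Auxiliary balance loss in the Switch Transformer style: a common
    remedy for MoE load imbalance, shown here only as a generic sketch.

    router_logits: [tokens, n_experts] raw router scores.
    expert_idx:    [tokens] hard expert assignment for each token.
    """
    # f_i: fraction of tokens actually dispatched to each expert.
    dispatch = F.one_hot(expert_idx, n_experts).float()  # [tokens, n_experts]
    f = dispatch.mean(dim=0)                             # [n_experts]

    # p_i: mean router probability assigned to each expert.
    p = F.softmax(router_logits, dim=-1).mean(dim=0)     # [n_experts]

    # Minimized when both f and p are uniform (1 / n_experts each).
    return n_experts * torch.sum(f * p)

# Toy usage: 1024 tokens routed over 8 experts.
logits = torch.randn(1024, 8)
idx = logits.argmax(dim=-1)
print(load_balance_loss(logits, idx, n_experts=8))
```

Added to the task loss with a small coefficient, this term nudges the router toward spreading tokens evenly, which keeps all expert-hosting devices busy instead of bottlenecking on a few hot experts.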
USTC and Huawei Release a Generative Recommendation Large Model, Deployable on Ascend NPUs, with the Underlying Insights Made Public
量子位· 2025-04-06 02:33
Core Viewpoint
- The article discusses the emergence of generative recommendation models, particularly the HSTU framework, which has driven significant advances in the recommendation-system landscape, including successful deployment on domestic Ascend NPUs [1][4][5]

Group 1: Development of Generative Recommendation Models
- The generative recommendation paradigm, characterized by scaling laws, is emerging as the future direction of recommendation systems [4][6]
- Recommendation systems evolved from manual feature engineering to complex model designs, then swung back toward feature engineering as the gains from deeper models plateaued [5][6]
- The success of large language models has inspired researchers in the recommendation field to explore scalable models that improve recommendation quality [5][6]

Group 2: Performance Analysis of Different Architectures
- A comparative analysis of HSTU, Llama, GPT, and SASRec found that HSTU and Llama scale significantly better as model parameters grow, while GPT and SASRec show limited scalability on recommendation tasks [7][9]
- HSTU consistently outperformed baseline models such as SASRec in multi-domain scenarios, demonstrating its potential for addressing cold-start problems [13]

Group 3: Key Components and Their Impact
- Removing the Relative Attention Bias (RAB) from HSTU caused a marked performance drop, indicating its critical role in the model's scalability [9][11]
- Modifying the residual connections and adding RAB to SASRec improved its scalability, highlighting the importance of these components for upgrading traditional recommendation models [11][12]

Group 4: Future Directions
- The report identifies research directions for generative recommendation models, including data engineering, tokenizer efficiency, and training/inference efficiency, which could address current challenges and broaden application scenarios [18]
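The RAB ablation in Group 3 refers to a learned bias added to the attention logits as a function of the relative position between interactions in a user's history. As a rough illustration of the mechanism being ablated, here is a minimal sketch of a generic learned relative positional bias; the class name and shapes are hypothetical, and this is the generic relative-bias pattern rather than HSTU's exact formulation (which also incorporates temporal information).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeAttentionBias(nn.Module):
    """Learned bias per head per relative offset, added to attention logits.
    Generic sketch of the component the RAB ablation above removes."""

    def __init__(self, max_len: int, n_heads: int):
        super().__init__()
        # One learned bias for each offset in [-(max_len-1), max_len-1].
        self.bias = nn.Embedding(2 * max_len - 1, n_heads)
        self.max_len = max_len

    def forward(self, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        # rel[i, j] = (i - j), shifted to be a valid embedding index.
        rel = pos[:, None] - pos[None, :] + self.max_len - 1
        return self.bias(rel).permute(2, 0, 1)  # [n_heads, seq_len, seq_len]

# Usage inside attention: add the bias to the scores before the softmax.
n_heads, seq_len, d = 4, 16, 32
q = torch.randn(n_heads, seq_len, d)
k = torch.randn(n_heads, seq_len, d)
rab = RelativeAttentionBias(max_len=64, n_heads=n_heads)
scores = q @ k.transpose(-1, -2) / d ** 0.5 + rab(seq_len)
attn = F.softmax(scores, dim=-1)
print(attn.shape)  # torch.Size([4, 16, 16])
```

Because the bias depends only on how far apart two items are, not on their absolute positions, it gives the model an explicit notion of recency in interaction sequences, which is consistent with the reported finding that removing it hurts scalability.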