Pangu Pro MoE Large Model

Huawei Pangu Team Statement: Strict Compliance with Open-Source Requirements
Guan Cha Zhe Wang· 2025-07-05 09:32
Core Viewpoint
- Huawei's Pangu Pro MoE model has been recognized for its innovative design that uses dynamic activation of expert networks, achieving superior performance. However, a recent GitHub study claims that the model shows a "striking similarity" in parameter structure to Alibaba's Qwen-2.5 14B model [1].

Group 1: Model Development and Innovation
- The Pangu Pro MoE model is developed and trained on the Ascend hardware platform and is not based on incremental training of other vendors' models. It features significant innovations in architecture and technical characteristics [2].
- The model introduces the Mixture of Grouped Experts (MoGE) architecture, which effectively addresses load-balancing challenges in large-scale distributed training, thereby improving training efficiency [1][2].

Group 2: Open Source Compliance and Community Engagement
- Huawei emphasizes that some foundational components of the Pangu Pro MoE model's code implementation reference industry open-source practices and include portions of open-source code from other models. The company adheres strictly to open-source license requirements and clearly marks copyright statements in the open-source code files [2].
- The company promotes an open innovation approach, respects third-party intellectual property, and advocates an inclusive, fair, open, united, and sustainable open-source philosophy [2].
- Huawei thanks global developers and partners for their support of the Pangu model and highlights the importance of constructive feedback from the open-source community [2].
Huawei's Major Release!
新华网财经· 2025-06-20 12:17
Core Viewpoint
- Huawei's Pangu model has made significant advances across industries, demonstrating its capabilities in over 30 industries and 500 scenarios, with the latest Pangu model 5.5 set to enhance natural language processing and multimodal applications [1][4].

Group 1: Pangu Model Developments
- The Pangu model has been successfully deployed in sectors such as government, finance, manufacturing, healthcare, coal mining, steel, railways, autonomous driving, and meteorology, showcasing its transformative impact [1].
- Huawei introduced the Pangu Ultra MoE model with a parameter scale of 718 billion, marking a significant leap in training ultra-large-scale models on the Ascend AI computing platform [1][2].

Group 2: Technical Innovations
- The Pangu team has innovated in model architecture and training methods, achieving stable training of the ultra-large MoE model on the Ascend platform using over 18TB of data [2].
- Key innovations include the Depth-Scaled Sandwich-Norm (DSSN) architecture and the TinyInit initialization method, which enhance training stability and load balancing among experts [2][3].

Group 3: Performance Enhancements
- Recent upgrades to the training system have improved pre-training efficiency, raising the model FLOPs utilization (MFU) in multi-card cluster pre-training from 30% to 41% [3].
- The Pangu Pro MoE model, with 72 billion parameters and 16 billion active parameters, has demonstrated performance comparable to models with over 100 billion parameters, ranking first among domestic models under 100 billion parameters [3].

Group 4: HarmonyOS Developments
- Huawei unveiled HarmonyOS 6, which aims to enhance user experience with lower latency and improved AI capabilities, marking a significant step in the evolution of the Harmony ecosystem [4].
- The Harmony ecosystem is entering a new phase of acceleration, with over 30,000 applications and services in development across nearly 20 industries, highlighting a significant demand for talent in this area [5].
The Key to Huawei's Sanctions Breakthrough Lies in the "384 Super Node"
虎嗅APP· 2025-06-17 10:55
Core Viewpoint
- The article discusses the challenges and strategies of achieving breakthroughs in artificial intelligence (AI) technology, particularly through Huawei's "CloudMatrix 384 Super Node" computing cluster solution, which aims to overcome single-point technology limitations through system-engineering innovation [1][3].

Group 1: Huawei's Technological Advancements
- Huawei's "CloudMatrix 384 Super Node" is built on 384 Ascend chips and can deliver up to 300 PFLOPs of dense BF16 computing power, surpassing NVIDIA's GB200 NVL72 platform [3][4].
- The development of the "Super Node" reflects Huawei's foresight in addressing the diminishing returns of Moore's Law and the rising costs of semiconductor advances [4][9].
- The "Super Node" architecture features a fully interconnected high-speed bus system, increasing communication bandwidth 15-fold and significantly reducing latency [8][9].

Group 2: System Engineering Innovations
- Huawei's approach involves a comprehensive system-level redesign to address the challenges of large-scale model training, focusing on resource allocation and communication efficiency [5][10].
- Global unified memory addressing allows direct memory access across nodes, improving the efficiency of parameter synchronization during model training [8][9].
- Resource scheduling has been upgraded to enable dynamic task distribution based on model structure, optimizing computation and communication time [8][10].

Group 3: Collaborative Ecosystem Development
- Huawei has mobilized a large team across departments to strengthen collaboration and innovation in AI infrastructure, showcasing a unique multi-industry cluster advantage [10][12].
- The company emphasizes ecosystem compatibility, ensuring that its Ascend architecture supports popular deep learning frameworks such as PyTorch and TensorFlow [12][13].
- Huawei's commitment to improving the usability of its AI frameworks, such as MindSpore, aims to ease the transition for developers accustomed to existing platforms [12][13].

Group 4: Future Prospects and Industry Impact
- Huawei's advances in computing capability are positioned as a significant step for China's AI industry, potentially overcoming technological constraints and fostering innovation [12][13].
- The Ascend ecosystem will take time to mature, but efforts are underway to improve compatibility and developer support [12][13].
- Huawei's recent achievements in large-model training, including the Pangu Ultra MoE model, demonstrate the potential of its domestic computing platform to produce world-class AI models [10][12].
Huawei Reveals: World-Class Large Model Trained on Domestic Ascend Platform
Guan Cha Zhe Wang· 2025-05-30 08:35
Core Insights
- Huawei has launched a new model, Pangu Ultra MoE, with a parameter scale of 718 billion, marking a significant advance in MoE model training on the Ascend AI computing platform [1][3]
- The Pangu team has innovated in model architecture and training methods to ensure stable training of ultra-large, highly sparse MoE models, overcoming challenges typically associated with such training [1][2]
- The release of the Pangu Ultra MoE and Pangu Pro MoE series demonstrates Huawei's ability to run a fully autonomous training process on domestic computing power and models, reinforcing the innovation capacity of China's AI infrastructure [3]

Model Architecture
- The Pangu team introduced the Depth-Scaled Sandwich-Norm (DSSN) stable architecture and the TinyInit initialization method, enabling long-term stable training on over 18TB of data on the Ascend platform [1]
- The EP loss load optimization method was developed to maintain load balancing among experts and enhance their specialization [1]
- Pangu Ultra MoE employs advanced MLA and MTP architectures and uses a Dropless training strategy in both pre-training and post-training to balance model performance and efficiency [1]

Training Methods
- Huawei's team has disclosed key technologies that enable efficient integration of large sparse MoE reinforcement learning (RL) post-training frameworks on Ascend CloudMatrix 384 supernodes, marking the transition to supernode cluster training [2]
- Recent upgrades to the pre-training system have improved MFU in a 10,000-card cluster from 30% to 41% [2]
- The recently released Pangu Pro MoE model, with 72 billion parameters and 16 billion active parameters, delivers excellent performance through innovative dynamic expert network activation, rivaling models with over 100 billion parameters [2]
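The exact form of Huawei's EP loss is not given here, but auxiliary load-balancing losses in MoE training generally follow a common pattern: penalize the product of each expert's routed-token fraction and its mean router probability, which is minimized when load is uniform. The sketch below uses the widely known Switch-Transformer-style formulation purely as an illustration; it is an assumption, not Huawei's actual method, and all names are invented for this example.

```python
def load_balancing_loss(gate_probs, expert_assignments, num_experts):
    """Illustrative Switch-style auxiliary loss: N * sum_i f_i * P_i.

    gate_probs          -- per-token router probability vectors
    expert_assignments  -- index of the expert each token was routed to
    The loss equals 1.0 under perfectly uniform routing and grows as
    load concentrates on fewer experts.
    """
    num_tokens = len(expert_assignments)
    # f_i: fraction of tokens routed to expert i
    f = [0.0] * num_experts
    for e in expert_assignments:
        f[e] += 1.0 / num_tokens
    # P_i: mean router probability mass assigned to expert i
    P = [sum(p[i] for p in gate_probs) / num_tokens for i in range(num_experts)]
    return num_experts * sum(fi * Pi for fi, Pi in zip(f, P))
```

Adding a term like this to the training objective nudges the router toward even expert utilization, which is the property the EP loss above is described as maintaining.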
Huawei Pangu's First Public Showing: Ascend-Native 72B MoE Architecture Ties for First Place in China Among Sub-100B Models on SuperCLUE
华尔街见闻· 2025-05-29 00:57
Core Insights
- The emergence of the Mixture of Grouped Experts (MoGE) model from Huawei's Pangu team addresses the inefficiencies of traditional Mixture of Experts (MoE) models, ensuring balanced computational load across devices while maintaining high performance [1][7][27]
- The Pangu Pro MoE model, with 72 billion total parameters and 16 billion active parameters, achieves competitive performance, ranking first among Chinese models with fewer than 100 billion parameters [2][22]

Group 1: Model Architecture and Efficiency
- The MoGE architecture introduces a grouping mechanism that ensures balanced expert activation, significantly improving computational efficiency and reducing system bottlenecks [1][6][12]
- The model demonstrates superior throughput, achieving 321 tokens/s on the Ascend 300I Duo platform and 1528 tokens/s on the Ascend 800I A2 platform, outperforming similar-sized dense models [18][26]

Group 2: Performance Metrics
- In the latest SuperCLUE ranking, Pangu Pro MoE scored 58.75, demonstrating strong capability across reasoning tasks and outperforming other models in complex reasoning scenarios [3][22]
- The model performs well across multiple benchmarks, including English and Chinese language tasks, demonstrating versatility and adaptability in complex cognitive tasks [22][23][24]

Group 3: Industry Impact
- The introduction of Pangu Pro MoE marks a shift in the AI industry from a focus on parameter counts to practical application, enabling efficient cloud inference and supporting high-concurrency real-time scenarios [27]
- Huawei's innovations in the MoE architecture redefine the value of large models, providing a robust foundation for AI applications across industries [27]
Huawei Pangu's First Benchmark Run: Ascend-Native 72B MoE Model Tops the SuperCLUE Sub-100B Leaderboard
第一财经· 2025-05-28 13:36
Core Viewpoint
- The article highlights the Mixture of Grouped Experts (MoGE) model developed by Huawei's Pangu team as a significant innovation in the AI field, particularly in large language models (LLMs), addressing the challenges of traditional Mixture of Experts (MoE) architectures and achieving efficient training and strong performance [1][10][31].

Group 1: MoGE Architecture and Innovations
- The MoGE model introduces a dynamic grouping mechanism during the expert-selection phase, optimizing load distribution and enabling balanced resource allocation across devices, overcoming the engineering bottlenecks of traditional MoE architectures [1][10].
- The Pangu Pro MoE model, based on the MoGE architecture, has 72 billion total parameters and 16 billion active parameters, achieving industry-leading inference efficiency on Ascend 300I Duo and 800I A2 chips at 321 tokens/s and 1528 tokens/s respectively [2][22].
- Compared with models like DeepSeek-R1, which has 671 billion parameters, Pangu Pro MoE achieves comparable performance with roughly one-tenth the parameter count, setting a new benchmark for computational efficiency and model effectiveness [3][29].

Group 2: Performance and Benchmarking
- Pangu Pro MoE scored 59 points on the SuperCLUE benchmark, ranking first among domestic models with fewer than 100 billion parameters and demonstrating its ability to rival larger models [2][25].
- The model shows superior performance on complex reasoning tasks, outperforming other leading models on benchmarks such as MMLU and DROP, showcasing its versatility across domains [26][27].

Group 3: Industry Implications and Future Directions
- The introduction of the MoGE architecture signals a shift from a parameter-centric approach to a focus on practical efficiency, enabling smaller enterprises to use large models without prohibitive costs and democratizing access to advanced AI technologies [31][32].
- Huawei's integrated approach, combining architecture, chips, and engines, facilitates deployment of large models in real-world applications, dispelling the misconception that large models require exorbitant deployment costs [31][32].
Topping the Leaderboard on Its First Attempt: How Does Huawei Pangu Beat Bigger Models with a Smaller One?
虎嗅APP· 2025-05-28 13:34
Core Viewpoint
- The article discusses Huawei's innovative Mixture of Grouped Experts (MoGE) architecture, which optimizes the traditional Mixture of Experts (MoE) model to improve load balancing and computational efficiency in AI applications, particularly for large models [1][2][6].

Summary by Sections

Introduction
- The MoE model has evolved from its academic origins into a competitive force in AI, with Huawei's MoGE architecture representing a significant advance in the field [1].

MoGE Architecture
- Huawei's Pangu Pro MoE model features 72 billion total parameters and 16 billion active parameters, achieving superior expert load distribution and computational efficiency [2].
- The model's performance is highlighted by its SuperCLUE score of 59, placing it among the top domestic models despite having fewer parameters than competitors [2].

Technical Innovations
- The MoGE architecture addresses the core challenge of load imbalance in traditional MoE models by implementing a grouped balanced routing mechanism, ensuring equal expert activation within defined groups [6][12].
- This design improves throughput and dynamic scalability, making it suitable for a range of applications [12].

Performance Metrics
- The Pangu Pro MoE model demonstrates significant inference improvements, achieving up to 321 tokens per second on the Ascend 300I Duo platform and 1528 tokens per second on the Ascend 800I A2 platform [16].
- The model's capabilities extend across multiple domains, with strong performance on reasoning tasks and cross-language benchmarks [17][18].

Practical Applications
- The introduction of Pangu Pro MoE signals a shift from a focus on parameter quantity to practical effectiveness, enabling enterprises to use large models efficiently in real-time scenarios [23].
- Huawei aims to redefine the value of large models, providing a robust foundation for AI applications across various industries [23].
Huawei Pangu's First Public Showing: Ascend-Native 72B MoE Architecture Ties for First Place in China Among Sub-100B Models on SuperCLUE
机器之心· 2025-05-28 08:09
Core Insights
- The article discusses the Mixture of Grouped Experts (MoGE) model from Huawei's Pangu team, which addresses the inefficiencies of traditional Mixture of Experts (MoE) models by ensuring balanced computational load across devices [2][6][31]
- Pangu Pro MoE, built on the MoGE architecture, has demonstrated superior performance on industry benchmarks, scoring 59 on the SuperCLUE leaderboard with only 72 billion parameters, making it competitive with larger models [3][26]

Technical Innovations
- The MoGE model introduces a grouping mechanism during the expert-selection phase that ensures each token activates an equal number of experts within predefined groups, achieving load balancing across devices [2][12]
- The architecture uses a batch-level auxiliary loss function to keep expert activation balanced, enhancing overall model efficiency [16][18]

Performance Metrics
- Pangu Pro MoE achieves a throughput of 321 tokens/s on the Ascend 300I Duo platform and 1528 tokens/s on the Ascend 800I A2 platform, significantly outperforming other models of similar scale [24]
- The model exhibits a nearly uniform expert load distribution, with each expert handling approximately 12.5% of the total token volume, indicating efficient resource utilization [29]

Industry Impact
- The introduction of Pangu Pro MoE marks a shift from a "parameter arms race" to a focus on practical applications, reducing cloud inference costs and supporting high-concurrency real-time scenarios [31]
- Huawei's innovations in the AI field aim to redefine the value of large models, providing a robust foundation for enterprises to deploy billion-parameter models effectively [31]
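The grouped routing idea that recurs through these summaries can be sketched in a few lines: partition the experts into equal-sized groups and restrict each token's top-k selection to within every group, so every token activates the same number of experts per group (and thus per hosting device). The sketch below is a minimal illustration of that principle only; the group count, k, and function name are assumptions, not Pangu Pro MoE's actual configuration.

```python
def moge_route(scores, num_groups, k_per_group):
    """Grouped top-k routing sketch.

    scores      -- one router score per expert for a single token
    num_groups  -- experts are split into this many equal groups
    k_per_group -- experts activated within each group

    Returns the selected expert indices. Because every token picks
    exactly k_per_group experts in every group, per-group (and hence
    per-device) load is identical for all tokens by construction.
    """
    n = len(scores)
    group_size = n // num_groups
    selected = []
    for g in range(num_groups):
        start = g * group_size
        group = list(range(start, start + group_size))
        # Rank this group's experts by router score, keep the top k.
        group.sort(key=lambda i: scores[i], reverse=True)
        selected.extend(sorted(group[:k_per_group]))
    return selected
```

Contrast this with plain global top-k routing, where nothing stops all k winners from landing in one group: a batch of tokens can then overload a single device while others idle, which is exactly the imbalance the grouping constraint removes.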