Ascend CloudMatrix 384 Super Node

How Strong Are Huawei's Chips, Really? (Part 1)
21st Century Business Herald · 2025-07-06 03:12
Core Viewpoint
- Huawei's Ascend 384 Super Node has demonstrated performance that surpasses NVIDIA's products in certain aspects, indicating a significant advancement in domestic AI chip capabilities [2][3]

Group 1: Product Overview
- Ascend is an AI chip developed by Huawei, designed specifically for AI workloads as an NPU, distinguishing it from traditional GPUs and CPUs [4]
- The flagship product, Ascend 910, has shifted from being a backup option to a primary choice for training large models, owing to restrictions on high-end chips from NVIDIA and AMD [4][6]

Group 2: Performance Metrics
- Huawei has successfully trained large models on Ascend chips, including a dense model with 135 billion parameters and a MoE model with 718 billion parameters [6]
- The key performance indicator, MFU (Model FLOPs Utilization), exceeded 50% for the dense model and reached 41% for the MoE model, indicating efficient use of computational resources [9]

Group 3: Competitive Analysis
- In a direct comparison with NVIDIA's H100 and H800 during large-model deployment, Ascend delivered comparable performance and achieved the best utilization rate in the competition [10]
- Although a single Ascend chip delivers only about one-third the performance of NVIDIA's Blackwell, the 384 Super Node uses roughly five times as many chips, so its aggregate computing power exceeds that of NVIDIA's GB200 [10]
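The scale-up claim above is simple arithmetic: roughly five times as many chips at about one-third the per-chip performance still nets out ahead. A minimal sketch of that calculation, where the chip counts and the 1/3 ratio are the article's figures and the function name is illustrative:

```python
# Back-of-the-envelope check of the article's scale-up claim:
# 384 Ascend chips at ~1/3 the per-chip performance of a
# 72-chip Blackwell rack still yield more aggregate compute.

def aggregate_ratio(chips_a: int, chips_b: int, per_chip_ratio: float) -> float:
    """Aggregate compute of system A relative to system B."""
    return (chips_a * per_chip_ratio) / chips_b

ratio = aggregate_ratio(384, 72, 1 / 3.0)
print(f"aggregate compute ratio: {ratio:.2f}x")  # ~1.78x
```

The ratio only says the cluster wins on raw throughput; it says nothing about interconnect overhead or power, which later items in this digest take up.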
Sci-Tech Innovation Board Welcomes a Hard-Tech Player: Muxi's IPO Accepted as Domestic GPU Listings Accelerate
21st Century Business Herald · 2025-07-01 12:52
Core Viewpoint
- The rise of domestic AI chip companies, particularly Muxi Integrated Circuit (Shanghai) Co., Ltd., is accelerating their entry into the capital market, with Muxi's IPO on the Sci-Tech Innovation Board a significant event in the GPU sector [1][2]

Company Overview
- Muxi aims to raise 3.904 billion yuan to fund the development and industrialization of next-generation general-purpose GPUs, AI inference chips, and advanced heterogeneous computing architectures [1]
- Founded in 2020, Muxi is one of the "Four Little Dragons" of domestic GPUs, alongside Moore Threads and Biren Technology [2]
- Muxi's flagship product, the "Xiyun C series," is a self-developed GPU chip that has achieved significant sales and deployment on public AI computing platforms [3]

Financial Performance
- Muxi's revenue for 2022 through 2024 was 426,000 yuan, 53.021 million yuan, and 743 million yuan, respectively, indicating substantial growth [3]
- Despite this growth, Muxi recorded net losses of 780 million yuan, 870 million yuan, and 1.41 billion yuan over the same period, totaling 3.06 billion yuan [3]

Market Context
- The domestic AI chip market is still in its early stages: penetration rates for local brands are rising, but a clear competitive landscape has yet to form [1]
- The rise of domestic GPU manufacturers is driven by the growth of AI models, the "East Data West Computing" initiative, and ongoing policies promoting domestic innovation [5]
- By 2025, domestic AI chips are expected to account for 40% of China's AI server market, while NVIDIA's share is projected to fall to 41.5% [7]

Policy Environment
- Recent reforms to the Sci-Tech Innovation Board have created a more favorable environment for unprofitable but technologically advanced companies like Muxi, signaling a shift toward supporting "hard tech" enterprises [4]
The Key to Huawei's Sanctions Breakthrough Lies in the "384 Super Node"
Huxiu APP · 2025-06-17 10:55
Core Viewpoint
- The article discusses the challenges and strategies involved in achieving breakthroughs in artificial intelligence (AI), particularly through Huawei's "CloudMatrix 384 Super Node" computing cluster, which aims to overcome single-point technology limitations through system engineering innovation [1][3]

Group 1: Huawei's Technological Advancements
- Huawei's "CloudMatrix 384 Super Node" is built on 384 Ascend chips and delivers up to 300 PFLOPs of dense BF16 computing power, surpassing NVIDIA's GB200 NVL72 platform [3][4]
- The development of the "Super Node" reflects Huawei's foresight in addressing the diminishing returns of Moore's Law and the rising costs of semiconductor advances [4][9]
- The "Super Node" architecture features a fully interconnected high-speed bus, increasing communication bandwidth 15-fold and significantly reducing latency [8][9]

Group 2: System Engineering Innovations
- Huawei's approach involves a comprehensive system-level redesign to address the challenges of large-scale model training, focusing on resource allocation and communication efficiency [5][10]
- Globally unified memory addressing allows direct memory access across nodes, improving the efficiency of parameter synchronization during training [8][9]
- Resource scheduling has been upgraded to distribute tasks dynamically based on model structure, optimizing the overlap of computation and communication [8][10]

Group 3: Collaborative Ecosystem Development
- Huawei has mobilized a large team across departments to drive collaboration and innovation in AI infrastructure, showcasing a distinctive multi-industry cluster advantage [10][12]
- The company emphasizes ecosystem compatibility, ensuring that the Ascend architecture supports popular deep learning frameworks such as PyTorch and TensorFlow [12][13]
- Huawei's commitment to improving the usability of its AI frameworks, such as MindSpore, aims to ease the transition for developers accustomed to existing platforms [12][13]

Group 4: Future Prospects and Industry Impact
- The advances in Huawei's computing capabilities mark a significant step for China's AI industry, potentially overcoming technological constraints and fostering innovation [12][13]
- Maturing the Ascend ecosystem will take time, but ongoing efforts aim to improve compatibility and developer support [12][13]
- Huawei's recent achievements in large model training, including the Pangu Ultra MoE model, demonstrate that its domestic computing platform can produce world-class AI models [10][12]
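The headline figures for the Super Node also imply a per-chip throughput, which can be checked directly. A quick sketch: 300 PFLOPs and 384 chips are the article's numbers, while the derived per-chip value is our arithmetic, not a published spec.

```python
# Implied per-chip dense BF16 throughput from the article's totals.
total_pflops = 300.0   # CloudMatrix 384 aggregate dense BF16 (article figure)
num_chips = 384        # Ascend chips per super node (article figure)

per_chip_tflops = total_pflops / num_chips * 1000  # PFLOPs -> TFLOPs
print(f"~{per_chip_tflops:.0f} TFLOPs dense BF16 per chip")  # ~781
```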
Huawei Reveals: Domestic Ascend Trains a World-Class Large Model
Guancha.cn · 2025-05-30 08:35
Core Insights
- Huawei has unveiled Pangu Ultra MoE, a model with 718 billion parameters, marking a significant advance in MoE model training on the Ascend AI computing platform [1][3]
- The Pangu team innovated in model architecture and training methods to keep the training of ultra-large, highly sparse MoE models stable, overcoming challenges typical of such training [1][2]
- The release of the Pangu Ultra MoE and Pangu Pro MoE series demonstrates Huawei's ability to run a fully autonomous training pipeline on domestic computing power and models, reinforcing the innovation capacity of China's AI infrastructure [3]

Model Architecture
- The Pangu team introduced the Depth-Scaled Sandwich-Norm (DSSN) stable architecture and the TinyInit initialization method, enabling long-term stable training on more than 18TB of data on the Ascend platform [1]
- An EP-loss load optimization method was developed to maintain load balance among experts while enhancing their specialization [1]
- Pangu Ultra MoE employs the advanced MLA and MTP architectures and uses a Dropless training strategy in both pre-training and post-training to balance model performance and efficiency [1]

Training Methods
- Huawei has disclosed the key technologies that efficiently integrate a large sparse MoE reinforcement learning (RL) post-training framework onto Ascend CloudMatrix 384 super nodes, marking the transition to super-node cluster training [2]
- Recent upgrades to the pre-training system raised MFU in a 10,000-card cluster from 30% to 41% [2]
- The recently released Pangu Pro MoE model, with 72 billion total parameters and 16 billion active parameters, achieves excellent results through innovative dynamic expert network activation, rivaling models with over 100 billion parameters [2]
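The expert load-balancing idea behind the EP-loss method can be illustrated with a generic MoE auxiliary balance loss. This is a hedged sketch, not Huawei's published formulation: `load_balance_loss` and its inputs are illustrative, following the common pattern of penalizing the product of per-expert token fractions and mean router probabilities.

```python
# Generic MoE auxiliary load-balancing loss (illustrative; not the
# exact EP loss used for Pangu). It is minimized (value 1.0) when
# tokens and router probability mass are spread evenly across experts.

def load_balance_loss(expert_ids, router_probs, num_experts):
    """expert_ids: top-1 expert index per token.
    router_probs: per-token softmax distribution over experts."""
    n = len(expert_ids)
    # f[e]: fraction of tokens dispatched to expert e
    f = [0.0] * num_experts
    for e in expert_ids:
        f[e] += 1.0 / n
    # p[e]: mean router probability assigned to expert e
    p = [sum(probs[e] for probs in router_probs) / n
         for e in range(num_experts)]
    return num_experts * sum(fe * pe for fe, pe in zip(f, p))

balanced = load_balance_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]], 2)
skewed = load_balance_loss([0, 0], [[0.9, 0.1], [0.9, 0.1]], 2)
print(balanced, skewed)  # even load scores 1.0; skewed load scores higher
```

Because the loss grows as routing concentrates on a few experts, adding it to the training objective pushes the router toward the even load that keeps all devices busy, which is the property the EP-loss work targets.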
The Next Generation After the 910C
信息平权 · 2025-04-20 09:33
Core Viewpoint
- Huawei's CloudMatrix 384 super node is claimed to rival NVIDIA's NVL72, but discrepancies between the hardware descriptions in CloudMatrix and the UB-Mesh paper suggest they may represent different hardware forms [1][2][8]

Group 1: CloudMatrix vs. UB-Mesh
- CloudMatrix is described as a commercial 384-NPU scale-up super node, while UB-Mesh outlines a plan for an 8000-NPU scale-up super node [8]
- The UB-Mesh paper indicates a different architecture for the next generation of NPUs, potentially extending capabilities beyond the current 910C [10][11]
- NPU density per rack also differs significantly: CloudMatrix packs 32 NPUs per rack versus UB-Mesh's 64 [1]

Group 2: Technical Analysis
- CloudMatrix's total power consumption is estimated at 500 kW, far above NVL72's 145 kW, raising questions about its energy efficiency [2]
- An analysis of CloudMatrix's optical fiber requirements suggests that Huawei's vertical integration may mitigate the cost and power concerns associated with fiber optics [3][4]
- The UB-Mesh paper proposes a multi-rack structure using electrical connections within racks and optical connections between racks, which could simplify deployment and reduce complexity [9]

Group 3: Market Implications
- The competitive landscape may shift if Huawei successfully builds a robust AI hardware ecosystem, potentially challenging NVIDIA's market dominance [11]
- The ongoing build-out of AI infrastructure in China could create a new competitive environment, especially with the emergence of products like DeepSeek [11][12]
- The perception of optical modules and their cost-effectiveness may evolve, much as lidar did in the automotive industry [6]
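The power comparison is more informative when normalized by compute. A rough sketch: the 500 kW and 145 kW figures are the article's estimates, and the 300 PFLOPs figure for CloudMatrix appears earlier in this digest, but the ~180 PFLOPs dense BF16 figure for NVL72 is our illustrative assumption, not a number from the article.

```python
# Power per unit of compute, using the article's power estimates.
# ASSUMPTIONS: 300 PFLOPs for CloudMatrix (cited earlier in this digest)
# and ~180 PFLOPs dense BF16 for NVL72 (illustrative, not from the article).

def kw_per_pflop(power_kw: float, pflops: float) -> float:
    """Energy cost of compute: lower is more efficient."""
    return power_kw / pflops

cloudmatrix = kw_per_pflop(500.0, 300.0)
nvl72 = kw_per_pflop(145.0, 180.0)
print(f"CloudMatrix: {cloudmatrix:.2f} kW/PFLOP, NVL72: {nvl72:.2f} kW/PFLOP")
```

Under these assumptions CloudMatrix pays roughly twice the power per PFLOP, which is the efficiency gap the article flags and which Huawei's vertical integration of optics would need to offset.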