Huawei's Chips Have NVIDIA's Jensen Huang on Edge
21st Century Business Herald · 2025-07-07 08:56
Core Viewpoint
- Huawei's Ascend CloudMatrix 384 super node has demonstrated performance that surpasses NVIDIA's products in certain respects, marking a significant advance in domestic AI chip capabilities [1][11][13].

Group 1: Huawei's Ascend Chip Overview
- Ascend is a dedicated AI processing chip (NPU) designed specifically for AI workloads, with the Ascend 910 as its flagship product [3][6].
- Ascend chips were previously treated as backup options when high-end NVIDIA and AMD chips were unavailable, but they have now emerged as leaders in the domestic chip market [3][6].
- Until recently, Ascend chips were used mainly for AI inference, with limited use in model training due to performance and ecosystem constraints [4][6].

Group 2: Performance and Capabilities
- In 2024 and 2025, Huawei transformed Ascend from a backup option into a primary platform capable of training large models, documenting the results in research papers [5][6].
- Ascend has trained a 135-billion-parameter model on 8,192 chips and a 718-billion-parameter model on more than 6,000 chips, demonstrating that large-scale models can be trained on domestic chips [6][10].
- The key efficiency metric, MFU (Model FLOPs Utilization), exceeded 50% for the dense model and reached 41% for the MoE model, indicating highly efficient use of compute resources [9][10].

Group 3: Competitive Comparison with NVIDIA
- In direct comparisons during real-world large-model deployment, the Ascend 384 super node performed comparably to NVIDIA's H100 and H800, achieving the best utilization rates [11][12].
- Although a single Ascend chip delivers only about one-third the performance of NVIDIA's Blackwell, the 384 super node's overall system performance exceeds NVIDIA's GB200 because it uses far more chips [13][21].
- This suggests that Ascend is not merely a substitute but can lead on certain performance metrics [13].
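The MFU figures above can be made concrete with a small sketch. It uses the common ~6 × parameters FLOPs-per-token approximation for transformer training (a standard rule of thumb, not a figure from the article); all input numbers are hypothetical and chosen only for illustration:

```python
def mfu(params: float, tokens_per_sec: float, n_chips: int,
        peak_flops_per_chip: float) -> float:
    """Model FLOPs Utilization: achieved training FLOPs / theoretical peak.

    Uses the common ~6 * params FLOPs-per-token approximation for a
    transformer's combined forward and backward pass.
    """
    achieved = 6 * params * tokens_per_sec          # FLOPs actually performed per second
    peak = n_chips * peak_flops_per_chip            # cluster's theoretical ceiling
    return achieved / peak

# Hypothetical numbers: a 1B-parameter model processing 1,000 tokens/s
# on one chip with a 12 TFLOP/s peak gives 6e12 / 12e12 = 0.5, i.e. 50% MFU.
print(mfu(params=1e9, tokens_per_sec=1_000, n_chips=1,
          peak_flops_per_chip=12e12))  # 0.5
```

An MFU above 50%, as reported for the dense model, means more than half of the cluster's theoretical compute is spent on useful model arithmetic rather than lost to communication and idle time.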
Group 4: Technological Innovations
- The CloudMatrix 384 super node combines 384 Ascend 910 chips and 192 Kunpeng CPUs, interconnected with advanced optical communication technology that improves data-transmission efficiency [16][30].
- Huawei's approach is a system-level engineering breakthrough rather than a bet on single-chip performance, combining innovations in communication, optics, thermal management, and software [21][22].
- The architecture enables high-speed, peer-to-peer communication among chips, significantly improving data-transfer rates compared with the traditional copper interconnects used by competitors [28][30].

Group 5: Market Position and Future Outlook
- Although Huawei still trails NVIDIA in chip technology and software ecosystem, Ascend has gained traction in the Chinese market as companies adapt to domestic chips amid restrictions on NVIDIA products [36][38].
- The domestic semiconductor industry is evolving under pressure, with Huawei's strategy representing a distinct "technology curve" that prioritizes system-level optimization over individual chip performance [38][39].
- Ascend's advances may mark the beginning of a significant shift in the AI computing landscape, positioning domestic capabilities for a potential resurgence in the global market [40].
How Powerful Are Huawei's Chips, Really? (Part 1)
Core Viewpoint
- Huawei's Ascend 384 super node has demonstrated performance that surpasses NVIDIA's products in certain respects, marking a significant advance in domestic AI chip capabilities [2][3].

Group 1: Product Overview
- Ascend is an AI chip developed by Huawei, designed as an NPU specifically for AI tasks, which distinguishes it from traditional GPUs and CPUs [4].
- Its main product, the Ascend 910, has moved from backup option to primary solution for training large models amid restrictions on high-end chips from NVIDIA and AMD [4][6].

Group 2: Performance Metrics
- Huawei has successfully trained large models on Ascend chips, including a dense model with 135 billion parameters and a MoE model with 718 billion parameters [6].
- The key performance indicator, MFU (Model FLOPs Utilization), exceeded 50% for the dense model and reached 41% for the MoE model, indicating efficient use of computational resources [9].

Group 3: Competitive Analysis
- In a direct comparison with NVIDIA's H100 and H800 during large-model deployment, Ascend delivered comparable performance and achieved the best utilization rate [10].
- Although a single Ascend chip delivers only about one-third the performance of NVIDIA's Blackwell, the 384 super node uses roughly five times as many chips, so its aggregate compute exceeds NVIDIA's GB200 [10].
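The system-versus-chip tradeoff in Group 3 can be checked with back-of-the-envelope arithmetic. It uses the article's own figures (one-third per-chip performance, 384 chips) plus the commonly cited 72-GPU count for a GB200 NVL72 rack, which is an outside assumption rather than a number from the article:

```python
# All values are relative to one NVIDIA Blackwell chip = 1.0.
ascend_per_chip = 1 / 3   # article's figure: one-third of Blackwell per chip
ascend_chips = 384        # CloudMatrix 384 super node
gb200_chips = 72          # assumed GB200 NVL72 configuration (outside assumption)

ascend_system = ascend_per_chip * ascend_chips   # 128 Blackwell-equivalents
gb200_system = 1.0 * gb200_chips                 # 72 Blackwell-equivalents

# The larger chip count more than offsets the per-chip gap.
print(ascend_system / gb200_system)
```

Under these figures the super node delivers roughly 128 versus 72 Blackwell-equivalents, a ratio of about 1.8, which is consistent with the article's claim that aggregate system compute exceeds the GB200, at the cost of more chips, power, and interconnect.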