EMS Elastic Memory Storage Service
Huawei Cloud Stirs Up Another Computing-Power Storm: CloudMatrix384 Super Node to Be Upgraded, Tokens Service Performance Up to 4x That of the H20
量子位· 2025-09-19 04:11
Core Viewpoint
- Huawei Cloud has made significant advances in AI computing power, positioning itself as a key player in the industry amid surging demand for computational resources driven by AI applications [1][4].

Group 1: Technological Advancements
- The CloudMatrix384 super node, launched in April 2025, has continued to evolve, addressing the AI industry's persistent "computing power anxiety" [3][6].
- Huawei Cloud's AI server roadmap upgrades the CloudMatrix specification from 384 cards to a future 8,192 cards, enabling the formation of massive AI clusters [5][19].
- The EMS elastic memory storage service significantly reduces latency in multi-turn dialogues, improving overall inference performance [5][19].

Group 2: Market Positioning
- Huawei Cloud's "black soil of computing power" concept provides fertile ground for enterprises and developers to innovate in AI, backed by a robust ecosystem of technological advancements [7][28].
- Combining intelligent computing (the Tokens service) with general-purpose computing (Kunpeng cloud services) allows Huawei Cloud to meet diverse industry needs [9][11].

Group 3: Tokens Service
- The Tokens service, built on the CloudMatrix384 super node, introduces a billing model that charges for actual token consumption, significantly lowering AI inference costs [14][16].
- Average daily token consumption in China surged from 100 billion to over 30 trillion within a year and a half, indicating a dramatic increase in demand for AI computational resources [15].

Group 4: Industry Applications
- Huawei Cloud's infrastructure supports applications ranging from high-precision scientific research to AI-driven internet services, demonstrating its versatility and capacity for complex workloads [21][23].
- The collaboration with national research institutions, such as the Chinese Academy of Sciences, highlights Huawei Cloud's commitment to providing reliable and high-performance computing resources for advanced scientific models [25].
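The token-consumption figures above imply roughly 300x growth in demand, and the Tokens service's pay-per-use model bills only for tokens actually consumed. A minimal sketch of that arithmetic; the per-million-token price below is hypothetical, since the article does not publish one:

```python
# Back-of-the-envelope figures from the article, under a simple linear
# token-based billing model. The unit price is an illustrative assumption.

DAILY_TOKENS_EARLY = 100e9   # ~100 billion tokens/day at the start of the period
DAILY_TOKENS_LATE = 30e12    # ~30 trillion tokens/day about 18 months later

def growth_multiple(early: float, late: float) -> float:
    """How many times daily demand grew over the period."""
    return late / early

def daily_cost(tokens: float, price_per_million: float) -> float:
    """Pay-per-use cost: charge only for tokens actually consumed."""
    return tokens / 1e6 * price_per_million

print(growth_multiple(DAILY_TOKENS_EARLY, DAILY_TOKENS_LATE))  # 300.0
print(daily_cost(DAILY_TOKENS_LATE, 2.0))   # cost at a hypothetical 2.0/M tokens
```

Under this model the bill scales strictly with usage, which is the article's point: idle capacity is no longer charged for.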
Huawei Announces Major AI Inference Breakthrough, Potentially Eliminating HBM Dependence Entirely
是说芯语· 2025-08-10 02:30
Core Viewpoint
- Huawei is set to unveil a breakthrough AI inference technology at the "2025 Financial AI Inference Application Landing and Development Forum" on August 12, aiming to significantly reduce China's reliance on HBM (High Bandwidth Memory) and enhance the performance of domestic AI large models, addressing a critical gap in China's AI inference ecosystem [1][3].

Group 1: Current Market Context
- Global demand for AI inference is growing explosively, while its core enabling technology, HBM, is monopolized by foreign giants, with over 90% dependency in high-end AI servers [3].
- The domestic replacement rate for HBM is below 5%, raising the cost of large-model training and inference and hindering AI deployment in key sectors such as finance, healthcare, and industry [3].

Group 2: Technological Innovations
- Huawei's new technology targets these pain points by optimizing an advanced storage-computing architecture and integrating DRAM with new storage technologies, aiming to maintain high inference efficiency while sharply reducing HBM usage [3].
- The approach may involve deep co-design of "hardware reconstruction + software intelligence," potentially yielding "super AI servers" that jointly optimize compute, interconnect, and storage [4].

Group 3: Performance Metrics
- Huawei's CloudMatrix384 Ascend AI cloud service has validated a similar technical path, achieving single-card decode throughput above 1920 tokens/s with 50 ms latency per output token, and a 10-fold increase in KV Cache transmission bandwidth [4].
- The EMS elastic memory storage service has cut NPU deployment counts by 50% and reduced first-token inference latency by 80% [4].
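The throughput and latency figures quoted for CloudMatrix384 can be related with simple arithmetic, assuming decode requests are served concurrently. The implied concurrency below is our inference, not a number stated in the article:

```python
# Relating the article's per-card figures: 1920 tokens/s aggregate decode
# throughput, with ~50 ms between output tokens on each stream.

THROUGHPUT_TOKENS_PER_S = 1920   # single-card decode throughput (from article)
PER_TOKEN_LATENCY_MS = 50        # per-token output latency (from article)

# Each stream emits 1000 / 50 = 20 tokens/s, so the aggregate figure
# implies roughly 1920 / 20 = 96 concurrent decode streams per card.
tokens_per_stream = 1000 / PER_TOKEN_LATENCY_MS
implied_concurrency = THROUGHPUT_TOKENS_PER_S / tokens_per_stream
print(tokens_per_stream, implied_concurrency)  # 20.0 96.0
```

In other words, the headline throughput is only reachable by batching many requests; a single stream at 50 ms/token sees just 20 tokens/s.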
Group 4: Industry Applications
- The financial sector will be the first to implement Huawei's technology; Huawei has already established a mature AI framework there, supporting over 75% of major banks in core-system transformations [5].
- The technology will enhance native AI applications in finance, enabling millisecond-level decision-making in high-frequency trading and supporting real-time interactions for millions of users in intelligent customer service [5].

Group 5: Future Implications
- While HBM's ultra-high bandwidth (current mainstream HBM3 exceeds 819 GB/s per stack) is difficult to fully replace in the short term, Huawei's approach offers the industry a new option [5].
- Experts suggest that if the technology can balance performance and cost, it may break the industry's reliance on HBM and shift global AI chip development from "hardware stacking" to "architectural innovation" [5].
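The 819 GB/s figure quoted for HBM3 follows from the standard per-stack interface parameters, assuming the JEDEC HBM3 configuration of a 1024-bit interface at 6.4 Gb/s per pin:

```python
# Deriving the article's "819 GB/s" HBM3 per-stack bandwidth figure,
# assuming standard JEDEC HBM3 parameters (not stated in the article).

INTERFACE_WIDTH_BITS = 1024   # HBM3 per-stack interface width
DATA_RATE_GBPS = 6.4          # per-pin data rate, Gb/s

bandwidth_gb_per_s = INTERFACE_WIDTH_BITS * DATA_RATE_GBPS / 8  # bits -> bytes
print(bandwidth_gb_per_s)  # 819.2
```

This very wide, short-reach interface is exactly what makes HBM hard to substitute: matching the bandwidth with DRAM plus pooled memory requires architectural workarounds rather than a drop-in part.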