Huawei Announces Major Breakthrough in AI Inference Technology, Potentially Ending HBM Dependency
是说芯语·2025-08-10 02:30

Core Viewpoint
- Huawei is set to unveil a breakthrough AI inference technology at the "2025 Financial AI Inference Application Landing and Development Forum" on August 12, aiming to significantly reduce China's reliance on HBM (High Bandwidth Memory), enhance the performance of domestic AI large models, and close a critical gap in China's AI inference ecosystem [1][3].

Group 1: Current Market Context
- Global demand for AI inference is growing explosively, while HBM, the core supporting technology, is monopolized by foreign giants; high-end AI servers are over 90% dependent on it [3].
- The domestic replacement rate for HBM is below 5%, raising the cost of large-model training and inference and hindering AI deployment in key sectors such as finance, healthcare, and industry [3].

Group 2: Technological Innovations
- Huawei's new technology targets these pain points by optimizing an advanced storage-compute architecture and integrating DRAM with new storage technologies, aiming to maintain high inference efficiency while sharply reducing HBM usage [3].
- The approach may combine hardware reconstruction with software intelligence, potentially creating "super AI servers" that jointly optimize compute, network transport, and storage [4].

Group 3: Performance Metrics
- Huawei's CloudMatrix384 Ascend AI cloud service has validated a similar technical path, achieving single-card decode throughput of over 1920 tokens/s, a 10-fold increase in KV Cache transfer bandwidth, and 50 ms latency per output token [4].
- The EMS elastic memory storage service has cut NPU deployment counts by 50% and first-token inference latency by 80% [4].
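The per-card figures above can be reconciled with a little arithmetic: at 50 ms per output token, a single sequential request produces only 20 tokens/s, so a 1920 tokens/s per-card throughput implies decoding is batched across many concurrent requests. A minimal sketch (the linear-batching model is an assumption for illustration, not Huawei's disclosed method):

```python
# Rough reconciliation of the reported per-card figures (illustrative only).
per_token_latency_s = 0.050   # 50 ms per output token (reported)
card_throughput_tps = 1920    # tokens/s per card (reported)

# Tokens/s a single sequential request can produce at that latency.
single_stream_tps = 1 / per_token_latency_s  # 20 tokens/s

# Concurrent decode streams needed to reach the per-card throughput,
# assuming throughput scales linearly with batch size (an assumption).
concurrent_streams = card_throughput_tps / single_stream_tps
print(concurrent_streams)  # 96.0
```

Under that assumption, the reported numbers correspond to roughly 96 concurrent decode streams per card, which is why KV Cache transfer bandwidth (each stream carries its own cache) becomes the bottleneck the EMS service addresses.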
Group 4: Industry Applications
- The financial sector, where Huawei has already established a mature AI framework supporting core transformations at over 75% of major banks, will be the first to adopt the technology [5].
- The technology will enhance native AI applications in finance, enabling millisecond-level decision-making in high-frequency trading and real-time interactions for millions of users in intelligent customer service [5].

Group 5: Future Implications
- Although HBM's ultra-high bandwidth (mainstream HBM3 exceeds 819 GB/s) is difficult to fully replace in the short term, Huawei's technological approach offers the industry new options [5].
- Experts suggest that if the technology can balance performance and cost, it may break the industry's reliance on HBM and shift global AI chip development from "hardware stacking" to "architectural innovation" [5].
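The 819 GB/s figure matters because autoregressive decode is typically memory-bandwidth-bound: generating each token requires streaming the model weights (plus KV cache) from memory, so memory bandwidth sets a ceiling on single-stream speed. A back-of-the-envelope sketch, using hypothetical model sizes that are not from the article:

```python
# Bandwidth-bound ceiling on single-stream decode: every token must read
# all weights from memory (simplified model; KV cache traffic ignored).
HBM3_BANDWIDTH_GBS = 819.0  # GB/s, mainstream HBM3 (from the article)

def max_tokens_per_s(param_count_billion: float, bytes_per_param: int) -> float:
    """Ceiling = bandwidth / bytes streamed per token (weights only)."""
    weight_bytes_gb = param_count_billion * bytes_per_param  # 1B params * N bytes = N GB
    return HBM3_BANDWIDTH_GBS / weight_bytes_gb

# Hypothetical examples (model sizes are assumptions, not from the article):
small = max_tokens_per_s(7, 2)   # 7B model in FP16 -> ceiling of 58.5 tokens/s
large = max_tokens_per_s(70, 2)  # 70B model in FP16 -> ceiling of ~5.85 tokens/s
print(small, large)
```

This crude bound shows why simply substituting slower memory for HBM degrades decode speed, and why an architectural route such as tiered DRAM plus new storage with smarter data placement is the harder but more interesting alternative.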