EMS Elastic Memory Storage Service

Huawei Cloud Stirs Up Another Computing-Power Storm: CloudMatrix384 SuperNode to Be Upgraded, Tokens Service Performance Up to 4x That of the H20
量子位· 2025-09-19 04:11
Ming Min, reporting from Aofeisi. 量子位 | WeChat official account QbitAI

Huawei Cloud's computing power has made another major breakthrough. At the just-concluded Huawei Connect 2025, a series of new advances was announced, only half a year after the CloudMatrix384 SuperNode officially launched in April 2025, and its capabilities have kept evolving throughout that period.

The AI industry today is still shrouded in computing-power anxiety, and Silicon Valley giants have been moving aggressively on compute and chips: OpenAI is co-developing AI chips with Broadcom while committing $300 billion to Oracle for computing capacity; Musk built a ten-thousand-card supercomputing cluster in a hundred days, plans to push toward million-card scale, and is quietly laying out chip plans of his own; Meta, AWS, and others are also racing to secure more computing resources... Yet computing power does not advance overnight: it demands extreme breakthroughs in individual technologies, plus the coordinated evolution of chips, hardware, architecture, software, networking, energy, and the entire industry ecosystem. Worldwide, every supplier capable of delivering massive computing power has built on ten or more years, even decades, of accumulation.

As one of them, Huawei Cloud's path of exploration is shaped by the industry stage it occupies: it must redefine the rules of computing in technological "no-man's-land" while also seizing the AI moment, answering massive industry demand through rapid iteration, step by step growing into today's "black soil of computing power". AI computing cloud services upgraded: according to Huawei Cloud's newly released AI server roadmap, the CloudMatrix cloud SuperNode specification will be upgraded from 384 cards to ...
Huawei Announces Major AI Inference Breakthrough, Potentially Ending HBM Dependence
是说芯语· 2025-08-10 02:30
Core Viewpoint
- Huawei is set to unveil a breakthrough AI inference technology at the "2025 Financial AI Inference Application Landing and Development Forum" on August 12, aiming to significantly reduce China's reliance on HBM (High Bandwidth Memory), enhance the performance of domestic AI large models, and close a critical gap in China's AI inference ecosystem [1][3].

Group 1: Current Market Context
- Global demand for AI inference is growing explosively, while its core supporting technology, HBM, is monopolized by foreign giants; dependency in high-end AI servers exceeds 90% [3].
- The domestic replacement rate for HBM is below 5%, driving up the cost of large-model training and inference and hindering AI deployment in key sectors such as finance, healthcare, and industry [3].

Group 2: Technological Innovations
- Huawei's new technology targets these pain points by optimizing an advanced storage-compute architecture and integrating DRAM with new storage technologies, aiming to preserve high inference efficiency while sharply reducing HBM usage [3].
- The approach may involve deep "hardware reconstruction + software intelligence" collaboration, potentially yielding "super AI servers" that jointly optimize compute, data transport, and storage; a toy sketch of this tiering idea follows after Group 5 [4].

Group 3: Performance Metrics
- Huawei's CloudMatrix384 Ascend AI cloud service has validated a similar technical path, achieving single-card decode throughput above 1920 tokens/s, a 10-fold increase in KV Cache transfer bandwidth, and 50 ms latency per output token [4].
- The EMS elastic memory storage service has cut the number of NPUs deployed by 50% and reduced first-token inference latency by 80% [4].

Group 4: Industry Applications
- The financial sector, which already has a mature AI framework in place, will be the first to adopt the technology; Huawei supports over 75% of major banks in their core-system transformations [5].
- The technology will strengthen native AI applications in finance, enabling millisecond-level decision-making in high-frequency trading and real-time interactions with millions of users in intelligent customer service [5].

Group 5: Future Implications
- HBM's ultra-high bandwidth (mainstream HBM3 bandwidth exceeds 819 GB/s) is difficult to fully replace in the short term, but Huawei's approach offers the industry a new option [5].
- Experts suggest that if the technology can balance performance and cost, it may break the industry's reliance on HBM and shift global AI chip development from "hardware stacking" to "architectural innovation" [5].
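The article does not disclose how Huawei's EMS or storage-compute integration is implemented. As a minimal sketch of the general technique it describes — keeping hot KV-cache blocks in scarce fast memory and spilling cold ones to a larger, slower external tier instead of recomputing them — the following Python toy model may help. All names here (`TieredKVCache`, `hbm_capacity_blocks`, the two-tier layout) are hypothetical illustrations, not Huawei's API.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV-block cache: a small fast tier standing in for HBM,
    backed by a large slow tier standing in for DRAM/external memory (the
    role the article ascribes to EMS). Names are illustrative only."""

    def __init__(self, hbm_capacity_blocks: int):
        self.hbm = OrderedDict()   # fast tier, LRU-ordered, limited capacity
        self.external = {}         # capacious slow tier
        self.capacity = hbm_capacity_blocks

    def put(self, block_id: str, block: bytes) -> None:
        """Insert a KV block; evict the least recently used block to the
        slow tier instead of discarding it, so it never needs recomputing."""
        self.hbm[block_id] = block
        self.hbm.move_to_end(block_id)
        if len(self.hbm) > self.capacity:
            victim, data = self.hbm.popitem(last=False)  # LRU victim
            self.external[victim] = data                 # offload, don't drop

    def get(self, block_id: str) -> bytes | None:
        """Fetch a KV block, promoting slow-tier hits back to the fast tier.
        A full miss forces prefill recomputation -- exactly the first-token
        latency cost that offloading is meant to avoid."""
        if block_id in self.hbm:
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        if block_id in self.external:
            self.put(block_id, self.external.pop(block_id))  # promote
            return self.hbm[block_id]
        return None

# Usage: with room for only 2 blocks in the fast tier, older sessions'
# KV blocks spill to the slow tier but stay retrievable without recompute.
cache = TieredKVCache(hbm_capacity_blocks=2)
for session in ("s1", "s2", "s3"):
    cache.put(session, b"kv-" + session.encode())
assert "s1" in cache.external          # evicted to the slow tier
assert cache.get("s1") == b"kv-s1"     # slow-tier hit, promoted back
```

The design point this illustrates: a slow-tier hit costs one transfer, whereas a miss costs a full prefill recomputation, which is consistent with the reported 80% first-token latency reduction when cached KV data is reused rather than regenerated.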
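To see why such tiering helps despite HBM's bandwidth advantage, a rough capacity calculation is instructive. The model shape below is an assumption for illustration, not any specific Huawei or NVIDIA configuration:

```python
# Back-of-envelope KV-cache sizing (illustrative model shape), showing why
# per-device HBM *capacity*, not just bandwidth, becomes the bottleneck
# that an external memory tier relieves.
layers, kv_heads, head_dim = 64, 8, 128   # assumed model shape
bytes_per_elem = 2                        # fp16
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
print(f"KV cache per token: {per_token / 1024:.0f} KiB")                # 256 KiB
print(f"One 128k-token context: {per_token * 128_000 / 2**30:.2f} GiB") # 31.25 GiB
# A handful of concurrent long-context sessions already exceeds a typical
# per-card HBM budget, while a DRAM/shared-memory tier can hold far more.
```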