AI Inference Acceleration
Guotai Haitong | Electronics: Ascend Inference Acceleration Suite Officially Open-Sourced, Accelerating Ascend Chip Penetration
Guotai Haitong Securities Research · 2025-12-30 14:28
Core Insights
- Huawei's Ascend multi-modal inference acceleration suite, MindIE SD, has officially been open-sourced, which is expected to enhance inference efficiency and increase the penetration rate of Ascend chips [1][2]
- Huawei Data Storage and Zhongke Hongyun have launched a joint AI inference acceleration solution, which is anticipated to further boost the market presence of Ascend chips [3]

Group 1: Ascend Multi-modal Inference Acceleration Suite
- The MindIE SD project includes four key acceleration features:
  1. Acceleration plugins that reduce computation and memory-access overhead
  2. Cache algorithms that significantly improve model runtime performance
  3. Multi-card parallelism that allows developers to enable parallel processing with simple interface replacements
  4. Quantization and sparse-attention algorithms tailored to Ascend hardware, enhancing inference efficiency while minimizing resource consumption [2]

Group 2: AI Inference Acceleration Joint Solution
- The joint solution supports heterogeneous management, enabling collaborative interaction among platform, compute, and storage layers, and is compatible with mainstream frameworks such as MindSpore, vLLM, and SGLang
- It schedules computing resources at fine granularity for maximum utilization and leverages Huawei's UCM technology to persist the KV Cache in Huawei OceanStor A-series storage, reducing redundant computation
- In practical tests, the solution achieved a 57.5% reduction in first-token latency in typical inference scenarios; in long-document inference with sequence lengths reaching 39K, concurrency rose by 86% and throughput by 36% [3]

Group 3: Chip Validation and Orders
- The Ascend 950PR chip has passed validation, and cloud vendors are increasing their orders for Ascend chips, indicating growing market demand [4]
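The KV-Cache persistence idea behind the first-token-latency improvement can be sketched as follows. This is an illustrative toy, not Huawei's actual UCM or OceanStor interface; every name here (`KVCacheStore`, `prefill`, `first_token`) is hypothetical. The point it demonstrates: if the KV state for a prompt prefix was already computed and persisted, a repeated request can skip the expensive prefill pass, which is where the reduction in first-token latency comes from.

```python
# Toy sketch (hypothetical API): persisting KV state keyed by prompt hash
# so that repeated prompts skip the prefill computation entirely.
import hashlib


class KVCacheStore:
    """In-memory stand-in for a persistent KV-cache tier (e.g. external storage)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(prompt: str) -> str:
        # Content-addressed key: identical prompts map to the same entry.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        return self._store.get(self.key(prompt))

    def put(self, prompt: str, kv_state) -> None:
        self._store[self.key(prompt)] = kv_state


def prefill(prompt: str) -> dict:
    # Stand-in for the expensive attention pass over the full prompt.
    return {"tokens_processed": len(prompt.split())}


def first_token(prompt: str, cache: KVCacheStore):
    kv = cache.get(prompt)
    if kv is None:                  # cache miss: pay the full prefill cost
        kv = prefill(prompt)
        cache.put(prompt, kv)
        return kv, "miss"
    return kv, "hit"                # cache hit: prefill skipped


cache = KVCacheStore()
_, status1 = first_token("summarize this long document ...", cache)
_, status2 = first_token("summarize this long document ...", cache)
```

A production system would key on token-block prefixes rather than whole prompts and would place the store on shared storage so that any node in the cluster can reuse another node's computed KV state; the sketch keeps only the cache-hit/cache-miss mechanism.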
As HBM Prices Soar, Huawei Open-Sources a Key AI Inference Acceleration Technology
Guan Cha Zhe Wang · 2025-11-06 03:10
Core Viewpoint
- The price of High Bandwidth Memory (HBM) is expected to rise significantly: SK Hynix has confirmed that HBM4 will be priced at approximately $560, over 50% higher than the current HBM3E price of around $370. The increase raises concerns about dependency on high-end HBM, especially amid export controls affecting China. Huawei's newly open-sourced Unified Cache Manager (UCM) technology may help mitigate this dependency by optimizing data management across different storage media [1][5].

Group 1: HBM Market Dynamics
- SK Hynix leads the global HBM market with a 62% shipment share, followed by Micron Technology at 21% and Samsung Electronics at 17% [4]
- HBM4, the sixth generation of HBM, features a 2048-bit interface and up to 16 layers of stacking, targeting bandwidth exceeding 2 TB/s and capacity of 64GB [4]
- Integrating HBM directly with processors, potentially using photonic technology, is being explored to enhance speed and efficiency [4]

Group 2: Huawei's Technological Innovations
- Huawei's UCM technology tiers cached memory data by usage frequency, which can optimize the efficiency of HBM and reduce costs [1][5]
- The UCM architecture includes several key modules that enhance capabilities such as sparse attention and context-window expansion, achieving up to a 90% reduction in first-token latency and a 22-fold increase in system throughput [1]
- Huawei has also developed its own HBM variants, HiBL 1.0 and HiZQ 2.0, designed to lower costs compared with high-performance HBM3e/4e, particularly for inference and recommendation scenarios [6]
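The frequency-based tiering described above can be sketched as below. This is a minimal illustration under the assumption that UCM-style tiering amounts to keeping the most frequently accessed entries in a small fast tier (standing in for scarce HBM) and demoting colder entries to a larger, slower tier (standing in for DRAM or SSD-backed storage); the `TieredCache` class and its methods are hypothetical, not any real Huawei API.

```python
# Illustrative sketch (not Huawei's actual UCM): tier cache entries by access
# frequency, promoting hot keys into a capacity-limited fast tier and demoting
# the least-frequently-used entry when the fast tier is full.
from collections import Counter


class TieredCache:
    def __init__(self, fast_capacity: int):
        self.fast = {}                  # small, fast tier (stands in for HBM)
        self.slow = {}                  # large, slow tier (DRAM/SSD stand-in)
        self.fast_capacity = fast_capacity
        self.hits = Counter()           # per-key access frequency

    def put(self, key, value) -> None:
        self.slow[key] = value          # new entries start in the slow tier

    def get(self, key):
        self.hits[key] += 1
        if key in self.fast:
            return self.fast[key]
        value = self.slow[key]
        self._maybe_promote(key, value)
        return value

    def _maybe_promote(self, key, value) -> None:
        if len(self.fast) < self.fast_capacity:
            self.fast[key] = value
            del self.slow[key]
            return
        # Evict the least-frequently-used fast entry if this key is hotter.
        coldest = min(self.fast, key=lambda k: self.hits[k])
        if self.hits[key] > self.hits[coldest]:
            self.slow[coldest] = self.fast.pop(coldest)
            self.fast[key] = value
            del self.slow[key]


cache = TieredCache(fast_capacity=1)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")                          # "a" fills the empty fast tier
for _ in range(3):
    cache.get("b")                      # "b" becomes hotter, displacing "a"
```

The design choice worth noting is that eviction compares frequencies rather than recency: a burst of accesses to a cold key will not immediately displace a consistently hot one, which matches the stated goal of reserving the expensive fast medium for the data that earns it.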