Guotai Haitong | Electronics: Ascend Inference Acceleration Suite Officially Open-Sourced; Ascend Chip Penetration Accelerates
Guotai Haitong Securities Research · 2025-12-30 14:28

Core Insights
- Huawei's Ascend multi-modal inference acceleration suite, MindIE SD, has officially been open-sourced, which is expected to improve inference efficiency and raise the penetration rate of Ascend chips [1][2]
- Huawei Data Storage and Zhongke Hongyun launched a joint AI inference acceleration solution, which is anticipated to further boost the market presence of Ascend chips [3]

Group 1: Ascend Multi-modal Inference Acceleration Suite
- The MindIE SD project includes four key acceleration features:
  1. Acceleration plugins that reduce computation and memory-access overhead
  2. Cache algorithms that significantly improve model runtime performance
  3. Multi-card parallelism that lets developers enable parallel processing with simple interface replacements
  4. Quantization and sparse-attention algorithms tailored to Ascend hardware, enhancing inference efficiency while minimizing resource consumption [2]

Group 2: AI Inference Acceleration Joint Solution
- The joint solution supports heterogeneous management, enabling collaborative interaction among the platform, compute, and storage layers, and is compatible with mainstream frameworks such as MindSpore, vLLM, and SGLang
- It features fine-grained scheduling of computing resources for maximum utilization and leverages Huawei's UCM technology to persist the KV Cache in Huawei OceanStor A-series storage, reducing redundant computation
- In practical tests, the solution achieved a 57.5% reduction in first-token latency in typical inference scenarios; in long-document inference with sequence lengths reaching 39K, concurrency improved by 86% and throughput by 36% [3]

Group 3: Chip Validation and Orders
- The Ascend 950PR chip has passed validation, and cloud vendors are increasing their orders for Ascend chips, indicating growing market demand [4]
