Domestic AI Inference Ecosystem
Huawei Unveils AI Inference "Black Technology" to Tackle AI Inference Efficiency and User Experience Challenges
China Fund News (Zhong Guo Ji Jin Bao) · 2025-08-12 07:50
On the afternoon of August 12, Huawei officially released UCM (Inference Memory Data Manager), an AI inference "black technology" aimed at solving the twin problems of AI inference efficiency and user experience.

AI inference is the focus of the AI industry's next stage of development. The industry has shifted from "pushing the limits of model capability" to "optimizing the inference experience"; inference experience directly bears on user satisfaction and commercial viability, making it the gold standard for measuring an AI model's value.

KV Cache is a key technique for optimizing compute efficiency and avoiding repeated computation, but it occupies GPU memory to store historical KV (key-value) vectors; the longer the generated text, the larger the cached data volume.

[Photo: taken by a China Fund News reporter]

As the AI industry enters the era of agentic AI, growing model scale, surging demand for long sequences, and rising inference concurrency are driving KV Cache capacity beyond what GPU memory can hold.

At present, leading foreign chip vendors have built an "iron triangle" for the AI inference era — spanning hardware iteration, software optimization, and ecosystem lock-in — that will be hard to displace in the short term. Chinese companies have made breakthroughs in individual hardware technologies, but domestic software and ecosystem adaptation still lag considerably.

As localization in the IT application innovation ("Xinchuang") industry accelerates, industries are increasingly recognizing the need to build a domestic inference ecosystem. UCM's core value lies in delivering faster inference responses and longer inference sequences. To provide more ...
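The article notes that the KV Cache grows with generated text length. A back-of-envelope sizing sketch makes the pressure on GPU memory concrete; the model configuration below is purely illustrative (not from the article), and the formula assumes a standard transformer storing one K and one V vector per layer and head:

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV Cache size for one sequence.
    Factor of 2 accounts for storing both K and V; bytes_per_elem=2 assumes fp16.
    """
    return 2 * num_layers * num_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 70B-class configuration: 80 layers, 64 heads, head_dim 128.
# At a 32K-token context, the cache for a single sequence is:
size_gib = kv_cache_bytes(80, 64, 128, 32_768) / 2**30
print(f"{size_gib:.1f} GiB")  # → 80.0 GiB
```

Because the size is linear in `seq_len`, long-sequence and high-concurrency workloads quickly exceed HBM capacity, which is exactly the problem UCM's offloading targets.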
AI Blockbuster! Huawei's "Black Technology" Is Here
China Fund News · 2025-08-12 07:37
Core Viewpoint
- Huawei has officially launched the AI inference technology "UCM" (Inference Memory Data Manager) to address challenges in AI inference efficiency and user experience [2][4].

Group 1: AI Inference Development
- The AI industry is shifting focus from maximizing model capabilities to optimizing inference experiences, which directly impact user satisfaction and commercial viability [4].
- Huawei plans to open-source UCM in September, initially releasing it in the Magic Engine community and gradually contributing it to mainstream inference engine communities [5].

Group 2: UCM Technology and Benefits
- UCM is a KV Cache-centered inference acceleration suite that integrates various caching acceleration algorithms to manage KV Cache memory data during inference, enhancing throughput and reducing latency [7].
- UCM enables longer inference sequences by offloading cache to external storage, achieving a tenfold increase in the inference context window [8].

Group 3: Cost Efficiency and Performance
- UCM can dynamically manage memory across HBM, DRAM, and SSD based on memory usage, improving TPS (tokens per second) by 2 to 22 times and thus lowering the cost per token [11].
- Current mainstream AI models in China output fewer than 60 tokens per second with a latency of 50 to 100 ms, while leading models abroad reach 200 tokens per second with a latency of 5 ms [11].

Group 4: Practical Applications
- Huawei's AI inference acceleration solution, combining UCM with OceanStor A series technology, is being piloted in collaboration with China UnionPay across three business scenarios: Voice of Customer, Marketing Planning, and Office Assistant [12].
- In the Office Assistant scenario, the solution supports user input of over 170,000 tokens for long-sequence inference, addressing the limitations of long-sequence models [15].
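Group 3 describes UCM dynamically placing KV Cache data across HBM, DRAM, and SSD. UCM's actual placement policies are not public here, but the general tiering idea can be sketched with a toy recency-based cache: hot entries stay in the fastest tier, and overflow is demoted downward. All names (`TieredKVCache`, the tier sizes) are hypothetical:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy three-tier cache illustrating HBM -> DRAM -> SSD demotion.
    A sketch of the tiering concept only, not Huawei's UCM implementation."""

    def __init__(self, hbm_slots, dram_slots):
        self.hbm = OrderedDict()   # fastest, smallest tier
        self.dram = OrderedDict()  # middle tier
        self.ssd = {}              # slowest tier, effectively unbounded
        self.hbm_slots = hbm_slots
        self.dram_slots = dram_slots

    def put(self, key, value):
        self.hbm[key] = value
        self.hbm.move_to_end(key)  # mark as most recently used
        self._demote()

    def _demote(self):
        # Least-recently-used entries spill down one tier at a time.
        while len(self.hbm) > self.hbm_slots:
            k, v = self.hbm.popitem(last=False)
            self.dram[k] = v
        while len(self.dram) > self.dram_slots:
            k, v = self.dram.popitem(last=False)
            self.ssd[k] = v

    def get(self, key):
        for tier in (self.hbm, self.dram, self.ssd):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)  # promote back to the hot tier on access
                return value
        return None

cache = TieredKVCache(hbm_slots=2, dram_slots=2)
for i in range(6):
    cache.put(f"block{i}", f"kv{i}")
print(sorted(cache.ssd))  # → ['block0', 'block1'] — oldest blocks spilled to SSD
```

A real system would demote at block granularity with asynchronous I/O and prefetching, but the recency-driven spill/promote cycle above is the core mechanism that lets a bounded HBM budget serve a much larger effective cache.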