Core Insights
- Huawei launched the UCM inference memory data manager at the Financial AI Inference Application Forum, aiming to improve both the AI inference experience and its cost-effectiveness while accelerating a positive commercial cycle for AI [1][2]
- The technology is being piloted in collaboration with China UnionPay in typical financial scenarios, demonstrating its application in smart finance [1]

Technology Overview
- The UCM inference memory data manager consists of three main components: a connector for different inference engines and computing hardware, a library of multi-level KV Cache management and acceleration algorithms, and a high-performance KV Cache access adapter [1]
- By retrieving previously computed KV Cache data directly instead of recomputing it, the technology cuts first-token latency by up to 90% [2] (a minimal prefix-reuse sketch appears at the end of this note)
- UCM expands the inference context window roughly tenfold, meeting long-text processing needs by offloading ultra-long-sequence caches to external professional storage [2]

Performance Improvements
- UCM's intelligent hierarchical caching lets KV Cache data flow on demand across HBM, DRAM, and SSD storage media according to how "hot" the cached data is [2] (an illustrative tiering sketch also appears at the end of this note)
- Combined with a range of sparse attention algorithms that improve coordination between computation and storage, UCM raises TPS (tokens per second) by 2 to 22 times in long-sequence scenarios, significantly lowering the inference cost per token [2]
- In the pilot with China UnionPay, UCM improved large-model inference speed by 125 times, enabling high-frequency customer inquiries to be identified precisely in about 10 seconds [2]

Future Developments
- UCM is scheduled to be open-sourced in September 2025, with a unified interface that adapts to various inference engine frameworks, computing hardware, and storage systems [2]
- Huawei aims to contribute UCM to mainstream inference engine communities, fostering the development of the AI inference ecosystem across the industry [2]
Huawei Releases AI Inference Innovation Technology: the UCM Inference Memory Data Manager
Zhong Guo Chan Ye Jing Ji Xin Xi Wang·2025-08-28 00:35
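
As a rough illustration of how reusing previously computed KV Cache data can avoid redundant prefill work and reduce first-token latency, here is a minimal Python sketch. It is not Huawei's UCM implementation; the cache layout, the `prefill` stand-in, and all names are hypothetical assumptions made for illustration only.

```python
import hashlib
import time

# Hypothetical prefix cache; UCM's real connector/adapter APIs are not public.
PREFIX_CACHE = {}   # prefix hash -> precomputed "KV cache" for that prefix

def prefill(tokens):
    """Stand-in for the expensive prefill pass that builds the KV cache."""
    time.sleep(0.001 * len(tokens))           # pretend cost grows with prompt length
    return [f"kv({tok})" for tok in tokens]

def serve(prefix_tokens, suffix_tokens):
    """Serve a request whose prompt shares a long prefix with earlier requests."""
    key = hashlib.sha256(" ".join(prefix_tokens).encode()).hexdigest()
    kv = PREFIX_CACHE.get(key)
    if kv is None:                            # cold start: compute and persist the prefix KV
        kv = prefill(prefix_tokens)
        PREFIX_CACHE[key] = kv
    # Only the new suffix needs prefill work; the shared prefix KV is reused directly.
    return kv + prefill(suffix_tokens)

if __name__ == "__main__":
    shared_prefix = [f"t{i}" for i in range(200)]
    start = time.perf_counter()
    serve(shared_prefix, ["q1"])              # first request pays the full prefill cost
    cold = time.perf_counter() - start
    start = time.perf_counter()
    serve(shared_prefix, ["q2"])              # repeated prefix: KV is fetched, not recomputed
    warm = time.perf_counter() - start
    print(f"cold first-token prefill: {cold:.3f}s, warm: {warm:.3f}s")
```

In this toy model the second request skips the prefix prefill entirely, which is the basic mechanism behind the reported first-token latency reduction.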
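
The on-demand flow of cached data across HBM, DRAM, and SSD can likewise be sketched as a simple multi-tier LRU cache in which frequently accessed ("hot") blocks are promoted toward fast memory and cold blocks sink toward SSD. This is only an assumed toy model; the tier capacities, class names, and eviction policy are illustrative and do not describe UCM's actual design.

```python
from collections import OrderedDict

# Hypothetical tier names and capacities, fastest to slowest (illustrative only).
TIERS = ["HBM", "DRAM", "SSD"]
CAPACITY = {"HBM": 4, "DRAM": 16, "SSD": 64}   # max KV blocks per tier

class TieredKVCache:
    """Toy multi-tier KV cache: hot blocks live in fast memory, cold blocks sink to SSD."""

    def __init__(self):
        # Each tier is an LRU-ordered map: block_id -> kv_data
        self.tiers = {t: OrderedDict() for t in TIERS}

    def put(self, block_id, kv_data):
        """Insert a new KV block into the fastest tier, demoting cold blocks if needed."""
        self._insert("HBM", block_id, kv_data)

    def get(self, block_id):
        """Fetch a KV block; a hit in a slow tier promotes the block back toward HBM."""
        for tier in TIERS:
            if block_id in self.tiers[tier]:
                kv_data = self.tiers[tier].pop(block_id)
                self._insert("HBM", block_id, kv_data)   # promotion on access ("heat")
                return kv_data
        return None   # miss: the caller must recompute the block

    def _insert(self, tier, block_id, kv_data):
        store = self.tiers[tier]
        store[block_id] = kv_data
        store.move_to_end(block_id)                      # mark as most recently used
        if len(store) > CAPACITY[tier]:
            cold_id, cold_kv = store.popitem(last=False) # evict the coldest block
            nxt = TIERS.index(tier) + 1
            if nxt < len(TIERS):
                self._insert(TIERS[nxt], cold_id, cold_kv)  # demote to the next tier
            # else: dropped entirely once SSD is also full

if __name__ == "__main__":
    cache = TieredKVCache()
    for i in range(30):                      # fill well past HBM capacity
        cache.put(f"block-{i}", f"kv-{i}")
    cache.get("block-0")                     # an old block is pulled back up on access
    for t in TIERS:
        print(t, list(cache.tiers[t].keys()))
```

The demotion chain (HBM to DRAM to SSD) mirrors the summary's description of cached data flowing among storage media based on heat, with external storage absorbing ultra-long-sequence caches that no longer fit in fast memory.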