Core Insights
- Huawei has launched the UCM inference memory data manager to enhance AI inference experiences, improve cost-effectiveness, and accelerate the commercial cycle of AI [2]
- The UCM technology has been piloted in financial scenarios in collaboration with China UnionPay, showcasing its application in smart finance [2]

Industry Trends
- The focus in the large model industry is shifting from training to inference, with current inference computing power demand exceeding training by 58.5% [2]
- The release of new models often leads to instability in service providers due to high user demand, necessitating optimizations to reduce inference costs without compromising user experience [3]

Performance Comparison
- Foreign mainstream large models achieve output speeds in the range of 200 tokens/s with a latency of 5ms, while Chinese models generally fall below 60 tokens/s with latencies of 50-100ms, indicating a maximum disparity of 10 times [4]
- Chinese models also support fewer tokens in context windows compared to their foreign counterparts, with a significant probability of missing key information during long text analysis [4]

Technical Innovations
- The UCM system consists of three main components: a connector for popular inference frameworks, an accelerator for multi-level KV cache management, and an adapter for high-performance KV cache access [6]
- By caching previously processed results and data in a high-performance external shared storage, UCM can reduce the first token delay by 90% and significantly speed up inference processes [8][9]

Financial Sector Applications
- The financial industry is rapidly adopting large models, with a focus on reducing high costs and latency associated with AI inference, which is critical for risk control and transaction security [10]
- A collaboration between China UnionPay and Huawei has led to a significant reduction in inference time for label classification from 600 seconds to under 10 seconds, achieving over a 50-fold improvement in efficiency [11]

Future Developments
- Huawei plans to open-source the UCM technology in September, aiming to create a unified interface that can adapt to various inference engine frameworks, computing power, and storage systems [11]
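
The idea under Technical Innovations, caching previously processed results so that later requests skip repeated work, can be illustrated with a toy prefix cache. This is only a minimal sketch under stated assumptions: it is not UCM's interface, which the summary does not describe. The names `PrefixKVStore`, `prefill`, and `BLOCK` are hypothetical, an in-memory dictionary stands in for the external shared storage, and placeholder strings stand in for real KV tensors.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK = 4  # tokens per cached block (illustrative value, not from the source)


class PrefixKVStore:
    """Toy stand-in for an external shared KV-cache store (hypothetical)."""

    def __init__(self) -> None:
        # Maps a hash of a token prefix to a placeholder for its KV data.
        self._blocks: Dict[str, str] = {}

    @staticmethod
    def _key(tokens: List[int]) -> str:
        # Key each block by the full prefix that ends at that block.
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def lookup(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV blocks."""
        hit = 0
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            if self._key(tokens[:end]) not in self._blocks:
                break
            hit = end
        return hit

    def store(self, tokens: List[int]) -> None:
        """Record a placeholder entry for every complete block of the prompt."""
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            self._blocks.setdefault(self._key(tokens[:end]), f"kv[0:{end}]")


def prefill(store: PrefixKVStore, tokens: List[int]) -> Tuple[int, int]:
    """Return (reused, computed) token counts for one request."""
    reused = store.lookup(tokens)     # tokens whose KV data can be loaded
    computed = len(tokens) - reused   # tokens that still need a forward pass
    store.store(tokens)               # make this prompt reusable later
    return reused, computed


if __name__ == "__main__":
    store = PrefixKVStore()
    shared = list(range(8))                    # e.g. a shared system prompt
    print(prefill(store, shared + [50, 51]))   # cold request
    print(prefill(store, shared + [99, 100, 101, 102]))  # shares the prefix
```

In this sketch the second request recomputes only the tokens after the shared prefix, which is the mechanism by which a cache of this kind can shorten the wait for the first token. A real system would store tensors and handle eviction and storage tiers, none of which is modeled here.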
Open-Sourcing Soon! Huawei Releases AI Inference "Black Tech", Already Deployed at China UnionPay
Tai Mei Ti APP · 2025-08-13 03:44