UCM Inference Memory Data Manager
2025 Financial AI Inference Application Landing and Development Forum Successfully Held at the Financial Data Port
Sou Hu Cai Jing · 2025-08-15 17:35
Group 1
- The 2025 Financial AI Inference Application Landing and Development Forum was held at the Financial Data Port AI Innovation Center on August 12, with key figures from China UnionPay and Huawei in attendance [1]
- Huawei's Vice President and President of the Data Storage Product Line, Dr. Zhou Yuefeng, introduced the AI inference innovation technology called UCM Inference Memory Data Manager at the forum [3]
- China UnionPay plans to leverage the National Artificial Intelligence Application Pilot Base to collaborate with Huawei and other ecosystem partners to build "AI + Finance" demonstration applications, moving technology results from "laboratory validation" to "large-scale application" [5]

Group 2
- Huawei and China UnionPay jointly released the application results of the Smart Financial AI Inference Acceleration Program during the forum [3][5]
Open-Sourcing Soon! Huawei Unveils AI Inference Breakthrough Technology, Already Deployed at China UnionPay
Tai Mei Ti APP · 2025-08-13 03:44
Core Insights
- Huawei has launched the UCM inference memory data manager to enhance AI inference experiences, improve cost-effectiveness, and accelerate the commercial cycle of AI [2]
- The UCM technology has been piloted in financial scenarios in collaboration with China UnionPay, showcasing its application in smart finance [2]

Industry Trends
- The focus of the large model industry is shifting from training to inference, with current inference computing power demand exceeding training demand by 58.5% [2]
- New model releases often destabilize service providers under surges in user demand, necessitating optimizations that reduce inference costs without compromising user experience [3]

Performance Comparison
- Foreign mainstream large models achieve output speeds of around 200 tokens/s with a latency of 5 ms, while Chinese models generally fall below 60 tokens/s with latencies of 50-100 ms, a disparity of up to 10 times [4]
- Chinese models also support smaller context windows than their foreign counterparts, with a significant probability of missing key information during long-text analysis [4]

Technical Innovations
- The UCM system consists of three main components: a connector for popular inference frameworks, an accelerator for multi-level KV cache management, and an adapter for high-performance KV cache access [6]
- By caching previously processed results and data in high-performance external shared storage, UCM can reduce first-token delay by up to 90% and significantly speed up inference [8][9]

Financial Sector Applications
- The financial industry is rapidly adopting large models, with a focus on reducing the high costs and latency of AI inference, both critical for risk control and transaction security [10]
- A collaboration between China UnionPay and Huawei reduced inference time for label classification from 600 seconds to under 10 seconds, a more than 50-fold improvement in efficiency [11]

Future Developments
- Huawei plans to open-source the UCM technology in September, aiming to create a unified interface that can adapt to various inference engine frameworks, computing power, and storage systems [11]
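The multi-level KV-cache reuse described under Technical Innovations can be illustrated with a minimal sketch. This is not UCM's actual API (which has not been published as of these articles); the class names, tier layout, block size, and eviction policy below are all assumptions chosen only to show the general idea: identical token prefixes map to the same cached KV blocks, hot blocks live in fast memory, cold blocks spill to shared storage, and any cache hit lets the prefill phase skip recomputation, which is what drives the reported first-token-delay reduction.

```python
import hashlib

BLOCK_SIZE = 16  # tokens per cached KV block (assumed, for illustration)

class TieredKVCache:
    """Toy two-tier KV cache: a small fast tier plus external shared storage."""

    def __init__(self, fast_capacity=2):
        self.fast = {}            # stands in for HBM/DRAM
        self.shared = {}          # stands in for external shared storage
        self.fast_capacity = fast_capacity

    @staticmethod
    def block_key(prefix_tokens):
        # A block is addressed by a hash of the entire token prefix that
        # precedes it, so identical prefixes across requests hit one entry.
        joined = " ".join(map(str, prefix_tokens))
        return hashlib.sha256(joined.encode()).hexdigest()

    def put(self, prefix_tokens, kv_block):
        key = self.block_key(prefix_tokens)
        if len(self.fast) >= self.fast_capacity:
            # Spill an arbitrary resident block to shared storage
            # (a real system would use LRU or a cost-aware policy).
            old_key, old_val = self.fast.popitem()
            self.shared[old_key] = old_val
        self.fast[key] = kv_block

    def get(self, prefix_tokens):
        key = self.block_key(prefix_tokens)
        if key in self.fast:
            return self.fast[key]
        if key in self.shared:
            # Promote on hit so repeated prefixes stay in fast memory.
            self.fast[key] = self.shared.pop(key)
            return self.fast[key]
        return None  # miss: prefill must recompute this block

def prefill_with_reuse(cache, tokens):
    """Return how many prefix blocks the cache let us skip recomputing."""
    reused = 0
    for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
        if cache.get(tokens[:end]) is None:
            # Recompute and store a placeholder for this block's KV tensors.
            cache.put(tokens[:end], f"kv[:{end}]")
        else:
            reused += 1
    return reused
```

A repeated request over the same 64-token prompt reuses all four blocks on the second pass (`prefill_with_reuse` returns 0 on a cold cache, then 4), mirroring how caching previously processed data lets subsequent queries skip the expensive prefill work.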