Smart Financial AI Inference Acceleration Program
Per-Token Cost Cut Significantly: Huawei Releases UCM Technology to Tackle AI Inference Challenges
Huan Qiu Wang· 2025-08-18 07:40
Core Insights
- The forum highlighted the launch of Huawei's UCM inference memory data manager, which aims to improve the AI inference experience and cost-effectiveness in the financial sector [1][5]
- AI inference is entering a critical growth phase, in which inference experience and cost are becoming key measures of a model's value [3][4]
- Huawei's UCM technology has been validated in a pilot project with China UnionPay, which showed a 125-fold increase in inference speed [5][6]
Group 1: AI Inference Development
- AI inference is becoming a key area of explosive growth, where efficiency must be balanced against cost [3][4]
- The industry increasingly agrees on a shift from "model intelligence" to "data intelligence", which makes high-quality data more important [3][4]
- The UCM data manager consists of three components designed to improve the inference experience and reduce costs [4]
Group 2: UCM Technology Features
- UCM reduces first-token latency by up to 90% and expands the context window for long-text processing tenfold [4]
- UCM's intelligent caching lets data flow on demand across different storage media, significantly improving token processing speed [4]
- In financial applications, UCM addresses challenges such as long input sequences and high computational costs [5]
Group 3: Industry Collaboration and Open Source
- Huawei announced a plan to open-source UCM, aiming to foster industry-wide collaboration and strengthen the AI inference ecosystem [6][7]
- The open-source initiative is expected to drive standardization and encourage more partners to help improve inference experience and cost [7]
- The launch of UCM is seen as a significant breakthrough for AI inference and a boost to the development of smart finance [7]
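The "intelligent caching" feature listed under Group 2, which moves data on demand across storage media, can be illustrated with a toy multi-tier cache. This is a minimal sketch under stated assumptions, not UCM's implementation or API: the tier names (HBM, DRAM, SSD), the capacities counted in KV blocks, and the plain LRU demotion/promotion policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy multi-tier KV cache (hypothetical; not UCM's design or API).

    Tiers are ordered fastest first and capacity is counted in KV blocks.
    When a tier overflows, its least recently used block is demoted to the
    next (slower) tier instead of being discarded.
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, level, key, value):
        # Place a block at `level`, cascading LRU victims to slower tiers.
        while level < len(self.tiers):
            _, cap, store = self.tiers[level]
            store[key] = value
            store.move_to_end(key)  # mark as most recently used
            if len(store) <= cap:
                return
            key, value = store.popitem(last=False)  # demote the LRU block
            level += 1
        # A block pushed past the slowest tier is dropped and must be recomputed.

    def put(self, key, value):
        # Keep a single copy of each block across all tiers.
        for _, _, store in self.tiers:
            store.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        # Search the fastest tier first; a hit is promoted back to the top tier.
        for name, _, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)
                return name, value
        return None, None  # miss: the KV block would have to be recomputed

    def where(self, key):
        # Report which tier currently holds the block, or None.
        for name, _, store in self.tiers:
            if key in store:
                return name
        return None


# Hypothetical tiers: 2 blocks in HBM, 2 in DRAM, 4 on SSD.
cache = TieredKVCache([("HBM", 2), ("DRAM", 2), ("SSD", 4)])
for block_id in ["b0", "b1", "b2", "b3"]:
    cache.put(block_id, f"kv-{block_id}")
print(cache.where("b0"))  # DRAM: demoted when b2 arrived
print(cache.get("b0"))    # ('DRAM', 'kv-b0'), now promoted back to HBM
```

Demoting a block to a slower tier rather than discarding it lets a later request reload its KV data instead of recomputing it; only blocks pushed off the slowest tier must be recomputed. A real system would also weigh block size, transfer bandwidth, and access frequency, all of which this sketch ignores.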
2025 Financial AI Inference Application Landing and Development Forum Held Successfully at the Financial Data Port
Sou Hu Cai Jing· 2025-08-15 17:35
Group 1
- The 2025 Financial AI Inference Application Landing and Development Forum was held at the Financial Data Port AI Innovation Center on August 12, attended by key figures from China UnionPay and Huawei [1]
- Dr. Zhou Yuefeng, Huawei Vice President and President of its Data Storage Product Line, introduced a new AI inference technology, the UCM Inference Memory Data Manager, at the forum [3]
- China UnionPay plans to use the National Artificial Intelligence Application Pilot Base to work with Huawei and other ecosystem partners on "AI + Finance" demonstration applications, moving technology results from "laboratory validation" to "large-scale application" [5]
Group 2
- Huawei and China UnionPay jointly released the application results of the Smart Financial AI Inference Acceleration Program during the forum [3][5]
Huawei's Sharp New AI Inference Technology! China UnionPay's Large-Model Efficiency Rises 125-Fold
Core Insights
- Huawei has launched the Unified Cache Manager (UCM), an AI inference technology designed to improve inference speed and efficiency and to lower cost [1]
- UCM is an inference acceleration suite centered on the KV Cache: it integrates multiple caching acceleration algorithms to manage KV Cache memory data during inference, raising throughput and lowering the per-token inference cost [1][5]
Summary by Sections
UCM Overview
- UCM targets pain points in the inference process, with a focus on user experience and commercial viability [4]
- Speed is a central goal: leading foreign models currently reach 200 tokens/s, while models in China deliver fewer than 60 tokens/s [4]
Technical Components
- UCM has three main components: inference engine plugins, a library of multi-level KV Cache management and acceleration algorithms, and high-performance KV Cache access adapters [5]
- Its hierarchical adaptive global prefix caching can reduce first-token latency by up to 90% [5]
Application in Financial Sector
- Huawei has partnered with China UnionPay to pilot UCM in financial scenarios, achieving a 125-fold increase in inference speed that allows customer inquiries to be identified rapidly [5]
- The financial industry is seen as a natural early adopter because it is highly digitalized and has stringent requirements for speed, efficiency, and reliability [5]
Future Developments
- China UnionPay plans to work with Huawei and other partners to build "AI + Finance" demonstration applications, moving the technology from experimental validation to large-scale application [6]
Differentiation of UCM
- UCM's advantages include integrated specialized storage capabilities, full-lifecycle management of the KV Cache, and a diverse library of acceleration algorithms [8]
- Whereas existing solutions focus solely on prefix caching, UCM incorporates a wider range of algorithms and adapts to a variety of inference scenarios [8]
Open Source Initiative
- Huawei has announced plans to open-source UCM, aiming to foster collaboration among framework, storage, and computing vendors to raise efficiency and reduce costs across the AI industry [9]
- The open-source release is scheduled to launch officially in September, with contributions to the mainstream inference engine community [9]
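The global prefix caching credited with cutting first-token latency rests on a simple idea: when a new prompt shares a prefix with earlier requests (for example, a common system prompt), the KV Cache for that prefix can be reused instead of being recomputed during prefill. The sketch below is a generic block-level prefix cache, not Huawei's algorithm. The block size, the chained SHA-256 block hashes, and the `PrefixCache` class and `block_hashes` helper are hypothetical, and real KV tensors are replaced by placeholders.

```python
import hashlib
from typing import Dict, List, Sequence

BLOCK_SIZE = 4  # hypothetical tokens per KV block (kept tiny for the demo)


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each full block, chained with its parent block's hash.

    Chaining means a block's hash identifies the entire prefix ending at
    that block, so equal hashes imply identical prefixes (up to collisions).
    """
    hashes: List[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = parent + "|" + ",".join(str(t) for t in block)
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class PrefixCache:
    """Toy global prefix cache mapping block hash -> placeholder KV data."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks: Dict[str, str] = {}

    def prefill(self, tokens: Sequence[int]) -> int:
        """Return how many tokens this prompt actually needs to compute."""
        hashes = block_hashes(tokens, self.block_size)
        reused = 0
        for h in hashes:
            if h not in self.blocks:
                break  # the first miss ends the reusable prefix
            reused += 1
        for h in hashes[reused:]:
            self.blocks[h] = "kv"  # stand-in for freshly computed KV tensors
        return len(tokens) - reused * self.block_size


system_prompt = list(range(100, 112))  # 12 shared tokens = 3 full blocks
cache = PrefixCache()
print(cache.prefill(system_prompt + [1, 2, 3, 4, 5]))  # 17: cold cache
print(cache.prefill(system_prompt + [9, 8, 7, 6, 5]))  # 5: 12 prefix tokens reused
```

Only the uncached suffix has to go through prefill, so the time to the first token shrinks as the shared prefix grows. How much latency is saved in practice depends on how much prompt overlap a workload has; the sketch does not reproduce any reported figure.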