Smart Financial AI Inference Acceleration Solution
Per-Token Cost Drops Significantly: Huawei Releases UCM Technology to Tackle AI Inference Challenges
Huan Qiu Wang· 2025-08-18 07:40
Core Insights
- The forum highlighted the launch of Huawei's UCM inference memory data manager, aimed at enhancing AI inference experiences and cost-effectiveness in the financial sector [1][5]
- AI inference is entering a critical growth phase, with inference experience and cost becoming key metrics for model value [3][4]
- Huawei's UCM technology has been validated through a pilot project with China UnionPay, demonstrating a 125-fold increase in inference speed [5][6]

Group 1: AI Inference Development
- AI inference is becoming a crucial area for explosive growth, with a focus on balancing efficiency and cost [3][4]
- The transition from "model intelligence" to "data intelligence" is gaining consensus in the industry, emphasizing the importance of high-quality data [3][4]
- The UCM data manager consists of three components designed to optimize inference experience and reduce costs [4]

Group 2: UCM Technology Features
- UCM technology reduces first-token latency by up to 90% and expands the context window for long-text processing tenfold [4]
- The intelligent caching capability of UCM allows for on-demand data flow across various storage media, significantly improving token processing speed (see the sketch after these lists) [4]
- UCM's implementation in financial applications addresses challenges such as long-sequence inputs and high computational costs [5]

Group 3: Industry Collaboration and Open Source
- Huawei announced an open-source plan for UCM, aiming to foster collaboration across the industry and strengthen the AI inference ecosystem [6][7]
- The open-source initiative is expected to drive standardization and encourage more partners to join in improving inference experience and cost [7]
- The launch of UCM technology is seen as a significant breakthrough for AI inference and a boost for smart-finance development [7]
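As a generic illustration of the tiered-caching idea mentioned in the Group 2 bullets, the sketch below shows KV-cache blocks spilling from a fast tier to slower, larger tiers and being promoted back on reuse. This is not Huawei's UCM implementation; the tier names (HBM/DRAM/SSD), block capacities, and LRU eviction policy are assumptions chosen only for demonstration.

```python
# Minimal sketch of tiered KV-cache placement across a three-tier hierarchy
# (HBM -> DRAM -> SSD). NOT Huawei's UCM; capacities, tier names, and the
# LRU eviction policy are illustrative assumptions.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_blocks=4, dram_blocks=16):
        self.capacity = {"HBM": hbm_blocks, "DRAM": dram_blocks}  # SSD: unbounded
        # Each tier is an LRU map: block_id -> cached (keys, values) data
        self.tiers = {"HBM": OrderedDict(), "DRAM": OrderedDict(), "SSD": OrderedDict()}

    def put(self, block_id, kv_block):
        """Insert a KV block into the fastest tier, spilling older blocks down."""
        self._insert("HBM", block_id, kv_block)

    def _insert(self, tier, block_id, kv_block):
        cache = self.tiers[tier]
        cache[block_id] = kv_block
        cache.move_to_end(block_id)
        # Spill the least-recently-used block to the next, larger-but-slower tier.
        limit = self.capacity.get(tier)
        if limit is not None and len(cache) > limit:
            evicted_id, evicted_kv = cache.popitem(last=False)
            next_tier = {"HBM": "DRAM", "DRAM": "SSD"}[tier]
            self._insert(next_tier, evicted_id, evicted_kv)

    def get(self, block_id):
        """Fetch a KV block, promoting it back to HBM on a hit in a slower tier."""
        for tier in ("HBM", "DRAM", "SSD"):
            if block_id in self.tiers[tier]:
                kv_block = self.tiers[tier].pop(block_id)
                self._insert("HBM", block_id, kv_block)
                return kv_block
        return None  # cache miss: the block would have to be recomputed (prefill)

# Usage: blocks of older context migrate to slower media but remain reusable,
# so long conversations do not have to be recomputed from scratch.
cache = TieredKVCache()
for i in range(30):
    cache.put(i, {"keys": f"K{i}", "values": f"V{i}"})
assert cache.get(0) is not None   # old block still retrievable from a slower tier
```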
2025 Financial AI Inference Application Landing and Development Forum Successfully Held at the Financial Data Port
Sou Hu Cai Jing· 2025-08-15 17:35
Group 1
- The 2025 Financial AI Inference Application Landing and Development Forum was held at the Financial Data Port AI Innovation Center on August 12, with key figures from China UnionPay and Huawei in attendance [1]
- Huawei's Vice President and President of the Data Storage Product Line, Dr. Zhou Yuefeng, introduced the AI inference innovation technology called UCM Inference Memory Data Manager at the forum [3]
- China UnionPay plans to leverage the National Artificial Intelligence Application Pilot Base to collaborate with Huawei and other ecosystem partners to build "AI + Finance" demonstration applications, transitioning technology results from "laboratory validation" to "large-scale application" [5]

Group 2
- Huawei and China UnionPay jointly released the application results of the Smart Financial AI Inference Acceleration Program during the forum [3][5]
Huawei's New AI Inference Technology Impresses: China UnionPay's Large-Model Efficiency Improved 125-Fold
On August 12, Huawei released its AI inference innovation technology UCM (Unified Cache Manager, an inference memory data manager).

Why launch UCM? Because the inference process still has plenty of pain points.

In short, UCM is a "cache management technology" built specifically for the large-model inference process, aimed at optimizing inference speed, efficiency, and cost.

More concretely, UCM is a KV Cache-centric inference acceleration suite that integrates multiple types of cache acceleration algorithms and tools. It manages the KV Cache memory data produced during inference in tiers and expands the inference context window, delivering a high-throughput, low-latency inference experience while lowering the per-token inference cost.

At the event, Huawei Vice President and President of the Data Storage Product Line Zhou Yuefeng said that the UCM inference memory data manager is intended to upgrade the AI inference experience, improve the price-performance of inference, and accelerate a positive commercial cycle for AI. Huawei has also partnered with China UnionPay to pilot UCM in typical financial scenarios, and the two jointly released the application results of the Smart Financial AI Inference Acceleration Solution.

What is UCM?

The introduction above is dense with jargon, so let's break it down.

First, what is a KV Cache?

A KV Cache is a technique for speeding up inference in Transformer-style models. Its core idea is to cache the Key and Value (matrices) of historical tokens so they can be reused directly in the next generation step instead of being recomputed, thereby ...
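To make the KV-cache idea above concrete, here is a minimal, self-contained sketch of single-head autoregressive decoding in which the Key/Value projections of past tokens are cached and reused, so each step only projects the newest token. The dimensions, random weights, and five-step loop are illustrative assumptions, not any particular model's configuration.

```python
# Minimal numpy sketch of a KV cache: during autoregressive decoding, the
# Key/Value projections of past tokens are stored and reused, so each new
# step only projects the newest token instead of recomputing the whole prefix.
import numpy as np

d = 8                                  # hidden size (single attention head, assumed)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for one query over the cached keys/values."""
    scores = q @ K.T / np.sqrt(d)      # shape (1, t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                 # shape (1, d)

# Decode 5 steps; K_cache / V_cache grow by one row per generated token.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for step in range(5):
    x = rng.standard_normal((1, d))    # embedding of the newest token (dummy input)
    q = x @ W_q
    # Only the newest token is projected; earlier K/V rows are reused as-is.
    K_cache = np.vstack([K_cache, x @ W_k])
    V_cache = np.vstack([V_cache, x @ W_v])
    out = attend(q, K_cache, V_cache)
    print(f"step {step}: attended over {K_cache.shape[0]} cached tokens")
```

Without the cache, every decoding step would recompute Keys and Values for the entire prefix, which is exactly the redundant work the caching (and, at larger scale, tiered cache management such as UCM describes) is meant to avoid.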