AI Inference
Reducing Reliance on Traditional Technology Paths: Huawei Launches New AI Inference Technology
Di Yi Cai Jing· 2025-08-12 12:43
Core Insights
- Huawei introduced a new AI inference technology called UCM (Unified Cache Manager) aimed at optimizing the efficiency of token flow across various business processes, thereby reducing the inference cost per token [1][2]
- There is a significant gap in inference efficiency between leading Chinese internet companies and their overseas counterparts, with foreign models achieving user output speeds of 200 Tokens/s compared to less than 60 Tokens/s for domestic models [1]
- The industry currently lacks a universally applicable framework and acceleration mechanism for AI inference, prompting Huawei to seek collaboration with industry players to enhance the maturity of these frameworks [3]

Group 1
- UCM focuses on KV Cache and memory management to accelerate inference processes, optimizing the flow of tokens [1]
- Huawei's testing indicates that UCM can reduce first-token latency by up to 90% and increase system throughput by a factor of 22, while also achieving a tenfold expansion of context windows [2]
- The development of a multi-level, flexible resource system is essential to address the limitations of high-bandwidth memory (HBM) in AI inference processes [2]

Group 2
- Huawei plans to open-source UCM in September to foster collaboration among framework, storage, and GPU manufacturers [3]
- The optimization of system-level inference architecture requires a comprehensive approach spanning chip-level, software-level, and framework-level considerations [3]
- The current state of domestic software solutions for AI inference, particularly those based on KV Cache, is not yet mature or widely applicable compared to established foreign solutions [2]
Huawei Unveils AI Inference Innovation UCM in Shanghai, to Be Officially Open-Sourced in September
Sou Hu Cai Jing· 2025-08-12 11:53
Core Insights
- The article discusses advancements in AI reasoning technology, particularly Huawei's UCM reasoning memory data manager, which aims to enhance AI inference efficiency and reduce costs [2][3]

Group 1: AI Technology Development
- AI reasoning is entering a critical growth phase, with the UCM reasoning memory data manager being a key innovation [2]
- UCM integrates various caching acceleration algorithms and manages KV Cache memory data to improve inference experiences [2][3]

Group 2: Performance Enhancements
- UCM technology can reduce first-token latency by up to 90% and expand the inference context window by ten times, addressing long-text processing needs [3]
- TPS (tokens per second) can increase by 2 to 22 times in long-sequence scenarios, significantly lowering the cost per token for enterprises [3]

Group 3: Industry Collaboration
- Huawei and China UnionPay have successfully validated UCM's technology, achieving a 125-fold increase in inference speed for customer service applications [4]
- Future plans include building "AI + Finance" demonstration applications in collaboration with industry partners to transition from experimental validation to large-scale application [4]

Group 4: Open Source Initiative
- Huawei announced an open-source plan for UCM, to be available in September, aiming to contribute to mainstream inference engine communities [4]
Huawei: AI Inference Innovation Technology UCM to Be Officially Open-Sourced This September
Xin Lang Ke Ji· 2025-08-12 11:21
Group 1
- The 2025 forum on the application and development of financial AI reasoning featured speeches from executives of China UnionPay and Huawei, highlighting the importance of AI in the financial sector [2]
- Huawei introduced the UCM reasoning memory data manager, aimed at enhancing AI reasoning experiences and improving cost-effectiveness, while accelerating the positive cycle of AI in business [2]
- The UCM technology was piloted in typical financial scenarios with China UnionPay, showcasing its application in smart-finance AI reasoning acceleration [2]

Group 2
- In the pilot with China UnionPay, UCM demonstrated significant value, achieving a 125-fold increase in large-model reasoning speed and allowing precise identification of customer issues in just 10 seconds [3]
- China UnionPay plans to collaborate with Huawei and other partners to build "AI + Finance" demonstration applications, transitioning the technology from laboratory validation to large-scale application [3]
- Huawei announced the UCM open-source plan, which will officially launch in September, aiming to contribute to mainstream reasoning engine communities and promote the development of the AI reasoning ecosystem [3]
Huawei Releases AI Inference Innovation Technology
半导体芯闻· 2025-08-12 09:48
Source: Sina Finance.

August 12 afternoon news: at the 2025 Financial AI Inference Application Implementation and Development Forum, Huawei and China UnionPay jointly released the AI inference innovation technology UCM (inference memory data manager), delivering a high-throughput, low-latency inference experience.

In today's digital era, AI is evolving rapidly. While the boom in large-model training has yet to subside, the AI inference experience has quietly become the key to AI applications. A white paper released during WAIC 2025 noted that AI is growing rapidly amid a structural shift from training to inference. Against this backdrop, the importance of the inference experience is increasingly pronounced.

The inference experience directly shapes how users feel when interacting with AI, including answer latency, answer accuracy, and the ability to reason over complex context. Reports indicate that mainstream overseas models now deliver per-user output speeds in the 200 Tokens/s range (latency 5m ...
Huawei Releases AI Inference "Black Tech" to Help Solve AI Inference Efficiency and User-Experience Challenges
Zhong Guo Ji Jin Bao· 2025-08-12 07:50
On the afternoon of August 12, Huawei officially released its AI inference "black tech" UCM (inference memory data manager), aimed at solving the challenges of AI inference efficiency and user experience.

AI inference is the development focus of the AI industry's next phase. The industry has shifted from "pushing the limits of model capability" to "optimizing the inference experience". The inference experience is directly tied to core concerns such as user satisfaction and commercial viability, and has become the gold standard for measuring an AI model's value.

KV Cache is a key technique for optimizing compute efficiency and avoiding repeated computation, but it occupies GPU (graphics processing unit) memory to store historical KV (key-value) vectors: the longer the generated text, the larger the cached data.

Source: photo by China Fund reporter

As the AI industry enters the era of agentic AI, model scaling, surging demand for long sequences, and growing inference concurrency have pushed KV Cache capacity beyond what GPU memory can hold.

At present, leading overseas chip vendors have built an "iron triangle" for the AI inference era, spanning hardware iteration, software optimization, and ecosystem lock-in, that will be hard to displace in the short term. Chinese companies have made breakthroughs in individual hardware technologies, but domestic software and ecosystem support still lag considerably.

As the domestic adaptation drive of the IT application innovation (Xinchuang) industry accelerates, industries are gradually recognizing the need to build a domestic inference ecosystem. UCM's core value lies in delivering faster inference responses, longer inference sequences, and more.

To provide longer ...
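The KV Cache mechanism described above can be sketched in a few lines. This is a minimal single-head NumPy illustration, not Huawei's implementation; the identity "projections" are a simplification so the growth of the cache is easy to see.

```python
import numpy as np

def attention(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = q @ K.T / np.sqrt(q.shape[-1])  # similarity of the query to each cached key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over all past positions
    return weights @ V                       # weighted sum of cached values

rng = np.random.default_rng(0)
d = 8                                        # head dimension
K_cache = np.empty((0, d))                   # grows by one row per generated token
V_cache = np.empty((0, d))

for step in range(5):                        # toy autoregressive decode loop
    x = rng.standard_normal(d)               # hidden state of the newest token
    k, v, q = x, x, x                        # identity projections (simplification)
    K_cache = np.vstack([K_cache, k])        # append new KV instead of recomputing history
    V_cache = np.vstack([V_cache, v])
    out = attention(q, K_cache, V_cache)

print(K_cache.shape)                         # one KV row per generated token
```

The point the article makes falls out directly: the cache holds one key and one value row per past token, so its memory footprint grows linearly with the generated text.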
Major AI News! Huawei's "Black Tech" Arrives
Zhong Guo Ji Jin Bao· 2025-08-12 07:40
[Introduction] Huawei releases AI inference "black tech" to help solve the challenges of AI inference efficiency and user experience

On the afternoon of August 12, Huawei officially released its AI inference "black tech" UCM (inference memory data manager), aimed at solving the challenges of AI inference efficiency and user experience.

AI inference is the development focus of the AI industry's next phase. The industry has shifted from "pushing the limits of model capability" to "optimizing the inference experience". The inference experience is directly tied to core concerns such as user satisfaction and commercial viability, and has become the gold standard for measuring an AI model's value.

Reportedly, Huawei plans to open-source UCM in September. It will debut in the 魔擎 community, then be gradually contributed to mainstream inference engine communities and shared with all Share Everything (shared-architecture) storage vendors and ecosystem partners.

UCM will improve inference system efficiency and performance

UCM is a KV Cache (key-value cache)-centric inference acceleration suite that integrates multiple types of cache acceleration algorithm tools. It manages, in tiers, the KV Cache memory data generated during inference and expands the inference context window, delivering a high-throughput, low-latency inference experience and thereby lowering the inference cost per Token.

KV Cache is a key technique for optimizing compute efficiency and avoiding repeated computation, but it occupies GPU (graphics processing unit) memory to store historical KV (key-value) vectors: the longer the generated text, the larger the cached data.

As the IT application innovation industry ( ...
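The "tiered management" idea can be pictured as a cache that spills its oldest KV blocks from a small fast tier into larger, slower tiers as it fills. The sketch below is hypothetical: the tier names and capacities are illustrative, and this eviction policy is not a description of UCM's actual design.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy tiered KV cache: when a tier fills, demote its oldest block one tier down."""
    def __init__(self, capacities):
        # e.g. {"HBM": 2, "DRAM": 4, "SSD": 1000}, fastest tier first
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities.items()]

    def put(self, block_id, kv_block):
        self._insert(0, block_id, kv_block)

    def _insert(self, level, block_id, kv_block):
        name, cap, store = self.tiers[level]
        store[block_id] = kv_block
        if len(store) > cap:                       # tier full: demote the oldest block
            old_id, old_block = store.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._insert(level + 1, old_id, old_block)

    def locate(self, block_id):
        """Return the name of the tier currently holding a block, if any."""
        for name, _, store in self.tiers:
            if block_id in store:
                return name
        return None

cache = TieredKVCache({"HBM": 2, "DRAM": 4, "SSD": 1000})
for i in range(8):                                  # write 8 KV blocks in order
    cache.put(i, f"kv-{i}")
print(cache.locate(7), cache.locate(5), cache.locate(0))  # → HBM DRAM SSD
```

The recently written blocks stay in the fastest tier while older history cascades downward, which is how a tiered scheme can hold a context far larger than device memory alone.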
Major AI News! Huawei's "Black Tech" Arrives
中国基金报· 2025-08-12 07:37
[Introduction] Huawei releases AI inference "black tech" to help solve the challenges of AI inference efficiency and user experience

China Fund reporter: Qiu Dekun

On the afternoon of August 12, Huawei officially released its AI inference "black tech" UCM (inference memory data manager), aimed at solving the challenges of AI inference efficiency and user experience.

Source: photo by China Fund reporter

AI inference is the development focus of the AI industry's next phase. The industry has shifted from "pushing the limits of model capability" to "optimizing the inference experience". The inference experience is directly tied to core concerns such as user satisfaction and commercial viability, and has become the gold standard for measuring an AI model's value.

As the AI industry enters the era of agentic AI, model scaling, surging demand for long sequences, and growing inference concurrency have pushed KV Cache capacity beyond what GPU memory can hold.

At present, leading overseas chip vendors have built an "iron triangle" for the AI inference era, spanning hardware iteration, software optimization, and ecosystem lock-in, that will be hard to displace in the short term. Chinese companies have made breakthroughs in individual hardware technologies, but domestic software and ecosystem support still lag considerably.

As the domestic adaptation drive of the IT application innovation industry accelerates, industries are gradually recognizing the need to build a domestic inference ecosystem. UCM's core value lies in delivering faster inference responses, longer inference sequences, and more.

Taking longer inference sequences as an example: through combined techniques such as dynamic layer-wise KV offloading and position-encoding extension, UCM takes the Cache of ultra-long sequences ( ...
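"Dynamic layer-wise KV offloading", mentioned above, can be pictured as keeping only the currently executing layer's KV tensors in device memory and parking the rest in host memory, fetching each layer's cache back just before that layer runs. The sketch below is a hypothetical illustration of that general pattern, not UCM's actual mechanism; all names are invented for the example.

```python
class LayerwiseOffloader:
    """Toy per-layer KV offloader: only one layer's KV is 'device-resident' at a time."""
    def __init__(self, num_layers):
        self.host = {i: [] for i in range(num_layers)}  # KV parked in host/slow memory
        self.device = {}                                # KV currently resident on device

    def prefetch(self, layer):
        self.device = {layer: self.host[layer]}         # fetch one layer, evict the rest

    def append(self, layer, kv):
        self.device[layer].append(kv)                   # the layer writes its new KV entry
        self.host[layer] = self.device[layer]           # keep the host copy in sync

def decode_step(off, num_layers, token_id):
    """One autoregressive step: each layer sees only its own KV on 'device'."""
    for layer in range(num_layers):
        off.prefetch(layer)
        off.append(layer, ("k", "v", token_id))

off = LayerwiseOffloader(num_layers=3)
for t in range(4):                                      # generate 4 tokens
    decode_step(off, 3, t)
print(len(off.host[0]), len(off.device))                # 4 KV entries per layer; 1 layer resident
```

The peak device footprint here is one layer's cache rather than all layers' caches, which is the trade (extra data movement for a much smaller resident working set) that makes ultra-long sequences feasible.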
Huawei Releases AI Inference Innovation UCM: High-Throughput, Low-Latency Inference Experience at a Lower Per-Token Cost
Xin Lang Ke Ji· 2025-08-12 07:22
Sina Tech, August 12 afternoon news: at the 2025 Financial AI Inference Application Implementation and Development Forum, Huawei and China UnionPay jointly released the AI inference innovation technology UCM (inference memory data manager), delivering a high-throughput, low-latency inference experience.

According to the announcement, UCM is a KV Cache-centric inference acceleration suite that integrates multiple types of cache acceleration algorithm tools, manages in tiers the KV Cache memory data generated during inference, and expands the inference context window, delivering a high-throughput, low-latency inference experience and lowering the per-Token inference cost.

In today's digital era, AI is evolving rapidly. While the boom in large-model training has yet to subside, the AI inference experience has quietly become the key to AI applications. A white paper released by 中信建投 during WAIC 2025 noted that AI is growing rapidly amid a structural shift from training to inference. Against this backdrop, the importance of the inference experience is increasingly pronounced.

The inference experience directly shapes how users feel when interacting with AI, including answer latency, answer accuracy, and the ability to reason over complex context. Reports indicate that mainstream overseas models now deliver per-user output speeds in the 200 Tokens/s range (5 ms latency), while domestic speeds are generally below 60 Tokens/s (50-100 ms latency); solving the challenges of inference efficiency and user experience has become urgent ...

Editor in charge: Guo Xutong
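The throughput and latency figures quoted in these articles are two views of the same quantity, and the arithmetic is worth making explicit (using only the numbers given above):

```python
def tokens_per_sec(latency_ms):
    """Per-user output speed implied by a steady per-token latency."""
    return 1000.0 / latency_ms

overseas = tokens_per_sec(5)          # 5 ms per token (overseas figure)
domestic_fast = tokens_per_sec(50)    # 50 ms per token (domestic, best case cited)
domestic_slow = tokens_per_sec(100)   # 100 ms per token (domestic, worst case cited)
print(overseas, domestic_fast, domestic_slow)  # → 200.0 20.0 10.0
```

So 5 ms per token corresponds exactly to the 200 Tokens/s range, and the 50-100 ms domestic latencies imply 10-20 Tokens/s, consistent with the "below 60 Tokens/s" claim.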
Zhang Yidong: Consolidation Is the Battery of Hong Kong Stocks' Long-Term Rally! Hang Seng Tech ETF (513260) and 汇添富 Hong Kong Stock Connect Tech ETF (520980) Keep Attracting Inflows Through Consecutive Pullbacks!
Xin Lang Cai Jing· 2025-08-12 06:57
Market Overview
- The Hong Kong stock market experienced a collective decline, with the Hang Seng Tech ETF (513260) dropping by 0.43% despite attracting over 640 million yuan in net inflows over the past 10 days [1]
- The financing balance for the Hang Seng Tech ETF has exceeded 130 million yuan, with a recent financing purchase amounting to 39.57 million yuan [1]

Sector Performance
- The technology sector in Hong Kong showed mixed results, with notable gains from Huahong Semiconductor (up over 4%), SMIC (up over 3%), and BYD Electronics (up over 2%) [4]
- Conversely, Kuaishou saw a significant drop of over 8%, while Alibaba and Tencent experienced slight declines [4]

Company Insights
- Huawei is set to unveil breakthrough technology in AI inference at a forum on August 12, which may reduce reliance on HBM technology and enhance the performance of domestic AI models [5]
- The performance of major tech companies is expected to be a catalyst for market movements, with a focus on their mid-year earnings reports [8]

Investment Sentiment
- Analysts from Xinyi Securities maintain a bullish long-term outlook for Hong Kong stocks, emphasizing the strengthening position of Hong Kong as an international financial center and the positive feedback loop from quality companies listing in Hong Kong [6]
- The market is anticipated to experience a phase of consolidation, with a focus on mid-year earnings and value propositions [6][8]

Long-Term Outlook
- The long-term outlook for Hong Kong stocks remains optimistic, driven by improving supply-demand dynamics and the potential for economic recovery from a "passive destocking" phase [8]
- The technology sector is viewed as a key driver for economic transformation, with AI playing a significant role in future growth [9]
Huawei Releases AI Inference Innovation Technology UCM
People's Finance dispatch, August 12: on August 12, Huawei officially released the AI inference innovation technology UCM (inference memory data manager). As a KV Cache-centric inference acceleration suite, UCM integrates multiple types of cache acceleration algorithm tools and manages, in tiers, the KV Cache memory data generated during inference. It can expand the inference context window, deliver a high-throughput, low-latency inference experience, and lower the per-Token inference cost. The technology has already been piloted for smart-finance AI inference acceleration in three China UnionPay business scenarios, "Voice of the Customer", "Marketing Planning", and "Office Assistant", with initial results achieved.