Token Economy
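Huawei Unveils AI "Black Tech" UCM, to Be Officially Open-Sourced in September
Mainstream overseas models reportedly now deliver single-user output speeds in the 200 tokens/s range (5 ms latency), while models in China generally remain below 60 tokens/s (50-100 ms latency), so improving inference efficiency and user experience has become urgent. "High latency and high cost are the main challenges facing AI inference today," Cao Chong, CEO of Huawei's Digital Finance BU, said at the event.

According to Huawei, UCM is a KV Cache-centric inference acceleration suite that combines multiple types of cache acceleration algorithms and tools. It manages the KV Cache memory data generated during inference in tiers, which expands the inference context window, delivers high-throughput, low-latency inference, and lowers the per-token inference cost.

On the implementation side, a Huawei representative said UCM uses hierarchical, adaptive global prefix caching to reuse KV prefix caches at any physical location and for any input combination. In scenarios such as multi-turn dialogue and RAG knowledge retrieval, it reads KV cache data directly and avoids recomputation, reducing time-to-first-token by up to 90%.

In the AI era, inference technology shapes how users experience AI, including response latency, answer accuracy, and the ability to reason over complex contexts. Against this backdrop, Huawei has launched UCM (Inference Memory Data Manager), an AI inference "black tech" that can substantially reduce inference latency ...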
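The prefix reuse described above can be illustrated with a toy sketch. This is not UCM's implementation, which the article does not detail; the class `PrefixKVCache`, the helper `prefill`, and the block size are hypothetical names and values chosen for illustration. The idea shown is only that a request sharing a leading token sequence with an earlier request can skip prefill for the shared part.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cache block (hypothetical; real systems use larger blocks)


@dataclass
class PrefixKVCache:
    """Toy prefix cache: maps a token prefix to a placeholder KV entry."""
    store: dict = field(default_factory=dict)

    def _keys(self, tokens):
        # One key per full block. Each key is the whole prefix up to that block,
        # so a hit on one block implies all earlier blocks match as well.
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            yield tuple(tokens[:end])

    def lookup(self, tokens):
        """Return how many leading tokens already have cached KV data."""
        hit = 0
        for key in self._keys(tokens):
            if key not in self.store:
                break
            hit = len(key)
        return hit

    def insert(self, tokens):
        for key in self._keys(tokens):
            # A string stands in for the real key/value tensors.
            self.store.setdefault(key, f"kv[{len(key)}]")


def prefill(cache, tokens):
    """Return the number of tokens that still need prefill compute."""
    reused = cache.lookup(tokens)
    cache.insert(tokens)
    return len(tokens) - reused


cache = PrefixKVCache()
system_prompt = list(range(8))                 # shared prefix, e.g. a system prompt
turn1 = system_prompt + [100, 101, 102, 103]
turn2 = turn1 + [200, 201, 202, 203]           # second turn of the same dialogue
print(prefill(cache, turn1), prefill(cache, turn2))
```

In this sketch the second turn only computes its four new tokens, which is the mechanism behind the time-to-first-token savings the article attributes to prefix caching in multi-turn dialogue and RAG.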
Huawei Unveils AI "Black Tech" UCM, to Be Open-Sourced Next Month
Core Insights
- Huawei has launched a new AI inference technology called UCM, aimed at significantly reducing inference latency and costs while enhancing efficiency in AI interactions [1][2]

Group 1: Technology and Innovation
- UCM utilizes a KVCache-centered architecture that integrates various caching acceleration algorithms to manage KVCache memory data, thereby expanding the inference context window and achieving high throughput with low latency [1][2]
- The technology features hierarchical adaptive global prefix caching, which allows for the reuse of KV prefix cache across various physical locations and input combinations, reducing the first token latency by up to 90% [2]
- UCM can automatically tier cache based on memory heat across different storage media (HBM, DRAM, SSD) and incorporates sparse attention algorithms to enhance processing speed, achieving a 2 to 22 times increase in tokens processed per second (TPS) [2]

Group 2: Market Context and Challenges
- Currently, Chinese internet companies' investment in AI is only one-tenth of that in the United States, and the inference experience in domestic large models lags behind international standards, which could lead to user attrition and a slowdown in investment [3]
- The rise in user scale and request volume in AI applications has led to an exponential increase in token usage, with a projected daily token call of 16.4 trillion by May 2025, representing a 137-fold increase from the previous year [4]
- Balancing the high operational costs associated with increased token processing and the need for enhanced computational power is a critical challenge for the industry [4]

Group 3: Strategic Initiatives
- Huawei has initiated pilot applications of UCM in three business scenarios with China UnionPay, focusing on smart financial AI inference acceleration [3]
- The company plans to open-source UCM by September 2025, aiming to foster collaboration within the industry to develop inference frameworks and standards [4]
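Heat-based tiering across HBM, DRAM, and SSD can be sketched as follows. The summary does not say how UCM measures heat or moves data, so this example assumes a plain access counter and made-up tier capacities; `TieredKVStore`, `CAPACITY`, and the block ids are hypothetical.

```python
from collections import Counter

TIERS = ["HBM", "DRAM", "SSD"]        # fastest to slowest
CAPACITY = {"HBM": 2, "DRAM": 3}      # blocks per tier (hypothetical); SSD is unbounded here


class TieredKVStore:
    """Toy placement policy: hotter KV blocks go to faster tiers."""

    def __init__(self):
        self.heat = Counter()         # access count per block id

    def touch(self, block_id):
        self.heat[block_id] += 1

    def placement(self):
        """Assign the hottest blocks to the fastest tiers."""
        # Sort by descending heat; ties are broken by block id for determinism.
        ranked = sorted(self.heat, key=lambda b: (-self.heat[b], b))
        result, start = {}, 0
        for tier in TIERS:
            size = CAPACITY.get(tier, len(ranked))
            for block in ranked[start:start + size]:
                result[block] = tier
            start += size
        return result


store = TieredKVStore()
for block, hits in {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1, "f": 1}.items():
    for _ in range(hits):
        store.touch(block)
print(store.placement())
```

A production system would also need eviction, asynchronous migration between media, and a decay on the heat counter; those are omitted to keep the placement rule visible.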
AI Blockbuster: Huawei's "Black Tech" Has Arrived
Zhong Guo Ji Jin Bao· 2025-08-12 07:40
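[Overview] Huawei releases AI inference "black tech" to help tackle AI inference efficiency and user experience challenges.

On the afternoon of August 12, Huawei officially released UCM (Inference Memory Data Manager), an AI inference "black tech" intended to help solve the problems of AI inference efficiency and user experience.

AI inference is the focus of the AI industry's next stage of development. The industry has shifted from "pushing the limits of model capability" to "optimizing the inference experience". Inference experience ties directly to core needs such as user satisfaction and commercial viability, and has become the gold standard for measuring an AI model's value.

Huawei reportedly plans to open-source UCM in September. It will debut in the ModelEngine (魔擎) community, then be contributed step by step to mainstream inference engine communities and shared with all storage vendors and ecosystem partners that use a Share Everything (shared architecture) design.

UCM will improve inference system efficiency and performance

UCM is an inference acceleration suite centered on KV Cache (key-value cache). It combines multiple types of cache acceleration algorithms and tools and manages the KV Cache memory data generated during inference in tiers, expanding the inference context window to deliver high-throughput, low-latency inference and thereby lower the inference cost per token.

KV Cache is a key technique for improving compute efficiency and reducing repeated computation, but it needs GPU (graphics processing unit) memory to store historical KV (key-value) vectors; the longer the generated text, the more data is cached.

As the information technology application innovation industry ( ...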
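The point that KV Cache grows with text length can be made concrete with the standard sizing formula for a transformer decoder: one key tensor and one value tensor per layer, each holding one vector per KV head per token. The model shape below (32 layers, 8 KV heads, head dimension 128, 2-byte values) is a hypothetical example, not a figure from the article.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch=1):
    """Approximate KV cache size in bytes for a decoder-only transformer."""
    # The leading 2 accounts for storing both keys and values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch


# Hypothetical model shape, chosen only to show the linear growth.
for tokens in (4_096, 32_768, 131_072):
    size = kv_cache_bytes(tokens, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"{tokens:>7} tokens -> {size / 1024**3:.1f} GiB")
```

Because the size is linear in sequence length and batch size, long contexts and many concurrent users quickly exceed GPU memory, which is the motivation the article gives for managing the cache across tiers.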
Token Consumption Grows Rapidly; Computing-Power Operations Become a New Business Model
Tebon Securities· 2025-08-04 06:56
Investment Strategy
- The explosive demand for computing power is driving a significant increase in capital expenditures (capex), with infrastructure construction expected to enter a golden period. Major companies like Microsoft and Meta are reporting substantial profit and capex growth, indicating a high level of activity in the computing power industry. For instance, Microsoft's net profit for Q2 was $27.23 billion, a 24% year-on-year increase, while its capital expenditure rose by 27% to $24.2 billion. Meta's net profit grew by 36% to $18.34 billion, with projected capital expenditures for 2025 reaching between $66 billion and $72 billion [4][10][14].
- The token economy's business model is being validated, with domestic demand for computing power expected to accelerate. Alphabet reported processing over 980 trillion tokens monthly, with ChatGPT's weekly active users surpassing 700 million. The weekly token consumption for large models has seen a nearly fivefold increase from January to July 2025 [11][12].
- The "AI+" policy is catalyzing growth in industry applications. The Chinese government is actively promoting the commercialization of AI, with local policies supporting the expansion of AI applications. For example, Shanghai has issued measures including the distribution of 600 million yuan in computing power vouchers [12][15].

Industry News
- Major overseas companies are experiencing significant growth in performance and capital expenditure. Microsoft's Q2 revenue reached $76.44 billion, an 18% year-on-year increase, while Meta's revenue was $47.52 billion, up 22%. Both companies attribute their growth to the deep application of AI technology [14].
- The State Council of China has approved the "AI+" action plan, emphasizing the need for large-scale commercialization of AI applications. This policy aims to leverage China's complete industrial system and large market scale to promote AI integration across various sectors [15].
- The rapid launch of satellites indicates an acceleration in satellite internet construction in China. Recent successful launches of low-orbit satellites demonstrate the country's commitment to enhancing broadband communication services [16][17].
- Eight major banks have jointly released financial service products for the "AI+ manufacturing" sector, with a commitment to provide at least 400 billion yuan in credit by the end of 2027. This initiative aims to support the intelligent transformation of manufacturing enterprises [18].

Weekly Review and Focus
- The communication sector saw a 4.12% increase this week, outperforming major indices like the Shanghai Composite Index, which fell by 0.57%. Notable gains were observed in optical modules and optical communication sectors, with increases of 9.86% and 5.29%, respectively [19][21].
- The focus for the upcoming week includes investment opportunities in the AIDC chain and related sectors, with companies such as ZTE Corporation and Inspur Information being highlighted for potential growth [23].
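From Technology Evolution to Compute Consumption Estimates, an In-Depth Breakdown of AI Agents: AI Enters the Token Era, MCP Empowers Agents Toward Pervasive Intelligence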
ZHONGTAI SECURITIES· 2025-04-06 12:38
Investment Rating
- The report maintains a rating of "Overweight" for the industry [4].

Core Insights
- The AI Agent has reached a critical point of explosive growth, with all necessary components now integrated, leading to enhanced user experience and accelerated penetration into various sectors [5][10].

Summary by Sections

Industry Overview
- The industry comprises 131 listed companies with a total market value of 15,067.40 billion and a circulating market value of 13,714.81 billion [2].

Key Companies and Financials
- Notable companies include:
  - Southern Media: Stock price 16.43, EPS 1.06 for 2022, projected EPS of 1.15 for 2026, rating "Buy" [4].
  - Kaiying Network: Stock price 16.45, EPS 0.49 for 2022, projected EPS of 1.00 for 2026, rating "Buy" [4].
  - Century Tianhong: Stock price 10.76, EPS 0.17 for 2022, projected EPS of 0.21 for 2026, rating "Overweight" [4].

Technological Evolution
- The development of AI Agents is likened to building blocks, where previously isolated technologies are now integrated, enabling AI Agents to operate autonomously [5][10].
- Key advancements include:
  - Enhanced coding capabilities of large models, allowing for industry-level applications [5].
  - The introduction of standardized tool invocation protocols like MCP, which simplifies the integration of various tools and data sources [31][32].

Market Dynamics
- The report anticipates a surge in the availability of tools and software interfaces for large models, driven by the decreasing costs of token usage [5][20].
- The MCP platform has launched over 3,500 servers across multiple fields, indicating a robust ecosystem for AI Agents [5].

Computational Demand
- A global AI Agent application with 1 billion daily active users is estimated to require approximately 141,500 NVIDIA H100 SXM GPUs for daily operations [66].
- The report provides a detailed sensitivity analysis on token consumption and computational needs based on user interaction patterns [54][60].
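A sensitivity analysis of the kind described under Computational Demand can be sketched with a simple throughput model. The summary does not give the report's assumptions, so every input below (tokens per user per day, per-GPU throughput, utilization) is a hypothetical placeholder, and the output is not expected to match the report's figure of roughly 141,500 GPUs.

```python
import math

SECONDS_PER_DAY = 86_400


def gpus_needed(dau, tokens_per_user_per_day, tokens_per_gpu_per_second,
                utilization=1.0):
    """Estimate GPUs required to serve average daily token demand."""
    # Average demand in tokens per second, assuming load is spread evenly over the day.
    demand = dau * tokens_per_user_per_day / SECONDS_PER_DAY
    # Effective per-GPU throughput after accounting for utilization.
    return math.ceil(demand / (tokens_per_gpu_per_second * utilization))


# Sensitivity sweep over hypothetical per-user token consumption.
for tokens in (5_000, 10_000, 20_000):
    n = gpus_needed(dau=1_000_000_000,
                    tokens_per_user_per_day=tokens,
                    tokens_per_gpu_per_second=1_000,   # placeholder throughput
                    utilization=0.5)                   # placeholder utilization
    print(f"{tokens:>6} tokens/user/day -> {n:,} GPUs")
```

The estimate scales linearly with users and tokens per user and inversely with throughput and utilization, which is why such reports present results as a sensitivity table over user interaction patterns. Peak-to-average load ratios, which this sketch ignores, would push the count higher.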
Investment Recommendations
- The report suggests focusing on companies across various segments of the AI ecosystem, including hardware (NVIDIA, AMD), model development (Alphabet, Microsoft), and applications (Tesla, Salesforce) [9].