AI Inference
Huawei Unveils AI Inference Innovation UCM in Shanghai; Official Open-Source Release Slated for September
Sou Hu Cai Jing· 2025-08-12 11:53
Core Insights
- The article discusses advances in AI inference technology, focusing on Huawei's UCM inference memory data manager, which aims to improve AI inference efficiency and reduce costs [2][3].

Group 1: AI Technology Development
- AI inference is entering a critical growth phase, with the UCM inference memory data manager as a key innovation [2].
- UCM integrates multiple caching acceleration algorithms and manages KV Cache memory data in tiers to improve the inference experience [2][3].

Group 2: Performance Enhancements
- UCM can cut first-token latency by up to 90% and expand the inference context window tenfold, addressing long-text processing needs [3].
- TPS (tokens per second) can rise 2 to 22 times in long-sequence scenarios, significantly lowering enterprises' cost per token [3].

Group 3: Industry Collaboration
- Huawei and China UnionPay have jointly validated UCM, achieving a 125-fold increase in inference speed for customer-service applications [4].
- Future plans include building "AI + Finance" demonstration applications with industry partners to move from experimental validation to large-scale deployment [4].

Group 4: Open Source Initiative
- Huawei announced an open-source plan for UCM, to be available in September, with contributions planned for mainstream inference engine communities [4].
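The first-token-latency reduction described above comes from reusing previously computed KV Cache data instead of re-running prefill over the whole prompt. Below is a toy sketch of prefix-cache reuse with a longest-prefix lookup; all class names, function names, and cost figures are invented for illustration and are not UCM's actual interfaces:

```python
# Toy model of prefix-cache reuse for time-to-first-token (TTFT).
# All names and cost figures are hypothetical illustrations.
class PrefixCache:
    def __init__(self):
        self._store = {}  # maps prompt prefix -> precomputed KV state

    def insert(self, prefix, kv_state):
        self._store[prefix] = kv_state

    def lookup(self, prompt):
        # Return the longest cached prefix of the prompt, if any.
        best = ""
        for prefix in self._store:
            if prompt.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        return best, self._store.get(best)

def prefill_cost(n_uncached_chars, us_per_char=50):
    # Stand-in for attention prefill work: proportional to uncached input.
    return n_uncached_chars * us_per_char  # pretend microseconds

cache = PrefixCache()
shared = "system: you are a helpful assistant. "
cache.insert(shared, "kv-blob")  # KV state saved from an earlier request

prompt = shared + "user: hello"
hit, state = cache.lookup(prompt)
cold_cost = prefill_cost(len(prompt))             # no cache: full prefill
warm_cost = prefill_cost(len(prompt) - len(hit))  # cache hit: only the tail
assert warm_cost < cold_cost
```

The larger the shared prefix is relative to the full prompt, the closer the saving gets to the 90% ceiling the article quotes.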
Huawei: AI Inference Innovation UCM to Be Officially Open-Sourced This September
Xin Lang Ke Ji· 2025-08-12 11:21
Group 1
- The 2025 forum on the application and development of financial AI inference featured speeches from executives of China UnionPay and Huawei, highlighting AI's importance in the financial sector [2].
- Huawei introduced the UCM inference memory data manager, aimed at enhancing the AI inference experience and improving cost-effectiveness while accelerating AI's positive business cycle [2].
- UCM was piloted with China UnionPay in typical financial scenarios, showcasing its application in smart-finance AI inference acceleration [2].

Group 2
- In the pilot with China UnionPay, UCM delivered a 125-fold increase in large-model inference speed, enabling precise identification of customer issues in about 10 seconds [3].
- China UnionPay plans to work with Huawei and other partners to build "AI + Finance" demonstration applications, moving the technology from laboratory validation to large-scale deployment [3].
- Huawei announced UCM's open-source plan, launching officially in September, with contributions to mainstream inference engine communities to grow the AI inference ecosystem [3].
Huawei Releases AI Inference Innovation Technology
半导体芯闻· 2025-08-12 09:48
Source: Sina Finance.

August 12 afternoon news: at the 2025 Financial AI Inference Application and Development Forum, Huawei and China UnionPay jointly released the AI inference innovation technology UCM (inference memory data manager), delivering a high-throughput, low-latency inference experience.

In today's digital era, AI is advancing by the day. The wave of large-model training has yet to recede, yet the AI inference experience has quietly become the key to AI applications. A white paper released during 2025 WAIC points out that AI is growing rapidly amid a structural shift from training to inference. Against this backdrop, the importance of the AI inference experience is increasingly prominent.

The inference experience directly determines how users feel when interacting with AI, including response latency, answer accuracy, and the ability to reason over complex context. Available data show that the per-user output speed of mainstream foreign models has entered the 200 tokens/s range (latency 5m ...
Huawei Releases AI Inference "Black Technology" to Help Solve AI Inference Efficiency and User Experience Challenges
Zhong Guo Ji Jin Bao· 2025-08-12 07:50
Core Insights
- Huawei officially launched the AI inference "black technology" UCM (Inference Memory Data Manager) to address challenges in AI inference efficiency and user experience [1].

Group 1: AI Inference Development
- The AI industry is shifting its focus from "maximizing model capabilities" to "optimizing the inference experience," which directly affects user satisfaction and commercial viability and is becoming a key metric of AI model value [2].
- Huawei plans to open-source UCM in September, launching first in the Magic Engine community, then gradually contributing to mainstream inference engine communities and sharing with all storage vendors and ecosystem partners [2].

Group 2: UCM Features and Benefits
- UCM is a KV Cache-centered inference acceleration suite that integrates multiple caching acceleration algorithms and manages in tiers the KV Cache memory data generated during inference, expanding the inference context window for high throughput and low latency [3].
- Using techniques such as dynamic KV offloading and position-encoding extension, UCM can offload long-sequence cache to external professional storage, achieving a tenfold expansion of the inference context window [7].

Group 3: Cost Efficiency and Performance
- UCM improves inference cost efficiency by letting memory data flow on demand across HBM, DRAM, and SSD storage media, and by integrating multiple sparse attention algorithms to raise TPS (tokens per second) by 2 to 22 times, reducing the cost per token [8].
- Mainstream domestic AI models currently deliver a per-user output speed of under 60 tokens/s with 50 to 100 ms latency, while leading foreign models have entered the 200 tokens/s range with 5 ms latency [8].

Group 4: Practical Applications
- Huawei's AI inference acceleration solution, combining UCM with Huawei AI storage (OceanStor A series), is being piloted with China UnionPay in three business scenarios: Voice of the Customer, Marketing Planning, and Office Assistant [9].
- In the Office Assistant scenario, the solution supports user inputs of over 170,000 tokens for long-sequence inference, addressing the failure of models on long sequences [10].
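The tiered management described above, with KV Cache flowing across HBM, DRAM, and SSD by usage, can be sketched as a simple spill policy: hot entries stay in the fast tier and least-recently-used entries cascade downward. The tier capacities, class names, and LRU choice below are illustrative assumptions, not UCM's actual design:

```python
from collections import OrderedDict

# Minimal sketch of tiered KV-cache placement (hypothetical design):
# new/hot blocks live in "HBM", overflow spills to "DRAM", then "SSD".
TIER_CAPACITY = {"HBM": 2, "DRAM": 4, "SSD": 100}  # entries per tier
TIER_ORDER = ["HBM", "DRAM", "SSD"]

class TieredKVCache:
    def __init__(self):
        self.tiers = {t: OrderedDict() for t in TIER_ORDER}

    def put(self, key, kv_block):
        self.tiers["HBM"][key] = kv_block
        self.tiers["HBM"].move_to_end(key)  # mark as most recently used
        self._spill()

    def _spill(self):
        # Evict least-recently-used blocks downward when a tier overflows.
        for upper, lower in zip(TIER_ORDER, TIER_ORDER[1:]):
            while len(self.tiers[upper]) > TIER_CAPACITY[upper]:
                k, v = self.tiers[upper].popitem(last=False)
                self.tiers[lower][k] = v

    def get(self, key):
        # A hit is promoted back to HBM so hot data stays in fast memory.
        for tier in TIER_ORDER:
            if key in self.tiers[tier]:
                v = self.tiers[tier].pop(key)
                self.put(key, v)
                return v
        return None

cache = TieredKVCache()
for i in range(7):
    cache.put(f"seq-{i}", f"kv-{i}")
# The two most recent sequences remain in HBM; the oldest reached SSD.
assert len(cache.tiers["HBM"]) == 2
assert "seq-0" in cache.tiers["SSD"]
```

A subsequent `get()` promotes a block back to HBM, modeling the "memory flows based on usage" behavior the article attributes to UCM.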
AI Blockbuster! Huawei's "Black Technology" Arrives
Zhong Guo Ji Jin Bao· 2025-08-12 07:40
Core Insights
- Huawei has officially launched its AI inference technology UCM (Unified Cache Manager), aimed at addressing challenges in AI inference efficiency and user experience [1].
- The AI industry is shifting its focus from maximizing model capabilities to optimizing the inference experience, which directly affects user satisfaction and commercial viability [1].

Group 1: UCM Technology Overview
- UCM is a KV Cache-centered inference acceleration suite that integrates multiple caching algorithms to manage KV Cache memory data during inference, raising throughput and reducing latency [2].
- Growing AI inference demand has pushed KV Cache capacity beyond GPU memory limits, necessitating innovations like UCM [2][3].
- UCM's core value lies in faster inference responses and longer inference sequences, addressing the limitations of current AI models [2].

Group 2: Performance Improvements
- UCM enables dynamic KV offloading and position-encoding extension, achieving a tenfold expansion of the inference context window [3].
- The technology lets data flow on demand across storage media (HBM, DRAM, SSD), improving TPS (tokens per second) by 2 to 22 times and thereby reducing the cost per token [4].
- Mainstream domestic AI models output tokens at a far lower speed than their international counterparts, underscoring the need for UCM's capabilities [4].

Group 3: Practical Applications
- Huawei's AI inference acceleration solution, in collaboration with China UnionPay, is being piloted in three business scenarios: Voice of the Customer, Marketing Planning, and Office Assistant [5].
- The Office Assistant application supports user inputs exceeding 170,000 tokens, overcoming the challenges of long-sequence models [5].
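These reports also credit sparse attention algorithms for part of the 2-to-22x TPS gain: each decoded token attends only to the most relevant cached keys rather than the full sequence. Below is a hedged NumPy sketch of generic top-k sparse attention; the articles do not specify which sparse algorithms UCM actually integrates:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=32):
    """Attend only to the k highest-scoring keys (generic sketch)."""
    scores = K @ q                            # (seq_len,) raw scores
    idx = np.argpartition(scores, -k)[-k:]    # top-k key indices
    s = scores[idx] - scores[idx].max()       # numerically stable softmax
    w = np.exp(s) / np.exp(s).sum()
    return w @ V[idx]                         # mix only the k chosen values

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
q = rng.standard_normal(d)
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))
out = topk_sparse_attention(q, K, V, k=32)
assert out.shape == (d,)  # same output shape as dense attention
```

Compared with dense attention over all 1,024 cached keys, the softmax and value mixing here touch only 32 entries per step; the quality trade-off depends on how well the top-k selection captures the relevant context.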
AI Blockbuster! Huawei's "Black Technology" Arrives
中国基金报· 2025-08-12 07:37
Core Viewpoint
- Huawei has officially launched the AI inference technology UCM (Inference Memory Data Manager) to address challenges in AI inference efficiency and user experience [2][4].

Group 1: AI Inference Development
- The AI industry is shifting focus from maximizing model capabilities to optimizing the inference experience, which directly affects user satisfaction and commercial viability [4].
- Huawei plans to open-source UCM in September, releasing it first in the Magic Engine community and gradually contributing to mainstream inference engine communities [5].

Group 2: UCM Technology and Benefits
- UCM is a KV Cache-centered inference acceleration suite that integrates multiple caching acceleration algorithms to manage KV Cache memory data during inference, raising throughput and reducing latency [7].
- UCM enables longer inference sequences by offloading cache to external storage, achieving a tenfold expansion of the inference context window [8].

Group 3: Cost Efficiency and Performance
- UCM dynamically manages memory across HBM, DRAM, and SSD according to memory usage, improving TPS (tokens per second) by 2 to 22 times and lowering the cost per token [11].
- Mainstream domestic AI models currently output fewer than 60 tokens per second at 50 to 100 ms latency, while leading foreign models reach 200 tokens per second at 5 ms latency [11].

Group 4: Practical Applications
- Huawei's AI inference acceleration solution, combining UCM with OceanStor A series storage, is being piloted with China UnionPay in three business scenarios: Voice of Customer, Marketing Planning, and Office Assistant [12].
- In the Office Assistant scenario, the solution supports user input of over 170,000 tokens for long-sequence inference, addressing the limitations of long-sequence models [15].
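The cost-per-token claim follows from simple arithmetic: if hardware cost per second is fixed, cost per token is inversely proportional to TPS. A back-of-envelope check using the article's 2x-22x TPS range and its sub-60 tokens/s domestic baseline; the hardware rate below is an invented placeholder:

```python
COST_PER_SECOND = 0.002   # assumed hardware cost per second (placeholder)
BASELINE_TPS = 60         # tokens/s, the article's domestic upper bound

def cost_per_token(tps):
    # Fixed cost per second spread over however many tokens that second yields.
    return COST_PER_SECOND / tps

baseline = cost_per_token(BASELINE_TPS)
low_gain = cost_per_token(BASELINE_TPS * 2)    # 2x TPS: half the cost
high_gain = cost_per_token(BASELINE_TPS * 22)  # 22x TPS: ~4.5% of the cost
assert high_gain < low_gain < baseline
assert round(baseline / high_gain) == 22
```

Whatever the real hardware rate, only the ratio matters: a 22-fold TPS improvement cuts per-token cost by the same factor.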
Huawei Releases AI Inference Innovation UCM: High-Throughput, Low-Latency Inference at a Lower Per-Token Cost
Xin Lang Ke Ji· 2025-08-12 07:22
Sina Tech, August 12 afternoon news: at the 2025 Financial AI Inference Application and Development Forum, Huawei and China UnionPay jointly released the AI inference innovation technology UCM (inference memory data manager), delivering a high-throughput, low-latency inference experience.

According to the introduction, UCM is a KV Cache-centered inference acceleration suite that integrates multiple types of caching acceleration algorithm tools and manages in tiers the KV Cache memory data generated during inference, expanding the inference context window to achieve high-throughput, low-latency inference and lower the per-token inference cost.

In today's digital era, AI is advancing by the day. The wave of large-model training has yet to recede, yet the AI inference experience has quietly become the key to AI applications. A white paper released by China Securities (中信建投) during 2025 WAIC points out that AI is growing rapidly amid a structural shift from training to inference. Against this backdrop, the importance of the AI inference experience is increasingly prominent.

The inference experience directly determines how users feel when interacting with AI, including response latency, answer accuracy, and the ability to reason over complex context. Available data show that the per-user output speed of mainstream foreign models has entered the 200 tokens/s range (5 ms latency), while in China it is generally below 60 tokens/s (50-100 ms latency); solving the inference efficiency and user experience problem has become pressing ...

Editor in charge: Guo Xutong
Zhang Yidong: Volatility Is the Storage Battery of Hong Kong Stocks' Long-Term Rally! Hang Seng Tech ETF (513260) and Huitianfu Hong Kong Stock Connect Tech ETF (520980) Keep Attracting Inflows Through Consecutive Pullbacks!
Xin Lang Cai Jing· 2025-08-12 06:57
Market Overview
- Hong Kong stocks declined collectively; the Hang Seng Tech ETF (513260) fell 0.43% despite drawing over 640 million yuan in net inflows over the past 10 days [1].
- The financing balance of the Hang Seng Tech ETF has exceeded 130 million yuan, with a recent financing purchase of 39.57 million yuan [1].

Sector Performance
- Hong Kong's technology sector was mixed, with notable gains from Hua Hong Semiconductor (up over 4%), SMIC (up over 3%), and BYD Electronics (up over 2%) [4].
- Kuaishou dropped more than 8%, while Alibaba and Tencent posted slight declines [4].

Company Insights
- Huawei was set to unveil breakthrough AI inference technology at a forum on August 12, which may reduce reliance on HBM and enhance the performance of domestic AI models [5].
- The mid-year earnings reports of major tech companies are expected to be a catalyst for market movements [8].

Investment Sentiment
- Analysts from Xinyi Securities maintain a bullish long-term outlook on Hong Kong stocks, citing Hong Kong's strengthening position as an international financial center and the positive feedback loop from quality companies listing there [6].
- The market is expected to go through a phase of consolidation, with attention on mid-year earnings and value propositions [6][8].

Long-term Outlook
- The long-term outlook for Hong Kong stocks remains optimistic, driven by improving supply-demand dynamics and potential economic recovery from the "passive destocking" phase [8].
- The technology sector is viewed as a key driver of economic transformation, with AI playing a significant role in future growth [9].
Huawei Releases AI Inference Innovation Technology UCM
Zheng Quan Shi Bao Wang· 2025-08-12 06:55
People's Finance News, August 12: Huawei officially released the AI inference innovation technology UCM (inference memory data manager) on August 12. UCM is described as a KV Cache-centered inference acceleration suite that integrates multiple types of caching acceleration algorithm tools and manages in tiers the KV Cache memory data generated during inference; it can expand the inference context window, deliver a high-throughput, low-latency inference experience, and lower the per-token inference cost. The technology has already been piloted for smart-finance AI inference acceleration in three China UnionPay business scenarios ("Voice of the Customer," "Marketing Planning," and "Office Assistant"), with initial results achieved.
Huawei to Release Breakthrough "Black Technology" in AI Inference; Supply-Demand Imbalance May Lift Q3 DDR4 Contract Prices 85%-90% Quarter-on-Quarter (Investment Morning Brief)
Mei Ri Jing Ji Xin Wen· 2025-08-12 01:01
Market News
- The three major US stock indices fell slightly: the Dow Jones down 0.45%, the Nasdaq down 0.3%, and the S&P 500 down 0.25%. Major tech stocks mostly declined, including Apple, Microsoft, Nvidia, Google, Amazon, Meta, and AMD; Intel dropped over 3% while Tesla rose over 2% [1].
- Chinese concept stocks mostly declined, with the Nasdaq Golden Dragon China Index down 0.29%. Notable movers included TAL Education down over 3%, Li Auto down nearly 3%, and Baidu and Alibaba down over 1% [1].
- Metal futures generally fell: COMEX gold futures down 2.80% at $3,393.7 per ounce and COMEX silver futures down 2.33%. International oil prices edged up, with WTI crude up 0.19% at $64.00 per barrel and Brent crude up 0.15% at $66.69 per barrel [1].

Industry Insights
- Huawei held a forum titled "AI Rise, Opening a New Chapter in Smart Finance," discussing the importance of the AI inference experience and launching AI inference acceleration technology intended to reduce reliance on HBM and enhance the performance of domestic AI models [2].
- TrendForce reported that the DDR4 market will face sustained supply shortages and price increases in the second half of 2025, as strong server orders squeeze supply for PCs and consumer devices. Consumer DDR4 contract prices have surged 60% to 85%, prompting a sharp upward revision of third-quarter contract prices to an expected 85% to 90% quarterly increase [3][4].
- The Hangzhou Municipal Justice Bureau is seeking public comment on a draft regulation to promote the development of embodied intelligent robots, focusing on computing resource efficiency, cost reduction, and core technologies in the field [5][6].
- The embodied intelligence market is expected to grow significantly, potentially exceeding one trillion yuan by 2026, driven by advances in humanoid robots and AI models [6].

Stock Movements
- Several companies announced share-reduction plans, including Aokang International, Tianfu Communication, and Qide New Materials, with various shareholders planning reductions through centralized bidding or block trading [7][8].
- Chongqing Bank reported that a major shareholder plans to reduce its stake by up to 52 million shares, which would cut its holding from 8.5% to 7% [8].