Up to 2,400 TPS per Chip: Huawei Cloud Tokens Service Fully Integrated with the 384 Super Node
Guan Cha Zhe Wang·2025-08-27 13:10

Core Viewpoint - Huawei Cloud has announced that its Tokens service is now fully integrated with the CloudMatrix384 super node, reaching a maximum throughput of 2,400 TPS per chip at a latency of 50 ms, which it says surpasses industry standards [1][2].

Group 1: AI Computing Demand and Tokens Service
- Over the past 18 months, demand for AI computing power in China has grown exponentially: daily Token consumption rose from 100 billion at the beginning of 2024 to over 30 trillion by June 2025, more than a 300-fold increase in 1.5 years [2].
- Huawei Cloud launched its Tokens service, built on MaaS (Model as a Service), in March 2025, offering several service specifications to meet the different performance and latency requirements of AI tools [2].
- Integrating the Tokens service with CloudMatrix384 raised throughput from 1,920 TPS at the beginning of the year to 2,400 TPS [2].

Group 2: Full-Stack Innovation and Architecture
- Building large-scale computing power is a full-stack innovation spanning hardware, software, operators, storage, inference frameworks, and super nodes, and it draws on Huawei's comprehensive capabilities [4].
- The CloudMatrix384 super node uses a new computing architecture that breaks performance bottlenecks and provides a robust computing foundation [4].
- CANN, the Ascend hardware enablement layer, optimizes operators and communication strategies so that cloud computing power is used efficiently [4].

Group 3: xDeepServe and Performance Enhancement
- xDeepServe, a native service of CloudMatrix384, uses a Transformerless architecture that decomposes large models into independent micro-modules, which can then be processed in parallel on different NPUs [5][6].
- Through continuous optimization of xDeepServe, Tokens service performance has improved from 600 tokens/s on non-super nodes to 2,400 tokens/s on super nodes [6].
- FlowServe, a restructured, decentralized distributed engine, lets data-parallel (DP) groups within CloudMatrix384 operate autonomously, sustaining high concurrency without congestion [6].

Group 4: Model Performance and Industry Applications
- Huawei Cloud's MaaS service supports major large models and has built up model performance optimization capabilities, achieving twice the output speed of mainstream platforms for image generation [8].
- The company has partnered with over 100 organizations to develop AI Agents for a range of industry scenarios, improving efficiency in areas such as analysis, content creation, and smart operations [8][9].
- Intelligent solutions such as the talent digital employee solution show how these technologies are applied to improve service efficiency and customer satisfaction [9].
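
The growth and throughput figures quoted in Groups 1 and 3 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the last function rests on an assumption the article does not state, namely that the 50 ms latency is the time between consecutive output tokens of a single request. Under that reading, the quoted per-chip throughput implies a rough number of requests served concurrently by one chip.

```python
"""Cross-check the growth and throughput figures quoted in the summary."""

# Figures as quoted in the text.
DAILY_TOKENS_EARLY_2024 = 100 * 10**9  # 100 billion tokens per day
DAILY_TOKENS_LATEST = 30 * 10**12      # "over 30 trillion", used as a lower bound
TPS_START_OF_YEAR = 1920               # tokens/s at the beginning of the year
TPS_NON_SUPER_NODE = 600               # tokens/s on non-super nodes
TPS_SUPER_NODE = 2400                  # tokens/s on super nodes
LATENCY_MS = 50                        # quoted latency in milliseconds


def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def percent_increase(start: float, end: float) -> float:
    """Return the relative increase from `start` to `end`, in percent."""
    return (end - start) / start * 100


def implied_concurrent_streams(chip_tps: int, per_token_latency_ms: int) -> float:
    """Estimate how many requests one chip serves at the same time.

    ASSUMPTION (not stated in the article): the latency is the gap between
    consecutive output tokens of one request, so each request produces
    1000 / per_token_latency_ms tokens per second.
    """
    return chip_tps * per_token_latency_ms / 1000


if __name__ == "__main__":
    print(growth_multiple(DAILY_TOKENS_EARLY_2024, DAILY_TOKENS_LATEST))  # 300.0
    print(percent_increase(TPS_START_OF_YEAR, TPS_SUPER_NODE))            # 25.0
    print(growth_multiple(TPS_NON_SUPER_NODE, TPS_SUPER_NODE))            # 4.0
    print(implied_concurrent_streams(TPS_SUPER_NODE, LATENCY_MS))         # 120.0
```

The first result matches the "over 300 times" claim, the move from 1,920 to 2,400 TPS works out to a 25% gain, and the step from non-super nodes to super nodes is a fourfold improvement. The concurrency estimate of about 120 requests per chip holds only if the latency assumption above is correct.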