Large Model Training and Inference
Huawei Cloud: CloudMatrix384 Breaks Through Large-Model Training and Inference Bottlenecks, Accelerating Industries' Leap to Intelligence
Sou Hu Cai Jing· 2025-06-24 11:58
Core Insights
- The Huawei Developer Conference 2025 featured a summit on the "CloudMatrix384 Ascend AI Cloud Service," highlighting its role in accelerating AI innovation across industries by overcoming compute, communication, and storage bottlenecks [1][8].

Group 1: AI Infrastructure Standards
- The rapid evolution of large AI models strains compute, communication, and storage capacity, challenges referred to as the "computational wall," "communication wall," and "storage wall" [2].
- The CloudMatrix384 Ascend AI Cloud Service is positioned as a new standard for AI infrastructure that addresses these challenges [2][6].

Group 2: Technical Features of CloudMatrix384
- The service combines "hardware reconstruction + software intelligence" to create a high-density, high-speed, and efficient AI-native infrastructure [6].
- High density: 384 Ascend NPUs and 192 Kunpeng CPUs are connected over the MatrixLink high-speed network into a "super AI server," with support for scaling up to 160,000 nodes [6].
- High-speed communication: the MatrixLink architecture delivers 2.8 Tb/s of bandwidth and reduces communication latency to the nanosecond range [6].
- Efficiency: intelligent scheduling raises the effective utilization of compute resources by over 50% [7] (see the scheduling sketch after this summary).

Group 3: Industry Applications and Collaborations
- The CloudMatrix384 service has been validated across industries; SiliconFlow, for example, reports significant performance gains in AI model training and inference [12][15].
- Other adopters, including Sina and iFlytek, report improved efficiency and performance in their AI applications on CloudMatrix384 [22].
- The service is expected to integrate deeply into e-commerce, social media, entertainment, finance, and automotive, lowering the barriers to AI innovation [22].

Group 4: Future Outlook
- The summit showcased technological achievements and fostered collaboration among industry players, marking the entry of AI infrastructure into the "super node era" [22].
- Huawei Cloud aims to partner with clients and stakeholders to drive industry-wide intelligent transformation [22].
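The article does not describe how CloudMatrix384's scheduler works internally. As a minimal illustration of why placement-aware scheduling raises effective utilization, the hypothetical sketch below packs training jobs onto NPU nodes with a best-fit heuristic instead of first-fit; all names, node sizes, and workloads are assumptions for illustration only.

```python
# Hypothetical illustration only: the article does not disclose CloudMatrix384's
# scheduler. This sketch shows how capacity-aware packing can raise effective
# utilization versus naive placement of training jobs on NPU nodes.
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    total_npus: int = 8   # assumed NPUs per node; not from the article
    free_npus: int = 8

def best_fit(nodes, demand):
    """Place a job on the node whose free capacity matches the demand most
    tightly, which reduces stranded NPUs compared with first-fit."""
    candidates = [n for n in nodes if n.free_npus >= demand]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda n: n.free_npus - demand)
    chosen.free_npus -= demand
    return chosen.node_id

if __name__ == "__main__":
    cluster = [Node(i) for i in range(4)]
    jobs = [4, 8, 2, 6, 2, 8, 2]  # NPUs requested per job (made-up workload)
    for demand in jobs:
        print(f"job needing {demand} NPUs -> node {best_fit(cluster, demand)}")
    used = sum(n.total_npus - n.free_npus for n in cluster)
    total = sum(n.total_npus for n in cluster)
    print(f"effective utilization: {used / total:.0%}")
```

On this toy workload, best-fit leaves no stranded capacity; the production system presumably layers topology awareness and preemption on top of placement heuristics like this.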
Huawei's "Digital Wind Tunnel" Rehearses 10,000-Card Cluster Plans in Hours; Ascend Helps Large Models Run "Fast and Stable"
雷峰网· 2025-06-11 11:00
Core Viewpoint
- The article discusses the launch of the Ascend modeling and simulation platform, which optimizes the interplay among workload, optimization strategy, and system architecture to improve infrastructure performance [1].

Group 1: Challenges in AI Model Training
- Over 60% of computing power is wasted on hardware resource mismatches and system coupling, exposing the limits of traditional optimization methods [2].
- Training a large model is likened to "slamming the gas pedal": MoE models require precise balancing of computation and memory to avoid efficiency collapse [4].
- Dynamic real-time inference systems struggle to meet high-throughput and low-latency requirements simultaneously across varying task types [4].

Group 2: Solutions and Innovations
- The "digital wind tunnel" pre-simulates complex AI workloads in a virtual environment, identifying bottlenecks and optimization strategies before real-world deployment [6] (see the sketch after this summary).
- The Sim2Train framework improves the efficiency of large-scale training clusters through automatic deployment-space optimization and dynamic performance awareness, achieving a 41% improvement in resource utilization [7].
- The Sim2Infer framework targets real-time optimization of inference systems, delivering over 30% performance improvement via adaptive mixed-precision inference and global load balancing [8].

Group 3: High Availability and Reliability
- The Sim2Availability framework keeps the Ascend computing system highly available, achieving 98% uptime and rapid failure recovery through advanced optimization techniques [11] (a worked availability example follows the sketch below).
- Comprehensive monitoring tracks hardware state and optimizes software fault management, improving overall system reliability [13].

Group 4: Future Outlook
- As new applications evolve, demand for novel system architectures will grow, requiring continuous advances in modeling and simulation methods to support the development of computing infrastructure [16].
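The article describes the "digital wind tunnel" only at a high level. A common way to realize the idea is to score candidate parallelism plans with an analytic cost model before committing real cluster time; the sketch below does exactly that. The cost model, cluster size, FLOP counts, and communication penalties are all invented for illustration and are not Huawei's published model.

```python
# Hypothetical sketch of the "digital wind tunnel" idea: rank candidate
# parallelism plans with a toy analytic cost model instead of live runs.
# Every constant below is an assumption, not a figure from the article.
from itertools import product

NPUS = 4096            # assumed cluster size (10,000-card scale is analogous)
MODEL_FLOPS = 3.0e15   # assumed FLOPs per training step
NPU_FLOPS = 300e12     # assumed peak FLOPs per NPU
COMM_PENALTY = {1: 1.00, 2: 0.97, 4: 0.93, 8: 0.85, 16: 0.72}  # made-up TP efficiency

def predicted_step_time(tp, pp, dp):
    """Toy cost model: compute time scaled by a tensor-parallel communication
    penalty and a pipeline-bubble factor; memory feasibility is ignored."""
    if tp * pp * dp != NPUS:
        return None
    compute = MODEL_FLOPS / (NPUS * NPU_FLOPS * COMM_PENALTY[tp])
    bubble = 1.0 + (pp - 1) / (pp * 8)   # assumes 8 microbatches in flight
    return compute * bubble

if __name__ == "__main__":
    plans = []
    for tp, pp in product([1, 2, 4, 8, 16], [1, 2, 4, 8, 16]):
        dp, rem = divmod(NPUS, tp * pp)
        if rem == 0:
            t = predicted_step_time(tp, pp, dp)
            if t is not None:
                plans.append((t, tp, pp, dp))
    for t, tp, pp, dp in sorted(plans)[:3]:
        print(f"TP={tp:2d} PP={pp:2d} DP={dp:4d} -> predicted step {t*1e3:.1f} ms")
```

Sweeping a configuration space like this in simulation is what lets a 10,000-card deployment plan be rehearsed in hours rather than burned in on real hardware.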
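The 98% uptime figure for Sim2Availability can be put in context with standard reliability arithmetic (this identity is textbook material, not from the article): availability A = MTBF / (MTBF + MTTR). With an assumed mean time between failures of 49 hours and a mean time to recover of 1 hour, A = 49/50 = 98%; halving recovery time to 0.5 hours lifts A to 49/49.5 ≈ 99%. This is why the framework's emphasis on rapid recovery matters as much as failure prevention.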
Through the Lens of DeepSeek Deployment: How Does Huawei Bring a Massive Roster of "Experts" to the MoE Architecture?
AI前线· 2025-05-22 04:30
Core Viewpoint
- Model development has shifted from early algorithm optimization to deep innovation at the system-engineering level, moving from a digital era of bit traffic to a Token economy; daily Token consumption in China has risen from hundreds of billions to tens of trillions [1].

Group 1: Model Optimization
- Huawei has made significant optimizations for DeepSeek, focusing on three main areas to improve compatibility and support for enterprise applications [3].
- For pre-training, Huawei adapted DualPipe pipeline scheduling and introduced the DualPipe-V variant to minimize static memory usage [6] (see the pipeline arithmetic after this summary).
- At the operator level, Huawei improved execution efficiency with the MRN PO fusion operator and optimized low-latency communication [7].

Group 2: System Architecture
- For inference, Huawei developed the "super node" architecture, which interconnects large numbers of NPUs to reduce communication latency and improve training throughput [14].
- The Atlas 900 A3 SuperCluster improves cluster computing efficiency and reliability, raising training efficiency by a factor of 2.7 [15].
- The OmniPlacement algorithm optimizes resource utilization by dynamically adapting expert placement to expert-activation statistics, improving throughput by 10% [19] (a placement sketch follows this summary).

Group 3: Load Balancing and Efficiency
- Huawei uses a large-scale expert-parallel (large EP) strategy to raise inference efficiency, achieving a nearly 20-fold improvement over the past two months [17].
- Dynamic priority adjustment and communication-optimization strategies address the load-balancing challenges of expert parallelism [20].
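The summary names DualPipe and DualPipe-V without detail. As background, the standard pipeline-parallel arithmetic below shows the trade DualPipe-style schedules make between idle "bubbles" and resident parameter memory; the formulas are textbook, and the halving of the bubble is an illustrative approximation, not Huawei's published figure.

```python
# Back-of-envelope pipeline-parallel arithmetic (standard formulas, not from
# the article). A 1F1B schedule idles for (p - 1) / (m + p - 1) of the step;
# bidirectional DualPipe-style scheduling roughly halves the bubble but keeps
# two parameter copies resident, the static-memory cost DualPipe-V is
# described as reducing.
def bubble_fraction_1f1b(stages: int, microbatches: int) -> float:
    """Idle fraction of a one-forward-one-backward pipeline schedule."""
    return (stages - 1) / (microbatches + stages - 1)

if __name__ == "__main__":
    p, m = 8, 32  # assumed pipeline depth and microbatch count
    b = bubble_fraction_1f1b(p, m)
    print(f"1F1B: p={p}, m={m} -> bubble {b:.1%}, 1x parameter copies")
    # Illustrative only: overlapping two pipeline directions trades a second
    # resident parameter copy for roughly half the idle slots.
    print(f"DualPipe-style (illustrative): ~{b/2:.1%} bubble, ~2x parameter copies")
```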
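The article names OmniPlacement but does not publish its algorithm. A minimal greedy sketch of activation-aware expert placement conveys the core idea of Groups 2 and 3: keep the aggregate load of the experts hosted on each device balanced, given measured activation frequencies. The function name, device counts, and activation statistics below are all hypothetical.

```python
# Hypothetical sketch of activation-aware expert placement for MoE inference.
# This greedy heuristic (longest-processing-time first) only illustrates
# load balancing across devices; it is not OmniPlacement's actual algorithm.
import heapq

def place_experts(activation_counts, num_devices):
    """Assign each expert to the currently least-loaded device, visiting
    hot experts first, so per-device load stays balanced."""
    heap = [(0, d, []) for d in range(num_devices)]  # (load, device, experts)
    heapq.heapify(heap)
    order = sorted(range(len(activation_counts)),
                   key=lambda e: activation_counts[e], reverse=True)
    for expert in order:
        load, dev, experts = heapq.heappop(heap)
        experts.append(expert)
        heapq.heappush(heap, (load + activation_counts[expert], dev, experts))
    return sorted(heap, key=lambda t: t[1])

if __name__ == "__main__":
    counts = [900, 850, 400, 380, 120, 100, 90, 60]  # made-up activation stats
    for load, dev, experts in place_experts(counts, num_devices=4):
        print(f"device {dev}: experts {experts} (load {load})")
```

A dynamic system would re-run a placement like this as activation statistics drift, which is consistent with the article's description of adapting placement to expert activation data at runtime.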