Moore Threads Unveils "Huagang" (花港) GPU Architecture: 10,000-Card AI Training and Inference, Taking Aim at Nvidia

Core Insights

- The company unveiled its next-generation GPU architecture "Huagang" at the first MUSA Developer Conference (MDC 2025) in Beijing, showcasing advances in AI training clusters and related technologies [1][2]
- The new architecture supports full-precision computing from FP4 to FP64, with a 50% increase in compute density and a 10x improvement in energy efficiency [1]
- The company plans to launch the "Huashan" chip focused on AI training and inference, and the "Lushan" chip aimed at graphics rendering [1]

Architecture and Performance

- The "Huagang" architecture strengthens training-cluster capabilities through the "Kua'e" 10,000-card intelligent computing cluster, achieving 60% training utilization on dense models and 40% on mixture-of-experts models, with a linear scaling efficiency of 95% (a worked sketch of these metrics appears after this summary) [1]
- In inference, the company collaborated with SiliconFlow to achieve single-card Prefill throughput exceeding 4,000 tokens/s and Decode throughput exceeding 1,000 tokens/s on the DeepSeek R1 (671B) model (see the latency sketch after this summary) [1]

Software Ecosystem

- MUSA 5.0 optimizes the programming model, computing libraries, and compilers; the core computing library muDNN reaches over 98% efficiency on GEMM and FlashAttention, with communication efficiency reaching 97% (see the GEMM-efficiency sketch after this summary) [1]
- The company plans to gradually open-source some core components, including computing acceleration libraries and system management frameworks [1]

Graphics and AI Integration

- The new architecture integrates a hardware ray-tracing acceleration engine and supports the company's self-developed AI generative rendering technology [2]
- The company introduced the MTLambda simulation training platform and the MTT AIBOOK based on the "Yangtze" SoC, targeting frontier fields such as embodied intelligence and AI for Science [2]

Future Infrastructure

- The company announced the MTTC256 super-node architecture for next-generation large-scale intelligent computing centers, emphasizing high-density hardware and energy-efficiency optimization [2]
- The end-to-end technology layout, from chip architecture to cluster infrastructure and edge devices, is intended to support the development of the domestic AI computing ecosystem [2]
- Industry observers view the early release of the architecture as a move to build confidence in the company's software ecosystem and to position it to compete directly with Nvidia [2]
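On the cluster metrics: the 60%/40% "training utilization" figure is commonly reported as model FLOPs utilization (MFU), and the 95% figure as linear scaling efficiency, though the article does not spell out the exact definitions. The minimal Python sketch below assumes those standard definitions; all numeric inputs are hypothetical placeholders, not measured values from the report.

```python
# A minimal sketch (not Moore Threads code) of two common cluster metrics:
# model FLOPs utilization (MFU) and linear scaling efficiency.
# All numbers in the example run are hypothetical placeholders.

def mfu(model_flops_per_step: float, step_time_s: float,
        num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization: useful model FLOPs per second / theoretical peak."""
    achieved = model_flops_per_step / step_time_s
    return achieved / (num_gpus * peak_flops_per_gpu)

def linear_scaling_efficiency(throughput_n_gpus: float,
                              throughput_1_gpu: float, n: int) -> float:
    """Throughput on n GPUs relative to perfect linear scaling of one GPU."""
    return throughput_n_gpus / (n * throughput_1_gpu)

if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(f"MFU: {mfu(3.5e18, 12.0, 1024, 4.0e14):.1%}")
    print(f"Scaling efficiency: "
          f"{linear_scaling_efficiency(9_500_000, 1_000, 10_000):.1%}")
```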
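To put the single-card inference figures in context, the sketch below estimates per-request latency under a simple serial prefill-then-decode model. Only the 4,000 tokens/s prefill and 1,000 tokens/s decode figures come from the article; the prompt and output lengths are assumptions chosen for illustration.

```python
# Back-of-the-envelope latency estimate from prefill/decode throughput.
# Prompt and output lengths below are assumed, not benchmark settings.

def request_latency(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> tuple[float, float]:
    """Return (time_to_first_token_s, total_time_s) under a serial two-phase model."""
    ttft = prompt_tokens / prefill_tps          # prefill phase
    decode_time = output_tokens / decode_tps    # autoregressive decode phase
    return ttft, ttft + decode_time

if __name__ == "__main__":
    ttft, total = request_latency(prompt_tokens=2048, output_tokens=512,
                                  prefill_tps=4000.0, decode_tps=1000.0)
    print(f"TTFT ~ {ttft:.2f} s, total ~ {total:.2f} s")
```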
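Kernel "efficiency" figures like the 98% cited for muDNN GEMM are typically computed as achieved FLOPS divided by the device's theoretical peak over the measured time. The sketch below shows that calculation for a GEMM; the matrix sizes, timing, and peak-TFLOPS value are hypothetical placeholders, and no MUSA or muDNN API is called.

```python
# How a GEMM efficiency percentage is typically derived (illustrative only).

def gemm_efficiency(m: int, n: int, k: int, elapsed_s: float,
                    peak_tflops: float) -> float:
    """Achieved TFLOPS of a C = A @ B GEMM divided by the device's peak TFLOPS."""
    flops = 2.0 * m * n * k                     # multiply-accumulate count
    achieved_tflops = flops / elapsed_s / 1e12
    return achieved_tflops / peak_tflops

if __name__ == "__main__":
    # Hypothetical: an 8192^3 GEMM measured at 5.6 ms on a 200 TFLOPS (peak) device.
    print(f"Efficiency: {gemm_efficiency(8192, 8192, 8192, 5.6e-3, 200.0):.1%}")
```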
