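Domestic Computing Power Enters the “10,000-Card” Era: Moore Threads Unveils a Next-Generation GPU Architecture, Sugon Launches a 10,000-Card Supercluster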
Jing Ji Guan Cha Wang·2025-12-20 06:47

Core Insights
- The article covers advances in China's domestic GPU industry, highlighting the launch of Moore Threads' "Huagang" architecture and Sugon's "scaleX" supercluster system. Together they signal a shift in focus from individual GPU performance to building scalable systems that can handle massive computational workloads [2][6].

Group 1: Moore Threads Developments
- Moore Threads unveiled its latest "Huagang" architecture, which delivers a 50% increase in computing density and a 10-fold improvement in efficiency over the previous generation [3].
- The "Huagang" architecture supports the full range of precisions from FP4 to FP64 and adds support for MTFP6, MTFP4, and mixed low-precision computation [3].
- Planned chips include "Huashan," aimed at AI training and inference, and "Lushan," focused on high-performance graphics rendering. "Lushan" is said to offer a 64-fold increase in AI computing performance and a 50% improvement in ray tracing performance [4].

Group 2: Sugon Developments
- Sugon's "scaleX" supercluster system made its public debut. It consists of 16 scaleX640 supernodes interconnected by the scaleFabric high-speed network and can deploy 10,240 AI accelerator cards [10].
- The scaleX system uses immersion phase-change liquid cooling to address heat dissipation, achieving a 20-fold increase in computing density per rack and a PUE (Power Usage Effectiveness) of 1.04 [11][12].
- The system supports accelerator cards from multiple brands and has been optimized for compatibility with more than 400 mainstream large models, reflecting a strategy of offering one versatile platform for a range of domestic computing resources [14].

Group 3: Industry Challenges and Solutions
- The industry faces challenges in scaling up computing power, particularly heat, power supply, and physical space constraints when thousands of high-power chips are deployed in a data center [8][9].
- Both companies are addressing communication latency in distributed computing. Moore Threads is integrating a new asynchronous programming model and its self-developed MTLink technology to support clusters of more than 100,000 cards, while Sugon's scaleFabric network achieves 400 Gb/s bandwidth and sub-microsecond communication latency [12][13].

Group 4: Software Ecosystem and Compatibility
- As hardware specifications approach international standards, the focus is shifting to optimizing the software stack. Moore Threads announced an upgrade to its MUSA unified architecture and reported over 98% efficiency in its core computing libraries [13].
- Sugon emphasizes compatibility with accelerator cards from various brands, promoting an open architecture strategy that lets multiple chips coexist [14].
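
The scaleX figures quoted above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 10,240 cards are spread evenly across the 16 supernodes, and it uses the standard definition of PUE (total facility power divided by IT equipment power). The function names and the 1,000 kW IT load are hypothetical and do not come from the article.

```python
# Figures quoted in the summary; everything else here is an illustrative assumption.
SUPERNODES = 16
TOTAL_CARDS = 10_240
PUE = 1.04


def cards_per_supernode(total_cards: int, supernodes: int) -> int:
    """Cards per supernode, assuming an even split (an assumption, not a stated spec)."""
    if total_cards % supernodes != 0:
        raise ValueError("cards do not divide evenly across supernodes")
    return total_cards // supernodes


def overhead_fraction(pue: float) -> float:
    """Non-IT power (cooling, power delivery) as a fraction of IT power.

    PUE = total facility power / IT equipment power, so overhead = PUE - 1.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total facility power implied by a given IT load and PUE."""
    return it_power_kw * pue


if __name__ == "__main__":
    print(cards_per_supernode(TOTAL_CARDS, SUPERNODES))  # 640
    print(f"{overhead_fraction(PUE):.0%}")               # 4%
    # Hypothetical 1,000 kW IT load, chosen only to make the ratio concrete.
    print(round(facility_power_kw(1000.0, PUE), 1))      # 1040.0
```

Under these assumptions, each supernode holds 640 cards, which matches the "640" in the scaleX640 name, and a PUE of 1.04 means roughly 4% of additional power goes to cooling and other overhead on top of the IT load.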
