Moore Threads Makes a Major Announcement!
Zhong Guo Ji Jin Bao · 2025-12-20 13:32
Core Insights
- Moore Threads unveiled its next-generation GPU architecture "Huagang" at the first MUSA Developer Conference, showcasing a full-stack technology system centered on its self-developed MUSA unified architecture [2][3]

Group 1: New GPU Architecture
- The "Huagang" architecture achieves significant breakthroughs in computing density, energy efficiency, precision support, interconnect capability, and graphics technology [3]
- Key features include a 50% increase in computing density, substantial energy-efficiency optimization, and support for full-precision computing from FP4 to FP64, along with new MTFP6/MTFP4 formats and mixed low-precision support [3]
- It integrates a new asynchronous programming model and the self-developed MTLink high-speed interconnect, supporting the scale-out of intelligent computing clusters beyond 100,000 cards [3]

Group 2: Future Chip Releases
- Moore Threads announced two upcoming chips based on the "Huagang" architecture: "Huashan" targets integrated AI training and inference for large-scale intelligent computing, serving as the foundation of the next-generation "AI factory" [4]
- The "Lushan" chip specializes in high-performance graphics rendering, with a 64-fold increase in AI computing performance, a 16-fold increase in geometry processing performance, and a 50-fold increase in ray-tracing performance [4]

Group 3: Launch of Intelligent Computing Cluster
- The company officially launched the "Kua'e" intelligent computing cluster, which offers full-precision and general-purpose computing and achieves efficient, stable AI training and inference at a 10,000-card scale [5]
- Core breakthroughs include floating-point capability of 10 ExaFLOPS, training utilization of 60% on dense models and 40% on MoE models, and linear scaling efficiency of 95% [5]

Group 4: Competitive Landscape
- Moore Threads was not the only company to showcase products at the event: Inspur unveiled the "Shuguang scaleX" super-cluster system, marking the first public appearance of a domestic 10,000-card computing cluster [6]
- The industry is seeing significant innovation in super-node architecture, high-speed interconnect networks, and storage performance optimization, with some technologies surpassing milestones on NVIDIA's 2027 roadmap [6]
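The article's "full precision from FP4 to FP64" claim spans formats with enormously different dynamic ranges. As a rough illustration (standard IEEE 754-style layouts only; the article gives no details of Moore Threads' proprietary MTFP6/MTFP4 formats), the largest finite value of a binary float with a given exponent/mantissa split can be computed as:

```python
def max_finite(exp_bits: int, man_bits: int) -> float:
    """Largest finite value of an IEEE 754-style binary format
    (bias = 2**(exp_bits-1) - 1; top exponent code reserved for Inf/NaN)."""
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2 ** -man_bits) * 2.0 ** bias

# FP16 (1 sign / 5 exponent / 10 mantissa bits) tops out at 65504,
# while FP64 (11/52) reaches ~1.8e308 -- a gap that mixed-precision
# training has to bridge with loss scaling and wider accumulation.
print(max_finite(5, 10))   # FP16 -> 65504.0
print(max_finite(8, 23))   # FP32 -> ~3.4e38
print(max_finite(11, 52))  # FP64 -> ~1.8e308
```

Note that the narrowest formats (e.g. FP4 E2M1 or FP8 E4M3) typically drop the Inf/NaN encodings entirely, so this formula does not apply to them directly.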
Moore Threads Makes a Major Announcement!
Zhong Guo Ji Jin Bao · 2025-12-20 08:50
Core Insights
- Moore Threads unveiled its next-generation GPU architecture "Huagang" at the MUSA Developer Conference, showcasing a full-stack technology system centered on its self-developed MUSA unified architecture [1][2].

Group 1: New GPU Architecture
- The "Huagang" architecture features significant improvements in computing performance, with a 50% increase in computing density and enhanced energy efficiency, supporting full precision from FP4 to FP64 [2].
- It integrates a new asynchronous programming model and MTLink high-speed interconnect technology, enabling scalability to intelligent computing clusters of over 100,000 cards [2].
- The architecture includes an AI generative rendering framework and supports DirectX 12 Ultimate, enabling close synergy between graphics rendering and intelligent computing [2].

Group 2: Upcoming Chip Technologies
- Moore Threads announced two upcoming chips based on the "Huagang" architecture: "Huashan," which focuses on AI training and inference for large-scale intelligent computing, and "Lushan," which specializes in high-performance graphics rendering [3].
- The "Lushan" chip is expected to improve AI computing performance 64-fold, geometry processing performance 16-fold, and ray-tracing performance 50-fold, along with gains in texture fill rate and memory capacity [3].

Group 3: Intelligent Computing Cluster
- The company launched the "Kua'e" intelligent computing cluster, capable of full-precision and general-purpose computing, achieving floating-point capability of 10 ExaFLOPS [4].
- Training efficiency metrics include 60% utilization on dense large models and 40% on MoE large models, with effective training time exceeding 90% and linear scaling efficiency reaching 95% [4].
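To make the cluster metrics above concrete, here is a back-of-envelope sketch using the figures reported in the article; the formulas are the standard textbook definitions of utilization and linear scaling efficiency, not Moore Threads' published measurement methodology:

```python
# Figures reported for the "Kua'e" cluster (per the article).
peak_eflops = 10.0      # aggregate peak floating-point capability, ExaFLOPS
dense_util = 0.60       # training utilization on dense large models
moe_util = 0.40         # training utilization on MoE large models
scaling_eff = 0.95      # linear scaling efficiency
effective_time = 0.90   # fraction of wall-clock time spent actually training

# Utilization x peak estimates the FLOPS actually delivered to the model.
dense_delivered = peak_eflops * dense_util   # 6.0 EFLOPS
moe_delivered = peak_eflops * moe_util       # 4.0 EFLOPS

# Folding in >90% effective training time gives a long-run average.
dense_long_run = dense_delivered * effective_time  # ~5.4 EFLOPS

# At 95% linear scaling efficiency, doubling the card count yields
# roughly a 1.9x (not 2.0x) end-to-end speedup.
speedup_on_doubling = 2 * scaling_eff

print(f"dense: {dense_delivered:.1f} EFLOPS, MoE: {moe_delivered:.1f} EFLOPS")
print(f"long-run dense average: ~{dense_long_run:.1f} EFLOPS")
print(f"2x cards -> ~{speedup_on_doubling:.2f}x speedup")
```

The point of the sketch is that headline peak FLOPS overstate usable throughput: utilization, downtime, and scaling losses each take a multiplicative cut.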
Group 4: Competitive Landscape
- Moore Threads was not the only company showcasing products at the event: Inspur presented its "scaleX" super-cluster system, marking the first public appearance of a domestic 10,000-card-class computing cluster [5].
- Moore Threads is also proactively positioning itself for future computing scenarios, including the launch of the MT Lambda intelligent simulation training platform [5].