1000 TFLOPS on a single card: Moore Threads' flagship compute card revealed for the first time, with performance approaching Blackwell
36Kr · 2026-02-12 12:22
Core Insights
- The release of GLM-5 by Zhipu AI has sparked significant industry discussion, with its coding capabilities ranked top among global open-source models and fourth overall [1]
- The MTT S5000 from Moore Threads has achieved Day-0 compatibility with GLM-5, with hardware specifications that rival NVIDIA's H100 [1][6]

Group 1: Performance and Specifications
- The MTT S5000 delivers single-card performance of 1000 TFLOPS, with 80GB of memory and 1.6TB/s of memory bandwidth, matching NVIDIA's H100 on key specifications [6][7]
- The hardware-level FP8 Tensor Core in the MTT S5000 significantly boosts its performance, reportedly surpassing the H100 at this precision [7]
- In typical end-to-end inference and training tasks, the MTT S5000 demonstrated roughly 2.5 times the performance of its competitor, the H20 [9]

Group 2: Ecosystem and Software Integration
- Day-0 compatibility is attributed to Moore Threads' agile MUSA software stack, whose native operator unit tests exceed 80% coverage, significantly reducing porting costs [3]
- The MUSA software platform integrates seamlessly with major frameworks such as PyTorch and Megatron-LM, enabling zero-cost code migration for developers [11]

Group 3: Scalability and Efficiency
- The "Kua'e" cluster built on the MTT S5000 has reached a floating-point capability of 10 ExaFLOPS, a significant advance in large-scale computing [9]
- The system maintains over 90% linear scaling efficiency from 64 to 1024 cards, meaning training speed rises nearly in step with added compute [10]

Group 4: Real-World Applications
- In training, the S5000 showed a training-loss difference of only 0.62% versus NVIDIA's H100, demonstrating accuracy and stability in reproducing top-tier model training [11]
- For inference, the S5000 achieved prefill throughput above 4000 tokens/s and decode throughput above 1000 tokens/s, significantly reducing memory usage and keeping response latency low under high concurrency [12]
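The "over 90% linear scaling from 64 to 1024 cards" claim can be made concrete with the standard efficiency ratio. The sketch below uses hypothetical throughput numbers chosen only to illustrate the formula, not published benchmark data:

```python
def scaling_efficiency(base_cards, base_tput, scaled_cards, scaled_tput):
    """Linear scaling efficiency: achieved speedup divided by ideal speedup.

    1.0 means perfectly linear scaling; the article reports >0.90 for the
    Kua'e cluster when growing from 64 to 1024 cards.
    """
    ideal_speedup = scaled_cards / base_cards
    achieved_speedup = scaled_tput / base_tput
    return achieved_speedup / ideal_speedup

# Hypothetical figures: 64 cards at 1.0 (normalized throughput),
# 1024 cards at 14.7x -- ideal would be 16x.
eff = scaling_efficiency(64, 1.0, 1024, 14.7)
print(f"{eff:.1%}")  # -> 91.9%
```

At this efficiency, adding 16x the hardware yields roughly 14.7x the throughput, which is what "nearly synchronized training speed increases" means in practice.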
Only speed is unbeatable! S5000 specifications revealed for the first time; a domestic GPU ecosystem of adaptation-at-launch is taking shape!
Guang Zhou Ri Bao · 2026-02-12 02:12
Core Insights
- The article highlights the launch of the new-generation large model GLM-5 by Zhipu AI, which has been successfully adapted and verified on the MTT S5000 GPU from Moore Threads, pointing to an emerging standard for domestic GPU ecosystem development [1][3]

Group 1: Product Features and Performance
- The MTT S5000 GPU, designed for large-model training and inference, delivers up to 1000 TFLOPS of AI compute, 80GB of memory, 1.6TB/s of memory bandwidth, and 784GB/s of inter-card bandwidth [2]
- In a recent validation of a model with hundreds of billions of parameters, the MTT S5000 showed high consistency with an H100 cluster, keeping key model error within a few thousandths and even slightly surpassing it in overall training effectiveness [2]
- In typical end-to-end inference and training tasks, the MTT S5000's performance is reported to be roughly 2.5 times that of its competitor, the H20, attributed to its high compute and strong cost-performance advantage [2]

Group 2: Software and Adaptation
- The agility of the MUSA software stack is crucial for Day-0 adaptation: with over 80% coverage of native operator unit tests, most general operators can be reused, significantly reducing porting costs [3]
- Combined with GLM-5, the MTT S5000 excels in core scenarios such as function completion and vulnerability detection, showing enhanced planning and debugging capabilities that suit long-horizon development tasks [3]
- The MUSA software stack's seamless compatibility with mainstream software and agile response have established a high level of maturity and stability for domestic full-function GPUs, ensuring developers can access the latest model capabilities promptly [3]
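The "zero-cost code migration" claim rests on a familiar pattern: model code targets an abstract device name, so the same script runs on whichever backend is present. A minimal sketch of that selection logic, assuming a torch_musa-style plugin that registers a "musa" backend (the availability map and backend names here are illustrative):

```python
def pick_device(available, preferred=("musa", "cuda")):
    """Return the first available backend from `preferred`, else 'cpu'.

    Model code written against an abstract device name runs unchanged
    whether a CUDA or a MUSA backend is installed -- the pattern behind
    the "zero-cost code migration" claim.
    """
    for name in preferred:
        if available.get(name, False):
            return name
    return "cpu"

# In a real PyTorch script this map would come from the framework, e.g.
# {"musa": torch.musa.is_available(), "cuda": torch.cuda.is_available()}
# (assuming the torch_musa plugin registers a "musa" backend on import).
device = pick_device({"musa": True, "cuda": False})
print(device)  # -> musa
```

The rest of the training or inference code then just does `tensor.to(device)`, which is why high operator-test coverage, rather than application rewrites, is the gating factor for porting.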
Moore Threads MTT S5000 is the first to complete adaptation for GLM-5
Xin Lang Cai Jing · 2026-02-12 00:53
Core Viewpoint
- The release of the new-generation large model GLM-5 by Zhipu AI marks a significant advance in AI capabilities, showcasing the effective integration of the MTT S5000 GPU with the SGLang inference framework for high-performance model inference [1]

Group 1
- Zhipu AI officially launched the GLM-5 model on February 11, demonstrating its AI capabilities [1]
- The MTT S5000 GPU completed full-pipeline adaptation and verification on Day-0, indicating rapid deployment capability [1]
- The MUSA architecture provides extensive operator coverage and strong ecosystem compatibility, supporting the complete model-inference pipeline [1]

Group 2
- Through its native FP8 acceleration, the MTT S5000 GPU significantly reduces memory usage while preserving model accuracy [1]
- The quick adaptation of the MTT S5000 validates the maturity of the MUSA software stack and highlights the ability of domestic full-function GPUs to support the latest large models [1]
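The memory saving from native FP8 is easy to quantify: FP8 stores one byte per weight versus two for FP16/BF16, roughly halving the weight footprint. A rough sketch (the parameter count is illustrative; GLM-5's actual size is not given in these articles):

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate weight-only memory footprint in GiB.

    Ignores activations, KV cache, and optimizer state; those add
    further memory on top of the raw weights.
    """
    return n_params * bytes_per_param / 2**30

# Illustrative 100B-parameter model (hypothetical size):
n = 100e9
fp16 = weight_memory_gb(n, 2)  # FP16/BF16: 2 bytes per parameter
fp8 = weight_memory_gb(n, 1)   # FP8: 1 byte per parameter
print(f"FP16: {fp16:.0f} GiB, FP8: {fp8:.0f} GiB")  # -> FP16: 186 GiB, FP8: 93 GiB
```

Halving the weight footprint is what lets a model that would otherwise spill across cards fit within the S5000's 80GB, which is the practical meaning of "reduces memory usage while ensuring model accuracy."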
Domestic GPU "four little dragons" rush to IPO
Hexun · 2025-07-04 10:15
Core Viewpoint
- The article discusses the rise of domestic GPU companies in China amid growing demand for AI technologies and the challenges of competing with established players like NVIDIA and AMD [3][6][11]

Group 1: Market Dynamics
- The recent surge in IPO applications from domestic GPU companies, including Moore Threads and Muxi, is attributed to a more favorable IPO policy and accelerated review processes on the Sci-Tech Innovation Board [4][5]
- Tightening U.S. chip export controls have eroded NVIDIA's market share in China, opening a window of opportunity for domestic GPU firms to pursue IPOs [6][12]
- IPO applications in the first half of the year reached 177, far surpassing the previous year's total, with June alone accounting for over 80% of them [5]

Group 2: Company Profiles
- Moore Threads aims to build a "full-function GPU" targeting both data-center and consumer gaming markets, similar to NVIDIA's strategy, and has launched several GPU chips based on its self-developed MUSA architecture [10][11]
- Muxi focuses more on the data-center market, particularly AI training and inference, with a product line that includes the Xisi N series, Xiyun C series, and Xicai G series [11]
- Both companies have raised significant funding: Moore Threads has secured over 4.5 billion yuan, and Muxi reached a post-investment valuation of 21.07 billion yuan [11][12]

Group 3: Financial Performance
- Moore Threads reported revenue of 46 million yuan in 2022, projected to grow to 438 million yuan by 2024, a compound annual growth rate (CAGR) of over 200% [13]
- Muxi's revenue is expected to rise from 426,400 yuan in 2022 to 743 million yuan in 2024, a staggering CAGR of 4,074% [13]
- Despite revenue growth, both companies remain deeply loss-making: in 2022, Moore Threads lost 1.84 billion yuan and Muxi lost 777 million yuan [14]

Group 4: Future Prospects
- Domestic GPU companies face challenges in building ecosystems that can rival NVIDIA's established software-hardware integration, particularly CUDA [17]
- Both Moore Threads and Muxi are actively developing their software ecosystems to lower barriers for developers and strengthen their competitive positions [17][18]
- The upcoming IPOs are seen as crucial for securing the capital these companies need to keep growing in a highly competitive market [15][16]
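The growth rates cited above follow from the standard CAGR formula, and the revenue figures in the text reproduce them exactly:

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start)^(1 / years) - 1."""
    return (end / start) ** (1 / years) - 1

# Moore Threads: 46M yuan (2022) -> 438M yuan (2024), two years of growth
print(f"{cagr(46e6, 438e6, 2):.0%}")     # -> 209%
# Muxi: 426,400 yuan (2022) -> 743M yuan (2024), two years of growth
print(f"{cagr(426_400, 743e6, 2):.0%}")  # -> 4074%
```

Muxi's eye-catching 4,074% figure is thus an artifact of a near-zero 2022 base rather than a like-for-like comparison with Moore Threads' "over 200%."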