1000 TFLOPS per card: Moore Threads' flagship compute card revealed, performance approaching Blackwell
36Ke · 2026-02-12 12:22
Core Insights
- The release of GLM-5 by Zhipu AI has sparked significant industry discussion, with its coding capabilities ranked top among global open-source models and fourth overall [1]
- The MTT S5000 from Moore Threads has achieved Day-0 compatibility with GLM-5, with hardware specifications that rival NVIDIA's H100 [1][6]

Group 1: Performance and Specifications
- The MTT S5000 delivers 1000 TFLOPS of single-card performance, with 80GB of memory and 1.6TB/s of memory bandwidth, matching NVIDIA's H100 on key specifications [6][7]
- The hardware-level FP8 Tensor Core in the MTT S5000 significantly enhances its performance, reportedly surpassing the H100 at that precision [7]
- In practical tests, the MTT S5000 demonstrated roughly 2.5 times the performance of the competing H20 on typical end-to-end inference and training tasks [9]

Group 2: Ecosystem and Software Integration
- The Day-0 compatibility is attributed to Moore Threads' agile MUSA software stack, whose native operator unit tests exceed 80% coverage, significantly reducing porting costs [3]
- The MUSA software platform integrates with major frameworks such as PyTorch and Megatron-LM, enabling zero-cost code migration for developers [11]

Group 3: Scalability and Efficiency
- The "Kua'e" cluster built on the MTT S5000 has reached a floating-point capability of 10 ExaFLOPS, a significant advance in large-scale computing [9]
- The system maintains over 90% linear scaling efficiency from 64 to 1024 cards, meaning training speed increases nearly in step with added compute [10]

Group 4: Real-World Applications
- In training scenarios, the S5000 showed a training-loss difference of only 0.62% versus NVIDIA's H100, demonstrating accuracy and stability in reproducing top-tier model training [11]
- For inference, the S5000 achieved prefill throughput above 4000 tokens/s and decode throughput above 1000 tokens/s, significantly reducing memory usage while keeping response latency low under high concurrency [12]
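The 90% linear-scaling figure above can be made concrete with a small calculation. The throughput values below are hypothetical, chosen only to illustrate what 90% efficiency means when scaling from 64 to 1024 cards; they are not measured MTT S5000 numbers:

```python
def scaling_efficiency(base_cards, base_tput, scaled_cards, scaled_tput):
    """Linear-scaling efficiency: achieved speedup divided by ideal speedup."""
    ideal_speedup = scaled_cards / base_cards      # e.g. 1024 / 64 = 16x
    actual_speedup = scaled_tput / base_tput
    return actual_speedup / ideal_speedup

# Hypothetical normalized throughput: if a 64-card run sustains 1.0 and a
# 1024-card run sustains 14.4x, efficiency is 14.4 / 16 = 90%.
eff = scaling_efficiency(64, 1.0, 1024, 14.4)
print(f"{eff:.0%}")  # 90%
```

In other words, at 90% efficiency a 16x increase in card count yields roughly a 14.4x increase in effective throughput rather than the ideal 16x.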
Speed wins: S5000 specs revealed for the first time, and a "compatible-at-launch" domestic GPU ecosystem is taking shape!
Guang Zhou Ri Bao · 2026-02-12 02:12
The agility of the MUSA software stack is the key to achieving Day-0 adaptation. Unit-test coverage of native TileLang operators on the MUSA architecture already exceeds 80%, so the vast majority of general-purpose operators can be reused directly, significantly lowering porting costs and allowing rapid follow-up on new model architectures and features.

On February 11, Zhipu officially released its new-generation large model GLM-5. Using the SGLang inference framework, Moore Threads completed full-pipeline adaptation and verification on Day-0 on its flagship full-function training-and-inference GPU, the MTT S5000. This kind of "compatible at launch" turnaround is likely to become the norm as the domestic GPU ecosystem takes shape.

Leveraging the MUSA architecture's broad operator coverage and strong ecosystem compatibility, Moore Threads opened up the full model-inference pipeline and fully exploited the MTT S5000's native FP8 acceleration, significantly reducing memory usage while preserving model accuracy, to deliver high-performance GLM-5 inference. This rapid adaptation not only confirms the maturity of the MUSA software stack but also demonstrates that domestic full-function GPUs can support the latest large models immediately and efficiently.

Moore Threads expects that the pairing of GLM-5 and the MTT S5000, two domestic flagships, will give developers a programming experience on par with top international models. Whether in function completion, vulnerability detection, or debugging scenarios, the combination performs excellently, handling complex long-horizon tasks with markedly strengthened logical planning. ...
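The memory savings attributed to native FP8 above can be sketched with back-of-envelope arithmetic. The parameter count below is hypothetical (the article does not give GLM-5's size), and the estimate covers weights only, ignoring KV cache and activations; the point is simply that halving bytes-per-weight halves weight memory:

```python
def weight_memory_gb(n_params_billions, bytes_per_param):
    """Approximate GPU memory (GB) for model weights alone."""
    return n_params_billions * 1e9 * bytes_per_param / 1e9

# Hypothetical 100B-parameter model, illustrative only:
fp16_gb = weight_memory_gb(100, 2)  # 2 bytes per FP16 weight -> 200.0 GB
fp8_gb = weight_memory_gb(100, 1)   # 1 byte per FP8 weight   -> 100.0 GB
print(fp16_gb, fp8_gb)
```

Under this rough model, FP8 weights would fit on far fewer 80GB cards than FP16 weights, which is the practical payoff of hardware FP8 support.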
Moore Threads' MTT S5000 is first to complete adaptation for GLM-5
Xin Lang Cai Jing · 2026-02-12 00:53
Core Viewpoint
- The release of the new-generation large model GLM-5 by Zhipu marks a significant advancement in AI capabilities, showcasing the effective integration of the MTT S5000 GPU with the SGLang inference framework for high-performance model inference [1]

Group 1
- Zhipu officially launched the GLM-5 model on February 11 [1]
- The MTT S5000 GPU achieved full-process adaptation and verification on Day-0, indicating rapid deployment capability [1]
- The MUSA architecture provides extensive operator coverage and strong ecosystem compatibility, enabling the complete model-inference pipeline [1]

Group 2
- The MTT S5000 GPU significantly reduces memory usage while preserving model accuracy through its native FP8 acceleration [1]
- The quick adaptation validates the maturity of the MUSA software stack and highlights the ability of domestic full-function GPUs to support the latest large models [1]
Domestic GPU "Four Little Dragons" flock to IPO
Hexun · 2025-07-04 10:15
Core Viewpoint
- The article discusses the rise of domestic GPU companies in China amid growing demand for AI technologies and the challenges of competing with established players like NVIDIA and AMD [3][6][11]

Group 1: Market Dynamics
- The recent surge in IPO applications from domestic GPU companies, including Moore Threads and Muxi, is attributed to a more favorable IPO policy and accelerated review processes on the Sci-Tech Innovation Board [4][5]
- The tightening of U.S. chip export controls has eroded NVIDIA's market share in China, opening a window of opportunity for domestic GPU firms to pursue IPOs [6][12]
- IPO applications in the first half of the year reached 177, far surpassing the previous year's total, with June alone accounting for over 80% of them [5]

Group 2: Company Profiles
- Moore Threads aims to build a "full-function GPU" targeting both data-center and consumer gaming markets, mirroring NVIDIA's strategy, and has launched several GPU chips based on its self-developed MUSA architecture [10][11]
- Muxi focuses more on the data-center market, particularly AI training and inference, with a product line that includes the Xisi N series, Xiyun C series, and Xicai G series [11]
- Both companies have raised significant funding: Moore Threads has secured over 4.5 billion yuan, and Muxi reached a post-investment valuation of 21.07 billion yuan [11][12]

Group 3: Financial Performance
- Moore Threads reported revenue of 46 million yuan in 2022, projected to grow to 438 million yuan by 2024, a compound annual growth rate (CAGR) above 200% [13]
- Muxi's revenue is expected to rise from 426,400 yuan in 2022 to 743 million yuan in 2024, a staggering CAGR of 4,074% [13]
- Despite revenue growth, both companies continue to post significant losses: in 2022, Moore Threads lost 1.84 billion yuan and Muxi lost 777 million yuan [14]

Group 4: Future Prospects
- Domestic GPU companies face the challenge of building ecosystems that can compete with NVIDIA's established software-hardware integration, anchored by CUDA [17]
- Both Moore Threads and Muxi are actively developing their software ecosystems to lower barriers for developers and strengthen their competitive positions [17][18]
- The upcoming IPOs are seen as crucial for securing the capital these companies need to keep growing in a highly competitive market [15][16]
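The CAGR figures in the financial summary above can be checked with the standard compound-growth formula, using the 2022 and 2024 revenue figures from the article (a two-year span):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end/start)^(1/years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# Revenue figures (yuan) as reported in the article, 2022 -> 2024:
moore_threads = cagr(46e6, 438e6, 2)    # ~209%, consistent with "above 200%"
muxi = cagr(426_400, 743e6, 2)          # ~4,074%, matching the reported CAGR
print(f"{moore_threads:.0%}, {muxi:.0%}")
```

Both reported growth rates are consistent with the underlying revenue figures, which lends the summary's numbers some internal coherence.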