Moore Threads Technology: New-Generation GPU Architecture "Huagang" Released, Supporting Scale-Out to 100,000-Card Intelligent Computing Clusters

Core Insights
- The first MUSA Developer Conference showcased the launch of the new GPU architecture "Huagang" by Moore Threads, along with AI training and inference chip "Huashan" and high-performance graphics rendering chip "Lushan" [1][4][5][7]
- Moore Threads introduced the "Kua'e" supercomputing cluster, featuring the self-developed "Yangtze" intelligent SoC chip, aimed at enhancing AI computing capabilities [1][9]

Group 1: New GPU Architecture and Chips
- The "Huagang" GPU architecture features a 50% increase in computing density and supports full precision end-to-end calculations from FP4 to FP64, with new asynchronous programming models and MTLink high-speed interconnect technology for scaling over 100,000 cards [4][14]
- The "Huashan" chip focuses on AI training and inference, integrating new asynchronous programming and full precision tensor computing units, supporting large-scale intelligent computing clusters [5]
- The "Lushan" chip specializes in high-performance graphics rendering, achieving a 64x increase in AI computing performance, 16x in geometric processing, and 50x in ray tracing performance, catering to AAA games and high-end graphics creation [7]

Group 2: Collaborations and Ecosystem Development
- Several companies listed on the Sci-Tech Innovation Board, including Dahong Technology and Zhongwang Software, are collaborating with Moore Threads to leverage its GPU for high-performance needs such as ultra-high-definition live streaming and offline video enhancement [3]
- The MUSA software architecture has been upgraded to version 5.0, enhancing compatibility with programming languages like TileLang and Triton, and achieving over 98% efficiency in core computing libraries [12]

Group 3: Industry Challenges and Future Directions
- The need for a unified or highly compatible interface standard for domestic GPU chips is emphasized to avoid fragmentation and inefficiencies in the software ecosystem [13]
- The transition from "usable" to "willing to use" domestic GPU platforms hinges on improving developer experience and reducing migration costs [12]
- The engineering challenges of building large-scale systems without proprietary interconnects are highlighted, with a focus on achieving reliable low-latency communication and operational efficiency [14]