Super Node Architecture
Super Node Architecture Innovation: Open Source and Hardware Openness to Build an All-Scenario Computing Foundation
China Energy News · 2025-09-18 09:10
Core Viewpoint
- Huawei has introduced an innovative super node architecture aimed at building a robust all-scenario computing foundation, emphasizing open-source software and hardware openness to foster industry collaboration and innovation [1][10]

Group 1: Super Node Architecture
- The super node architecture, based on the Lingqu interconnection protocol, allows multiple physical machines to be deeply interconnected so that they function as a single logical unit for learning, reasoning, and efficient computing [1][2]
- This architecture addresses the shortcomings of traditional server stacking, in which computing efficiency drops and training interruptions become more frequent as cluster size grows [1][2]

Group 2: Product Launches
- Huawei has launched several new products based on the super node architecture, including the Atlas 950 SuperPoD, the Atlas 850 and Atlas 860 AI servers, the Atlas 350 AI accelerator cards, and the TaiShan 950 SuperPoD [4][5]
- The Atlas 950 SuperPoD is designed for large-scale AI computing tasks, featuring innovations such as zero-cable electrical interconnection and enhanced liquid-cooling reliability [4]
- The Atlas 850 is the first enterprise-grade air-cooled AI super node server, capable of forming a super node cluster with up to 128 units and 1,024 cards [4]

Group 3: Performance Enhancements
- The Atlas 350 accelerator card, built on the Ascend 950PR chip, offers a 2x increase in vector computing power and a 2.5x performance boost in recommendation-inference scenarios [5]
- The TaiShan 950 SuperPoD delivers ultra-low latency of 370 nanoseconds and a bandwidth of 2.8T, significantly improving performance in database and virtual machine migration scenarios [5]

Group 4: Open Collaboration
- Huawei is committed to open collaboration, sharing super node technology with the industry so that partners can develop products based on the Lingqu protocol and the super node reference architecture [6]
- The company has opened its super node hardware, including NPU modules and AI accelerator cards, to enable incremental development by customers and partners [6]

Group 5: Software Innovation
- Huawei is also pursuing software openness, with plans to open-source the Lingqu operating system components and support various open-source communities, accelerating developer innovation [9]
- The Ascend CANN and Mind series components will be open-sourced, with priority support for popular frameworks such as PyTorch, enhancing flexibility for developers [9]
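The "multiple physical machines functioning as a single logical unit" idea above can be illustrated with a toy sketch: a pool object shards work across several worker "nodes" in parallel and gathers the results, so the caller sees one device. This is purely illustrative — the class and method names are invented here, and it does not represent the Lingqu protocol or any Huawei API.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class LogicalSuperNode:
    """Toy illustration: presents several worker 'nodes' as one logical
    compute unit by sharding work across them and gathering the results.
    (Invented for illustration; not the Lingqu protocol or a real API.)"""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        self.pool = ThreadPoolExecutor(max_workers=n_nodes)

    def matmul(self, a, b):
        # Shard rows of `a` across the nodes, compute the partial products
        # in parallel, then concatenate -- the caller sees a single device.
        shards = np.array_split(a, self.n_nodes)
        parts = self.pool.map(lambda shard: shard @ b, shards)
        return np.vstack(list(parts))


node = LogicalSuperNode(n_nodes=4)
a = np.random.default_rng(1).normal(size=(128, 64))
b = np.random.default_rng(2).normal(size=(64, 32))
out = node.matmul(a, b)
assert np.allclose(out, a @ b)  # identical to the single-machine result
```

The point of the sketch is only the abstraction boundary: callers program against one logical unit, while distribution across nodes happens underneath.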
Huawei Announces Super Node Architecture That Deeply Interconnects Multiple Physical Machines
Sina Technology · 2025-09-18 06:39
Core Viewpoint
- Huawei has introduced an innovative super node architecture aimed at redefining large-scale effective computing power, emphasizing open-source software and hardware openness to foster industry collaboration and innovation [2][3]

Group 1: Super Node Architecture
- The super node architecture allows multiple physical machines to be deeply interconnected so that they function as a single logical unit for learning, reasoning, and thinking [2]
- The architecture is designed to meet the computing needs of large data centers, enterprise-level data centers, and small workstations across various industries [2]
- Key features include resource pooling, scalable expansion, and reliable performance, enabling high-bandwidth, low-latency interconnection of computing and storage units [2]

Group 2: New Product Launch
- Huawei has launched several new products based on the super node architecture, including the Atlas 950 SuperPoD AI super node, the Atlas 850 and Atlas 860 enterprise-level AI super node servers, the Atlas 350 AI accelerator cards, and the TaiShan 950 SuperPoD, the first general-purpose super node [2]
- These products are designed to enhance data center capabilities and support a wide range of computing scenarios [2]

Group 3: Open Source Commitment
- Huawei is fully opening its super node technology to share its benefits with the industry, promoting inclusive and collaborative innovation [3]
- The operating system components for the Lingqu protocol will be open-sourced, with code contributed upstream to open-source communities such as openEuler [3]
2025 Open Computing Technology Conference Held, Accelerating Global AIDC Collaboration
China News Network · 2025-08-11 11:03
Group 1
- The 2025 Open Computing Technology Conference focuses on the development trends of MoE (Mixture of Experts) large models and AI agents, emphasizing the importance of open computing for enhancing vertical scaling and horizontal efficiency [1]
- The rise of open-source models and open computing systems is becoming a mainstream trend in the AI era, facilitating global collaboration and addressing the challenges facing future GW-level AI data centers [1][2]
- The conference is co-hosted by the Open Compute Project (OCP) and the Open Computing Technical Committee (OCTC), with over a thousand experts and representatives from major companies and institutions participating [1]

Group 2
- The combination of open-source models and open computing is expected to drive a surge in long-tail applications, accelerating the realization of AI accessibility [2]
- OCP's focus is shifting toward AI, with strategic plans centered on open systems for AI, including the physical and IT infrastructure of data centers [2]
- The OCTC emphasizes the need for industry collaboration and the establishment of practical standards to promote technological innovation and shared benefits across sectors [2]

Group 3
- The rapid growth of MoE model parameters is driving a transformation in computing architecture, imposing extreme requirements on computing density and interconnect speed [3]
- The industry is moving toward a new era characterized by super-node architectures that jointly optimize network, computing, software, and hardware [3]
- Challenges such as power, interconnect, and reliability in AI infrastructure are prompting a reconfiguration of computing systems, with super-node architecture becoming a core development path [3]

Group 4
- The core goal of the open computing community is to leverage ecosystem strengths to break performance bottlenecks and drive business innovation [4]
- GW-level AI data centers are catalyzing significant changes in the computing ecosystem, accelerating cross-community collaboration [5]
- OCP is preparing to establish the "GW-level Open Intelligent Computing Center OCP China Community Group" to promote the implementation of AI open systems in China [5]
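Since the conference summary centers on MoE large models, a minimal sketch of the core MoE mechanism may help: top-k gating routes each token to a small number of experts and combines their outputs by gate weight, which is what creates the extreme interconnect demands noted above. All names, shapes, and the linear "experts" here are invented for illustration; real MoE layers live inside frameworks such as PyTorch.

```python
import numpy as np


def top_k_gating(logits, k=2):
    """Pick the top-k experts per token and softmax-normalize their scores."""
    topk = np.argsort(logits, axis=-1)[:, -k:]         # indices of top-k experts
    gates = np.take_along_axis(logits, topk, axis=-1)  # their raw router scores
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)         # weights sum to 1 per token
    return topk, gates


def moe_forward(x, experts, router_w, k=2):
    """Route each token to k experts; combine expert outputs by gate weight."""
    logits = x @ router_w                              # (tokens, n_experts)
    topk, gates = top_k_gating(logits, k)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(k):
            e = topk[t, slot]
            out[t] += gates[t, slot] * experts[e](x[t])
    return out


rng = np.random.default_rng(0)
d, n_experts = 8, 4
# Each "expert" is just a linear map in this sketch.
expert_weights = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in expert_weights]
router_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(6, d))                            # 6 tokens
y = moe_forward(x, experts, router_w, k=2)
print(y.shape)  # (6, 8)
```

Because each token activates only k of the n experts, total parameters grow far faster than per-token compute — but tokens must be dispatched to whichever nodes hold their experts, which is why interconnect bandwidth and latency become the bottleneck at scale.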