Analysis of Supernode Technology and Market Trends
傅里叶的猫 · 2025-09-28 16:00
Core Insights
- The article surveys collaboration and solutions in the supernode field, highlighting the major players and their respective market strategies [3][4].

Supernode Collaboration and Solutions
- Major CSP manufacturers are seeking customized server-cabinet products from server suppliers, with a focus on NV solutions [4].
- Key supernode solutions in China include Tencent's ETH-X, NV's NVL72, Huawei's Ascend CM384, and Alibaba's Panjiu, all either being promoted or already serving customers [4].
- ByteDance is planning an Ethernet innovation solution for large models, based primarily on Broadcom's Tomahawk, but has not yet promoted it [4].
- Tencent's ETH-X is built in collaboration with Broadcom and Amphenol, using Tomahawk switches plus PCIe switches for GPU traffic management [5].
- The solutions target different workloads: CM384 focuses on training and large-model computation, while ETH-X leans toward inference [5].

Market Share and Supplier Landscape
- Supernode solutions have not yet captured significant market share; traditional AI servers remain dominated by Inspur, H3C, and others [6].
- From September 16, CSPs including BAT were restricted from purchasing NV compliance cards, driving a shift toward domestic cards, which are expected to reach a 30%-40% share in the coming years [6].
- The overseas market share for major internet companies such as Alibaba and Tencent remains small, though ByteDance's overseas-to-domestic ratio is projected to improve [6].

Vendor Competition and Second-Tier Landscape
- Inspur remains competitive on cost and pricing, while the race for second and third place among suppliers is less settled [8].
- Second-tier internet companies have smaller demand, and mainstream suppliers are not actively pursuing this segment [9].
- The domestic AI ecosystem lags international developments, with significant advances expected by 2027 [9][10].
Procurement and Self-Developed Chips
- Tencent and Alibaba prefer NV cards when available; the current NV-to-domestic card ratio is 3:7 for Alibaba and 7:3 for ByteDance [10].
- The trend toward supernodes is driven by the need for more computing power and lower latency, with large-scale demand expected in the future [10].

Economic and Technical Aspects
- AI servers carry higher gross margins than general servers for the major manufacturers [11].
- Software solutions are expected to further enhance profitability, with significant profit increases anticipated from supernode deployments [11].
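The card-mix ratios above are simple to convert into domestic-card shares for comparison with the 30%-40% domestic-share forecast; a minimal sketch using only the figures quoted in the summary (the dictionary layout is illustrative):

```python
# NV-to-domestic card ratios as reported in the summary (NV : domestic).
ratios = {
    "Alibaba": (3, 7),    # 3:7 -> domestic-heavy mix
    "ByteDance": (7, 3),  # 7:3 -> NV-heavy mix
}

# Domestic share = domestic / (NV + domestic).
for vendor, (nv, domestic) in ratios.items():
    share = domestic / (nv + domestic)
    print(f"{vendor}: {share:.0%} domestic cards")
```

By this arithmetic Alibaba is already at 70% domestic while ByteDance sits at 30%, bracketing the 30%-40% industry-wide share the summary projects.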
Alibaba's Panjiu Supernode and Its Supply Chain
傅里叶的猫 · 2025-09-27 10:14
Core Viewpoint
- The article compares Alibaba's supernode in detail with NVIDIA's NVL72 and Huawei's CM384, focusing on GPU count, interconnect technology, power consumption, and ecosystem compatibility.

Group 1: GPU Count
- Alibaba's supernode, known as "Panjiu," uses a configuration of 16 computing nodes, each containing 4 self-developed GPUs, totaling 16 x 4 x 2 = 128 GPUs [4]
- By comparison, Huawei's CM384 includes 384 Ascend 910C chips, while NVIDIA's NVL72 consists of 72 GPUs [7]

Group 2: Interconnect Technology
- NVIDIA's NVL72 employs a cable-tray interconnect running the proprietary NVLink protocol [8]
- Huawei's CM384 likewise uses cable connections between multiple racks [10]
- Alibaba's supernode features a backplane-free orthogonal interconnect, allowing direct connections between computing and switch nodes and reducing signal-transmission loss [12][14]

Group 3: Power and Optical Connections
- NVIDIA's NVL72 uses copper for scale-up connections, whereas Huawei's CM384 relies on optical interconnects, which raise both cost and power consumption [15]
- Alibaba's supernode uses electrical interconnects for internal scale-up, with some connections over PCB and copper cables; optical interconnects are used only between the two ALink switches [18][19]

Group 4: Parameter Comparison
- NVIDIA's GB200 NVL72 delivers 2,500 BF16 dense TFLOPS per chip versus 780 for Huawei's CM384, a significant per-chip performance gap [21]
- HBM capacity is 192 GB for NVIDIA's GB200 versus 128 GB for Huawei, and scale-up bandwidth is 7,200 Gb/s for NVIDIA versus 2,800 Gb/s for Huawei [21]

Group 5: Ecosystem Compatibility
- Alibaba claims compatibility with multiple GPUs/ASICs provided they support the ALink protocol, which may prove difficult, as major manufacturers are reluctant to adopt proprietary protocols [23]
- Alibaba's GPUs are CUDA-compatible, a competitive advantage in the current market [24]

Group 6: Supply Chain Insights
- In the AI and general server integration market, Inspur holds a 33%-35% share, while Huawei holds 23% [33]
- In liquid cooling, Haikang and Invec are the key players, each holding 30%-40% of the market [35]
- In PCBs, layer counts have risen to 24-30, with low-loss materials making up over 60% of the composition, significantly increasing the value of a single card's PCB [36]
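The per-chip figures in Group 4 can be rolled up to rack level, which is the comparison that matters for a supernode; a rough sketch using only the numbers quoted above (Panjiu is omitted because the summary does not give per-chip TFLOPS for Alibaba's self-developed GPU):

```python
# Supernode-level aggregate compute from the per-chip figures in the summary.
systems = {
    "NVIDIA GB200 NVL72": {"tflops": 2500, "chips": 72},
    "Huawei CM384":       {"tflops": 780,  "chips": 384},
}

for name, s in systems.items():
    total = s["tflops"] * s["chips"]
    print(f"{name}: {total:,} aggregate BF16 dense TFLOPS")

# Per chip NVIDIA leads by roughly 2500 / 780 ≈ 3.2x, but at rack scale
# CM384's 384 chips more than close the gap: 299,520 vs 180,000 TFLOPS.
```

This is the usual framing for CM384: it trades a weaker chip for a much larger scale-up domain, at the cost of the higher power and optical-interconnect expense noted in Group 3.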
Global Technology - AI Supply Chain: AI capex revised up, while TSMC 2026 CoWoS supply unchanged
2025-08-05 08:17
Summary of Key Points from the Conference Call

Industry Overview
- **Industry**: Global technology, specifically the AI supply chain and the semiconductor industry
- **Key Players**: TSMC, Nvidia, Broadcom, AMD, MediaTek, Alchip, Aspeed, Advantest, KYEC, and others

Core Insights and Arguments
1. **Cloud Capex Growth**: Morgan Stanley expects 2026 cloud capex to increase 31% year-over-year (Y/Y) to US$582 billion, well above the consensus estimate of 16% Y/Y growth [2][10][67]
2. **AI Server Capex**: Implied AI server capex is projected to grow approximately 70% Y/Y in 2026, driven by a rising AI server mix within capex [2][11]
3. **TSMC CoWoS Capacity**: TSMC's 2026 CoWoS capacity is estimated to remain unchanged at 93k wafers per month, which is expected to support the anticipated AI capex growth [1][4][27]
4. **Hyperscalers' Capex**: The top 11 global hyperscalers are tracking US$445 billion of cash capex for 2025, 56% Y/Y growth, led by Microsoft, Amazon, and Alphabet [9][68]
5. **AI Chip Demand**: Strong demand for AI semiconductors is expected to persist, with Morgan Stanley maintaining an Overweight rating on key semiconductor companies [2][11]

Additional Important Insights
1. **China's AI Chip Supply**: China's AI chip supply continues to develop; Nvidia has placed new wafer orders at TSMC for H20 chips, adding 200k units to the previous production of 1 million units [8]
2. **Nvidia's CoWoS Consumption**: Nvidia's 2026 CoWoS consumption assumption has been raised from 580k to 595k units, indicating strong demand for its chips [22]
3. **Market Dynamics**: Huawei is developing its own AI chips to compete with Nvidia, showcasing the CM384 server-rack prototype [8][41]
4. **Depreciation Trends**: Depreciation as a share of total expenses for data-center customers is expected to rise, reflecting increased data-center investment [60]
5. **AI Inference Demand**: Monthly tokens processed by major cloud service providers point to growing AI inference demand; Google processed over 980 trillion tokens in July 2025, double the May figure [15]

Conclusion
The conference call emphasizes a robust outlook for the AI semiconductor industry, driven by sharp increases in cloud capex and AI server investment. TSMC's capacity plans and competitive dynamics in the AI chip market, particularly developments in China, are the key factors to monitor.
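The headline growth figures above are internally consistent and easy to verify; a quick sketch using only the numbers quoted in the summary:

```python
# 2026 cloud capex of US$582bn at 31% Y/Y implies a 2025 base of
# 582 / 1.31 ≈ US$444bn, consistent with the US$445bn of cash capex
# the top 11 hyperscalers are tracking for 2025.
implied_2025_base = 582 / 1.31
print(f"Implied 2025 capex base: US${implied_2025_base:.0f}bn")

# Google's monthly inference tokens doubled between May and July 2025
# (two months), implying compound monthly growth of 2**(1/2) - 1 ≈ 41%.
monthly_growth = 2 ** (1 / 2) - 1
print(f"Implied compound monthly token growth: {monthly_growth:.0%}")
```

The close match between the implied 2025 base and the tracked US$445 billion is why the report treats the 31% Y/Y estimate for 2026 as a continuation of the current spending trajectory rather than a step change.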