NVSwitch
Guohai Securities: Bus Interconnects Drive the Development of the AI Model and Application Industry; "Recommend" Rating on the Computer Sector Maintained
Zhitong Finance · 2025-12-25 05:56
1) The PCIe protocol and PCIe switches are the traditional expansion-bus standard for computers. Although data rates have risen with each generation, communication between CPUs, GPUs, and other devices still hits bottlenecks, which is why the CXL protocol emerged. Many vendors also run proprietary interconnect protocols of their own, among which NVLink leads.
2) NVLink provides high-speed GPU-to-GPU interconnect within Scale-Up domains. NVSwitch is the hardware that supports multi-GPU interconnect for inference, offering low latency, a high port count, high bandwidth, and high power draw. NVLink C2C extends Scale-Up interconnect to CPU-to-CPU and CPU-to-GPU links. On speed, fifth-generation NVLink delivers 200 Gbps per lane versus 32 Gbps for PCIe Gen5 (a quick numeric comparison follows this list).
3) Huawei's UnifiedBus (UB, 灵衢) offers memory-semantic access latency on the order of 100 ns for synchronous accesses and 2-5 µs for asynchronous accesses, with TB/s-class bandwidth between components. The UB Processing Unit is a processing element supporting the UB protocol stack; it embeds a UBSwitch, enabling multi-tier UBSwitch fabrics, and supports converged networking with Ethernet switches via UBoE.
4) UALink leverages Ethernet infrastructure for Scale-Up. The UALink 1.0 specification supports data rates of up to 200 GT/s per lane, with every four physical lanes grouped into one UALink base unit; on the transmit (TX) ...
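To put those per-lane figures in context, here is a minimal back-of-envelope sketch in C++. The 200 Gbps and 32 Gbps per-lane rates come from the report above; the link widths (an x2 NVLink port, an x16 PCIe Gen5 slot) are illustrative assumptions of mine, and encoding/protocol overhead is ignored.

```cpp
// Back-of-envelope comparison of the per-lane rates quoted above:
// 5th-gen NVLink at 200 Gbps/lane vs PCIe Gen5 at 32 Gbps/lane.
// Link widths (x2 for NVLink, x16 for PCIe) are illustrative assumptions,
// not figures from the report; raw signaling rates, overhead ignored.
#include <cstdio>

int main() {
    const double nvlink_lane_gbps = 200.0; // per direction, from the report
    const double pcie5_lane_gbps  = 32.0;  // per direction, from the report

    // Per-lane ratio: how much faster one NVLink lane is than one PCIe Gen5 lane.
    std::printf("per-lane ratio: %.2fx\n", nvlink_lane_gbps / pcie5_lane_gbps);

    // Assumed widths: an x2 NVLink port vs an x16 PCIe Gen5 slot.
    const double nvlink_port_GBps = 2.0 * nvlink_lane_gbps / 8.0;  // ~50 GB/s per direction
    const double pcie5_x16_GBps   = 16.0 * pcie5_lane_gbps / 8.0;  // ~64 GB/s per direction
    std::printf("x2 NVLink port: %.0f GB/s, x16 PCIe Gen5: %.0f GB/s (per direction)\n",
                nvlink_port_GBps, pcie5_x16_GBps);
    return 0;
}
```

On these raw numbers a single NVLink lane carries 6.25x a PCIe Gen5 lane, though a full x16 PCIe slot still edges out one assumed x2 NVLink port; NVLink's advantage comes from stacking many such ports per GPU.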
Switch Chip Research Framework (Part 1): GPU-GPU Interconnect, from Scale-Up to Scale-Out: How Is the Landscape Shaping Up?
Soochow Securities · 2025-09-30 06:03
Investment Rating
- The report maintains an "Overweight" rating for the electronics industry [1]

Core Insights
- The report highlights the competitive landscape among switch chip makers: NVIDIA dominates the market through proprietary protocols, while Broadcom is gaining traction with its open SUE architecture [6][13]
- It emphasizes the potential for domestic manufacturers to replace imported switch chips, with companies like Shengke Communication leading the way in Ethernet chip production [35]
- It recommends investing in key players such as Haiguang Information and Shengke Communication, while also watching companies like ZTE and Montage Technology [6][35]

Summary by Sections
1. Switch Chip Manufacturers
- NVIDIA's NVSwitch is noted as the highest-bandwidth and most mature proprietary solution in the Scale-Up segment [11]
- Broadcom holds a 90% market share in cloud data center switches and has introduced the SUE architecture for open Ethernet interconnects [13]
- Astera Labs is recognized for its complete product chain, being the only company that integrates switch chips, extension links, and a software management platform [20]

2. Achieving Domestic Replacement of Switch Chips
- Shengke Communication is identified as the leading domestic Ethernet switch chip manufacturer, with its 12.8T and 25.6T chips entering the customer-promotion stage [35]
- Other domestic players like Shudao Technology and Montage Technology are making strides in the PCIe segment, with Shudao expected to achieve domestic-replacement breakthroughs by the end of 2025 [41]
- Major manufacturers such as Haiguang, Huawei, and ZTE are also developing self-designed chips to support domestic replacement efforts [43][45][50]

3. Investment Recommendations
- The report recommends focusing on Haiguang Information and Shengke Communication for investment opportunities, while keeping an eye on ZTE, Wantong Development, and Montage Technology [6][35]
How Do GPUs Communicate with Each Other?
Semiconductor Industry Observation · 2025-09-29 01:37
Core Viewpoint
- The article discusses the advancements in GPU communication technologies, particularly focusing on GPUDirect Storage, GPUDirect P2P, NVLink, NVSwitch, and GPUDirect RDMA, which enhance data transfer efficiency and reduce bottlenecks in high-performance computing environments [27]

Group 1: GPU and Storage Communication
- The data flow from storage systems to GPU memory involves two data copies: from NVMe SSD to system memory and then from system memory to GPU memory, which introduces redundancy [6]
- GPUDirect Storage allows direct access from storage to GPU memory, significantly improving data loading efficiency by reducing unnecessary system copies [7]

Group 2: GPU to GPU Communication
- Traditional GPU communication involves multiple data copies through system memory, which can be inefficient [10]
- GPUDirect P2P enables direct data transfer between GPUs, bypassing the CPU and reducing data copy actions by half (see the sketch after this summary) [12]

Group 3: NVLink and NVSwitch
- NVLink provides high bandwidth for data transfer between GPUs, achieving up to 600 GB/s for NVIDIA A100 Tensor Core GPUs, which is significantly higher than traditional PCIe [15]
- NVSwitch facilitates full interconnectivity among multiple GPUs, supporting high bandwidth and scalability for large GPU systems [20]

Group 4: Cross-Machine Communication
- Traditional cross-machine communication requires multiple steps involving system memory, which can be inefficient [22][24]
- GPUDirect RDMA simplifies this process, allowing direct access to GPU memory from peripheral PCIe devices, thus enhancing communication efficiency [25]

Group 5: Summary of Technologies
- The combination of GPUDirect technologies, including P2P and RDMA, supports efficient communication within single nodes and across multiple nodes, essential for AI training and high-performance computing [28]
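To make the GPUDirect P2P step concrete, below is a minimal host-side sketch using the standard CUDA runtime API (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer). The device IDs and buffer size are arbitrary and error handling is elided; the point is that the copy never stages through system memory, which is the copy-halving the summary describes.

```cpp
// Minimal GPUDirect P2P sketch: copy a buffer from GPU 0 to GPU 1 directly,
// without bouncing through host (system) memory. Uses only standard CUDA
// runtime calls; device IDs and buffer size are arbitrary for illustration.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int canAccess = 0;
    // Ask the driver whether GPU 0 can reach GPU 1's memory directly
    // (requires a P2P-capable path: PCIe under one root complex, or NVLink).
    cudaDeviceCanAccessPeer(&canAccess, /*device=*/0, /*peerDevice=*/1);
    if (!canAccess) { std::printf("no P2P path between GPU 0 and GPU 1\n"); return 1; }

    const size_t bytes = 64 << 20; // 64 MiB test buffer
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0); // map GPU 1's memory for GPU 0
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // One DMA transfer over the P2P link -- no bounce through host memory.
    cudaMemcpyPeer(dst, /*dstDevice=*/1, src, /*srcDevice=*/0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    std::printf("copied %zu bytes GPU0 -> GPU1 via P2P\n", bytes);
    return 0;
}
```

Without peer access enabled, the same logical copy would be staged through a host bounce buffer, which is exactly the doubled copy count the traditional path suffers from.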
Compute Chip Watch Series: How to Understand Scale-Up Networks and High-Speed SerDes Chips?
Soochow Securities · 2025-08-21 09:35
Investment Rating
- The report maintains an "Overweight" rating for the electronics industry [1]

Core Insights
- In the AI-chip Scale-up sector, NVIDIA is currently the dominant player, utilizing its proprietary NVLink technology to interconnect up to 576 GPUs at 1.8 TB/s per GPU, significantly outperforming competitors using PCIe protocols (a back-of-envelope check of that figure follows this summary) [11][12]
- The establishment of the UALink alliance by major companies like AMD, AWS, Google, and Cisco aims to create an open ecosystem, although challenging NVIDIA's NVLink remains difficult [11][12]
- The report emphasizes the importance of high-speed SerDes technology, which is crucial for AI-chip interconnectivity, and highlights the need for domestic development in this area to achieve self-sufficiency [45][46]

Summary by Sections
1. Scale-up Overview
- The report discusses the two main camps in AI-chip interconnect technology: proprietary protocols and open ecosystems, with NVIDIA's NVLink being the most mature and effective solution [11][12]
2. NVLink and NVSwitch
- NVLink is described as a layered protocol design that enhances data transmission reliability, while NVSwitch acts as a high-capacity switch facilitating efficient GPU communication [14][15]
3. NVIDIA's Interconnect Strategy
- NVIDIA employs NVLink for GPU-to-GPU connections and PCIe for GPU-to-CPU connections, with future developments potentially allowing direct NVLink connections to CPUs [21][30]
4. Domestic Alternatives for AI Chip Scale-up
- The report suggests that achieving a domestic alternative to NVLink is challenging, but the UALink initiative may provide new opportunities for local AI-chip development [45][46]
5. Investment Recommendations
- The report recommends focusing on 盛科通信 (Shengke Communication) and 海光信息 (Haiguang Information), while also monitoring 万通发展 (Wantong Development) and 澜起科技 (Montage Technology) for potential investment opportunities [6]
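The 1.8 TB/s figure can be sanity-checked with simple arithmetic. The sketch below assumes the commonly cited fifth-generation NVLink layout of 18 links per GPU with 2 lanes per direction per link at 200 Gbps per lane; the link and lane counts are assumptions on my part, not figures stated in the report.

```cpp
// Sanity check of the 1.8 TB/s per-GPU NVLink figure quoted above.
// Assumed layout (commonly cited for 5th-gen NVLink, not stated in the
// report): 18 links per GPU, 2 lanes per direction per link, 200 Gbps/lane.
#include <cstdio>

int main() {
    const int    links_per_gpu = 18;    // assumption
    const int    lanes_per_dir = 2;     // assumption
    const double lane_gbps     = 200.0; // from the report

    const double per_dir_GBps = links_per_gpu * lanes_per_dir * lane_gbps / 8.0; // 900 GB/s
    const double bidir_TBps   = 2.0 * per_dir_GBps / 1000.0;                     // 1.8 TB/s

    std::printf("per direction: %.0f GB/s, bidirectional: %.1f TB/s\n",
                per_dir_GBps, bidir_TBps);
    return 0;
}
```

Under those assumptions the arithmetic lands on 900 GB/s per direction, i.e. 1.8 TB/s counted bidirectionally, matching the headline number.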
Broadcom Takes On Nvidia's InfiniBand and NVSwitch with a Single Chip
Semiconductor Industry Observation · 2025-07-18 00:57
Core Viewpoint
- InfiniBand has long been the dominant fabric for high-performance computing (HPC) and AI applications, but its market position is challenged by Broadcom's new low-latency Ethernet switch, Tomahawk Ultra, which aims to displace InfiniBand and NVSwitch in AI and HPC clusters [3][5][26]

Group 1: InfiniBand and Its Evolution
- InfiniBand gained traction due to Remote Direct Memory Access (RDMA), allowing direct memory access between CPUs, GPUs, and other processing units, which is crucial for AI model training [3]
- Nvidia's $6.9 billion acquisition of Mellanox Technologies was driven by the anticipated growth of generative AI, which would require InfiniBand for GPU server connectivity [3][4]
- The rise of large language models and generative AI propelled InfiniBand to new heights, with NVLink and NVSwitch providing significant advantages inside AI server nodes [4]

Group 2: Broadcom's Tomahawk Ultra
- Tomahawk Ultra aims to replace InfiniBand as the back-end network for HPC and AI clusters, offering low-latency, lossless Ethernet [5][6]
- Development of Tomahawk Ultra predates the rise of generative AI; it originally targeted latency-sensitive applications [5]
- Its architecture allows for shared-memory clusters, enhancing communication speed among processing units compared to traditional InfiniBand or Ethernet [5][6]

Group 3: Performance Metrics
- InfiniBand's packet size typically ranges from 256 B to 2 KB, while Ethernet switches often handle larger packets, impacting performance in HPC workloads [7]
- InfiniBand has historically demonstrated lower latency than Ethernet, with significant improvements over the years, such as 130 ns for 200 Gb/s HDR InfiniBand [10][11]
- Tomahawk Ultra boasts a port-to-port hop latency of 250 ns and a throughput of 77 billion packets per second, outperforming traditional Ethernet switches (a packet-rate sanity check follows this summary) [12][28]

Group 4: Competitive Landscape
- InfiniBand's advantages in latency and packet throughput have made it the preferred choice for HPC workloads, but Ethernet technologies are rapidly evolving to close the gap [6][10]
- Nvidia's NVSwitch is also under threat from Tomahawk Ultra, which is part of a broader strategy to extend Ethernet into AI and HPC applications [26][29]
- Optimized Ethernet headers and lossless features in Tomahawk Ultra aim to improve performance while staying compatible with existing standards [15][16]
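The 77-billion-packets-per-second claim is easy to sanity-check. The sketch below assumes a 51.2 Tb/s aggregate switch forwarding minimum-size 64 B Ethernet frames, with 20 B of per-frame wire overhead (preamble plus inter-packet gap); both the aggregate rate and the frame size are my assumptions, not figures from the summary.

```cpp
// Sanity check of the "77 billion packets per second" figure above.
// Assumptions (not from the summary): a 51.2 Tb/s aggregate switch,
// minimum-size 64 B Ethernet frames, plus 20 B of per-frame overhead
// (8 B preamble + 12 B inter-packet gap) on the wire.
#include <cstdio>

int main() {
    const double aggregate_bps  = 51.2e12; // assumed aggregate line rate
    const double frame_bytes    = 64.0;    // minimum Ethernet frame
    const double overhead_bytes = 20.0;    // preamble + IPG

    const double pps = aggregate_bps / ((frame_bytes + overhead_bytes) * 8.0);
    std::printf("max frame rate: %.1f billion packets/s\n", pps / 1e9); // ~76.2
    return 0;
}
```

That works out to roughly 76 billion minimum-size frames per second, consistent with the rounded figure quoted above.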
Nvidia Was Built on a Mountain of Money
Semiconductor Industry Observation · 2025-03-31 01:43
Core Insights
- Nvidia is positioned as a leader in the GPU market, with significant contributions from its CEO Jensen Huang and Chief Scientist Bill Dally, who focus on advancing technology and research [1][2][3]
- The company's substantial R&D investments have been pivotal in establishing its dominance in high-performance computing (HPC), analytics, and AI workloads [3][4][5]

R&D Investment and Financial Performance
- Nvidia's R&D spending has historically been high, peaking at 34.2% of revenue in Q1 FY2015, reflecting its commitment to leveraging AI advancements [7][11]
- Despite a recent decline in R&D spending as a percentage of revenue, Nvidia's total R&D budget is projected to grow significantly, reaching $12.91 billion in FY2025, a 48.9% increase from FY2024 [12][14]
- The company has maintained a consistent R&D expenditure of 20% to 25% of revenue over the past 15 years, comparable to other tech giants like Meta and Google [11][12]

Technological Advancements and Market Position
- Nvidia's CUDA platform has been instrumental in creating a vast ecosystem of over 900 libraries and frameworks, solidifying its position in the AI and HPC markets [9][10]
- The scarcity and high cost of High Bandwidth Memory (HBM) have allowed Nvidia to maintain a competitive edge over AMD, as it can afford to pay premium prices for necessary components [10][11]
- Nvidia's research efforts are divided into supply-side and demand-side initiatives, focusing on enhancing GPU technology and expanding application areas for accelerated computing [16][18]

Future Outlook and Strategic Direction
- Nvidia is preparing for future waves of AI innovation, including what it terms "physical AI", indicating a proactive approach to emerging technologies [7][12]
- The company is also exploring quantum computing and has established dedicated research teams to assess its potential impact [16][18]
- Nvidia's strategy includes acquiring technologies from third parties and integrating them into its offerings, exemplified by its acquisition of Mellanox Technologies [28]
PCIe: Broadcom's New Chip Roadmap
Semiconductor Industry Observation · 2025-02-28 03:08
Core Viewpoint
- The article discusses the evolution and significance of PCI-Express technology, particularly the upcoming PCI-Express 6.0, which aims to raise bandwidth while holding latency steady for AI and HPC systems. The transition to this new standard is crucial for meeting the demands of modern computing environments, especially AI server architectures [1][2][3]

Group 1: PCI-Express Evolution
- PCI-Express bandwidth roughly doubles every three years; the latest version, PCI-Express 6.0, doubles the data rate again while maintaining latency (a bandwidth sketch follows this summary) [1][2]
- The transition to PAM-4 encoding in PCI-Express 6.0 allows for higher data rates but introduces challenges such as increased error rates, necessitating advanced error-correction techniques [3][4]
- Broadcom has been a key player in the development of PCI-Express switches and retimers, with its latest products supporting both the 5.0 and 6.0 standards [6][7]

Group 2: Market Demand and Applications
- Demand for PCI-Express switches and retimers has surged, driven by the need for higher bandwidth in AI servers, which often host multiple GPUs and accelerators [7][8]
- A typical AI server equipped with eight GPUs utilizes four PCI-Express switches, highlighting the importance of high channel counts for performance [7]
- The complexity of AI server architectures necessitates robust PCI-Express solutions to facilitate communication between components without routing everything through a central CPU [8][9]

Group 3: Future Prospects
- The introduction of PCI-Express 6.0 is seen as a pivotal step for the industry, with widespread adoption in AI server manufacturing expected from late 2024 [6][11]
- There is speculation about new architectures that could push bandwidth beyond current PCI-Express capabilities, possibly resembling Nvidia's NVLink technology [9][10][11]
- The article emphasizes the need for a coherent telemetry system to support the growing complexity of AI ecosystems, which Broadcom aims to address through its interoperability development platform [8]
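For a rough sense of what the 6.0 transition buys, the sketch below computes per-direction bandwidth for an x16 slot under each generation's published per-lane signaling rate. The x16 width is an illustrative choice, and Gen6's small FLIT/FEC overhead is ignored to keep round numbers.

```cpp
// Back-of-envelope per-direction bandwidth for the PCIe generations discussed
// above. Per-lane transfer rates are the published signaling rates; encoding
// overheads are approximated (128b/130b for Gen5, FLIT/FEC for Gen6 ignored),
// and the x16 width is the usual GPU slot, chosen here for illustration.
#include <cstdio>

int main() {
    const int lanes = 16;

    // Gen5: 32 GT/s NRZ with 128b/130b encoding.
    const double gen5_GBps = 32.0 * (128.0 / 130.0) / 8.0 * lanes; // ~63 GB/s
    // Gen6: 64 GT/s PAM-4; FLIT mode + FEC costs only a few percent,
    // ignored here for a round number.
    const double gen6_GBps = 64.0 / 8.0 * lanes;                   // ~128 GB/s

    std::printf("x16 Gen5: %.0f GB/s, x16 Gen6: %.0f GB/s (per direction)\n",
                gen5_GBps, gen6_GBps);
    return 0;
}
```

The doubling from roughly 63 GB/s to roughly 128 GB/s per direction is what makes an x16 Gen6 slot attractive for GPU and accelerator attach, and it is why retimers and switches supporting the new signaling are in such demand.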