NVSwitch
A chip startup takes on Nvidia and Broadcom single-handedly
半导体行业观察· 2026-01-22 04:05
Core Insights
- Upscale AI, a chip startup, has raised $200 million in Series A funding to challenge Nvidia's dominance in rack-level AI systems and to compete with companies like Cisco, Broadcom, and AMD [1][3]
- The rapid influx of investors reflects a growing consensus that traditional network architectures cannot meet the demands of AI workloads, which require high scalability and tight synchronization [1][2]

Funding and Market Position
- The funding round was led by Tiger Global, Premji Invest, and Xora Innovation, with participation from several notable investors, bringing Upscale AI's total funding to over $300 million [1]
- The AI interconnect market is projected to reach $100 billion by the end of the decade, prompting Upscale AI to focus on this growing sector [6]

Technology and Product Development
- Upscale AI is developing a chip named SkyHammer, optimized for scale-up networks, which aims to provide deterministic latency for data transmission among components within a rack [9][10]
- The company emphasizes heterogeneous computing and networking, arguing that no single company can supply all the technologies AI requires [10][12]

Competitive Landscape
- Nvidia's networking revenue has grown 162% year over year, underscoring the competitive pressure in the AI networking space [3]
- Upscale AI aims to build a high-radix switch and a dedicated ASIC to compete with Nvidia's NVSwitch and other existing solutions [14][16]

Strategic Partnerships and Standards
- Upscale AI is building its platform on open standards and actively participates in alliances including Ultra Accelerator Link (UALink) and the SONiC Foundation [7][17]
- The company plans to expand its product line to more traditional scale-out switches while maintaining partnerships with major data center operators and GPU suppliers [18]
Guohai Securities: bus interconnects drive the development of AI models and applications; "Recommend" rating on the computer industry maintained
智通财经网· 2025-12-25 05:56
Core Viewpoint
- The Guohai Securities report highlights the new demand for high-speed interconnect protocols driven by Scale-Up in the era of large models, emphasizing the role of bus interconnects in advancing AI models and applications and creating a positive feedback loop from models back to computing power [1]

Group 1: High-Speed Interconnect Protocols
- High-speed interconnect protocols serve Scale-Up needs in the era of large models, with computer buses connecting systems and components for data transmission, control, and operation [1]
- Mainstream interconnect protocols include NVLink, UALink, SUE, CXL, HSL, and UB, all aimed at improving communication while expanding system bandwidth and device counts [1]

Group 2: NVLink and Competitors
- NVLink leads the Scale-Up interconnect space, enabling high-speed communication between GPUs, while NVSwitch supports multi-GPU inference with low latency and high bandwidth [2]
- Fifth-generation NVLink offers a per-lane bandwidth of 200 Gbps, far above PCIe Gen5's 32 Gbps per lane (a back-of-the-envelope sketch follows this summary) [2]
- Other protocols such as UALink and SUE are also emerging, with UALink reaching a maximum data transfer rate of 200 GT/s and SUE leveraging Ethernet for efficient deployment [3]

Group 3: Open Ecosystems and Evolving Requirements
- NVLink Fusion opens NVLink up to collaboration with several companies, allowing customized chips to be scaled up for model training and inference needs [4]
- Evolving computing demands require higher bandwidth and lower latency from interconnects, as language-model performance improves with model size, dataset size, and compute [4]
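A back-of-the-envelope sketch of the bandwidth gap described above. The per-lane rates (200 Gbps for NVLink 5, 32 Gbps for PCIe Gen5) come from the report; the lane and link counts (2 lanes per NVLink 5 link per direction, 18 links per GPU) are assumptions added for illustration, not vendor-confirmed figures.

```cpp
#include <cstdio>

// Per-lane rates are from the report above; lane/link counts are
// illustrative assumptions.
int main() {
    const double nvlink5_lane_gbps = 200.0;  // NVLink 5, per lane
    const double pcie5_lane_gbps   = 32.0;   // PCIe Gen5, 32 GT/s per lane

    printf("per-lane ratio: %.2fx\n", nvlink5_lane_gbps / pcie5_lane_gbps);

    // Scale up to whole interfaces: a PCIe Gen5 x16 slot versus a GPU with
    // 18 NVLink 5 links (assuming 2 lanes per link per direction).
    const double pcie5_x16_GBps   = 16 * pcie5_lane_gbps / 8.0;
    const double nvlink5_gpu_GBps = 18 * 2 * nvlink5_lane_gbps / 8.0;
    printf("PCIe Gen5 x16:       %.0f GB/s per direction\n", pcie5_x16_GBps);
    printf("NVLink 5 (18 links): %.0f GB/s per direction (1.8 TB/s bidirectional)\n",
           nvlink5_gpu_GBps);
    return 0;
}
```

Under these assumptions the totals come out to roughly 64 GB/s versus 900 GB/s per direction, which is consistent with the 1.8 TB/s bidirectional figure cited elsewhere in this digest.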
Switch Chip Research Framework (Part 1): GPU-GPU Interconnect — What Does the Landscape Look Like from Scale-Up to Scale-Out?
Soochow Securities· 2025-09-30 06:03
Investment Rating
- The report maintains an "Overweight" rating for the electronics industry [1]

Core Insights
- The report maps the competitive landscape among switch chip manufacturers: NVIDIA dominates through proprietary protocols, while Broadcom is gaining traction with its open SUE (Scale-Up Ethernet) architecture [6][13]
- It emphasizes the potential for domestic manufacturers to replace imported switch chips, with companies like Shengke Communication leading in Ethernet switch chip production [35]
- The report suggests investing in key players such as Haiguang Information and Shengke Communication, while also recommending attention to ZTE and Montage Technology [6][35]

Summary by Sections
1. Switch Chip Manufacturers
- NVIDIA's NVSwitch is noted as the highest-bandwidth and most mature proprietary solution in the Scale-Up segment [11]
- Broadcom holds a 90% market share in cloud data center switches and has introduced the SUE architecture for open Ethernet interconnects [13]
- Astera Labs is recognized for its complete product chain, being the only company that combines switch chips, signal-extension devices (retimers), and a software management platform [20]

2. Achieving Domestic Replacement of Switch Chips
- Shengke Communication is identified as a leading domestic Ethernet switch chip manufacturer, with its 12.8T and 25.6T chips entering the customer promotion stage [35]
- Other domestic players like Shudao Technology and Montage Technology are making strides in the PCIe segment, with Shudao expected to achieve domestic-replacement breakthroughs by the end of 2025 [41]
- Major manufacturers such as Haiguang, Huawei, and ZTE are also developing self-designed chips to support domestic replacement efforts [43][45][50]

3. Investment Recommendations
- The report recommends focusing on Haiguang Information and Shengke Communication for investment opportunities, while keeping an eye on ZTE, Wantong Development, and Montage Technology [6][35]
How do GPUs communicate with each other?
半导体行业观察· 2025-09-29 01:37
Core Viewpoint
- The article surveys advances in GPU communication technologies, focusing on GPUDirect Storage, GPUDirect P2P, NVLink, NVSwitch, and GPUDirect RDMA, which raise data transfer efficiency and reduce bottlenecks in high-performance computing environments [27]

Group 1: GPU and Storage Communication
- The conventional data path from storage to GPU memory involves two copies: from NVMe SSD to system memory, then from system memory to GPU memory, which introduces redundancy [6]
- GPUDirect Storage allows storage to write directly into GPU memory, significantly improving data loading efficiency by eliminating the extra system-memory copy [7]

Group 2: GPU-to-GPU Communication
- Traditional GPU-to-GPU communication bounces data through system memory in multiple copies, which is inefficient [10]
- GPUDirect P2P enables direct data transfer between GPUs, bypassing the CPU and halving the number of data copies (a minimal CUDA sketch follows this summary) [12]

Group 3: NVLink and NVSwitch
- NVLink provides high bandwidth for data transfer between GPUs, reaching 600 GB/s of total bandwidth on NVIDIA A100 Tensor Core GPUs, far beyond traditional PCIe [15]
- NVSwitch provides full interconnectivity among multiple GPUs, supporting the bandwidth and scalability needed for large GPU systems [20]

Group 4: Cross-Machine Communication
- Traditional cross-machine communication requires multiple steps through system memory, which is inefficient [22][24]
- GPUDirect RDMA simplifies this path, allowing peripheral PCIe devices to access GPU memory directly, enhancing communication efficiency [25]

Group 5: Summary of Technologies
- Together, the GPUDirect technologies, including P2P and RDMA, enable efficient communication within single nodes and across multiple nodes, which is essential for AI training and high-performance computing [28]
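A minimal sketch of the GPUDirect P2P path described above, using the standard CUDA runtime API (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer). The device IDs and buffer size are illustrative; whether P2P is available depends on the topology (PCIe vs NVLink) of the actual machine.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Enable peer-to-peer access between GPU 0 and GPU 1, then copy a buffer
// directly between them -- no bounce through host (system) memory.
int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);  // can device 0 access device 1?
    cudaDeviceCanAccessPeer(&can10, 1, 0);  // and the reverse?
    if (!can01 || !can10) {
        printf("P2P not available between GPU 0 and GPU 1 on this topology\n");
        return 1;
    }

    const size_t bytes = 64 << 20;  // 64 MiB test buffer (illustrative)
    float *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // flags argument must be 0
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&dst, bytes);

    // Direct GPU0 -> GPU1 copy; goes over NVLink where present, else PCIe P2P.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();
    printf("copied %zu bytes GPU0 -> GPU1 without touching host memory\n", bytes);

    cudaSetDevice(0); cudaFree(src);
    cudaSetDevice(1); cudaFree(dst);
    return 0;
}
```

Without the peer-access calls, the same cudaMemcpyPeer would still work but would be staged through host memory, which is exactly the doubled-copy path the article describes.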
Compute Chip Highlights Series: How to Understand Scale-Up Networks and High-Speed SerDes Chips?
Soochow Securities· 2025-08-21 09:35
Investment Rating
- The report maintains an "Overweight" rating for the electronics industry [1]

Core Insights
- In the AI chip Scale-Up arena, NVIDIA is the dominant player, using its proprietary NVLink to interconnect up to 576 GPUs at 1.8 TB/s of per-GPU bandwidth, far outpacing competitors that rely on the PCIe protocol [11][12]
- The UALink alliance, founded by major companies including AMD, AWS, Google, and Cisco, aims to build an open ecosystem, although unseating NVLink remains difficult [11][12]
- The report emphasizes high-speed SerDes technology as the foundation of AI chip interconnects and highlights the need for domestic SerDes development to achieve self-sufficiency (the lane-rate arithmetic is sketched after this summary) [45][46]

Summary by Sections
1. Scale-up Overview
- AI chip interconnects split into two camps, proprietary protocols and open ecosystems, with NVIDIA's NVLink being the most mature and effective solution [11][12]

2. NVLink and NVSwitch
- NVLink uses a layered protocol design that improves data transmission reliability, while NVSwitch acts as a high-capacity switch enabling efficient GPU-to-GPU communication [14][15]

3. NVIDIA's Interconnect Strategy
- NVIDIA pairs NVLink for GPU-to-GPU connections with PCIe for GPU-to-CPU connections, and future generations may allow direct NVLink connections to CPUs [21][30]

4. Domestic Alternatives for AI Chip Scale-up
- Building a domestic alternative to NVLink is challenging, but the UALink initiative may open new opportunities for local AI chip development [45][46]

5. Investment Recommendations
- The report recommends focusing on 盛科通信 (Shengke Communication) and 海光信息 (Haiguang Information), while also monitoring 万通发展 (Wantong Development) and 澜起科技 (Montage Technology) for potential investment opportunities [6]
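A sketch of how a raw SerDes lane rate turns into usable link bandwidth, the arithmetic underneath the Scale-Up figures above. The parameters (a 112 GBaud PAM-4 lane, roughly 94% efficiency after FEC and line coding, 8 lanes ganged per port) are illustrative assumptions, not any vendor's datasheet.

```cpp
#include <cstdio>

// Usable bandwidth = lanes * symbol rate * bits per symbol * coding efficiency.
// PAM-4 carries 2 bits per symbol; FEC and line coding eat a few percent.
double effective_gbps(int lanes, double gbaud, double bits_per_symbol,
                      double coding_efficiency) {
    return lanes * gbaud * bits_per_symbol * coding_efficiency;
}

int main() {
    // Hypothetical 224G-class SerDes: 112 GBaud PAM-4, ~94% efficient,
    // eight lanes ganged into one port. All values are assumptions.
    double gbps = effective_gbps(8, 112.0, 2.0, 0.94);
    printf("effective: %.0f Gbps (~%.0f GB/s per direction)\n", gbps, gbps / 8.0);
    return 0;
}
```

The same formula explains why SerDes quality is the bottleneck: every Scale-Up port multiplies whatever rate a single lane can sustain.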
With a single chip, Broadcom takes on Nvidia's InfiniBand and NVSwitch
半导体行业观察· 2025-07-18 00:57
Core Viewpoint
- InfiniBand has long been the dominant fabric for high-performance computing (HPC) and AI applications, but its position is now challenged by Broadcom's new low-latency Ethernet switch, Tomahawk Ultra, which aims to displace both InfiniBand and NVSwitch in AI and HPC clusters [3][5][26]

Group 1: InfiniBand and Its Evolution
- InfiniBand gained traction through Remote Direct Memory Access (RDMA), which allows direct memory access between CPUs, GPUs, and other processing units, crucial for AI model training [3]
- Nvidia's $6.9 billion acquisition of Mellanox Technologies was driven by the anticipated growth of generative AI, which would need InfiniBand for GPU server connectivity [3][4]
- The rise of large language models and generative AI has pushed InfiniBand to new heights, with NVLink and NVSwitch providing significant advantages inside AI server nodes [4]

Group 2: Broadcom's Tomahawk Ultra
- Tomahawk Ultra aims to replace InfiniBand as the backend network for HPC and AI clusters, offering low-latency, lossless Ethernet [5][6]
- Development of Tomahawk Ultra predates the generative AI boom, originally targeting latency-sensitive applications [5]
- Its architecture supports shared-memory clusters, speeding communication among processing units relative to traditional InfiniBand or Ethernet [5][6]

Group 3: Performance Metrics
- InfiniBand packets typically range from 256 B to 2 KB, while Ethernet switches are often tuned for larger packets, which affects performance on HPC workloads (the packet-rate arithmetic is sketched after this summary) [7]
- InfiniBand has historically delivered lower latency than Ethernet, with steady improvements over the years, e.g. 130 nanoseconds for 200 Gb/s HDR InfiniBand [10][11]
- Tomahawk Ultra posts a port-to-port hop latency of 250 nanoseconds and a throughput of 77 billion packets per second, outperforming traditional Ethernet switches [12][28]

Group 4: Competitive Landscape
- InfiniBand's latency and packet-throughput advantages have made it the preferred choice for HPC workloads, but Ethernet technologies are evolving rapidly to close the gap [6][10]
- Nvidia's NVSwitch is also threatened by Tomahawk Ultra, which is part of a broader Broadcom strategy to push Ethernet deeper into AI and HPC [26][29]
- Optimized Ethernet headers and lossless features in Tomahawk Ultra aim to improve performance while staying compatible with existing standards [15][16]
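The 77-billion-packets-per-second figure above follows from simple arithmetic: at a fixed line rate, packet rate is line rate divided by frame size plus per-frame wire overhead. A sketch, assuming the standard 20 bytes of Ethernet overhead (8 B preamble + 12 B inter-packet gap) and a 51.2 Tb/s switch of the Tomahawk Ultra class:

```cpp
#include <cstdio>

// Packet rate at a fixed line rate: pps = line_rate / ((frame + overhead) * 8).
// Small frames are the hard case -- per-packet processing dominates.
int main() {
    const double line_rate_bps = 51.2e12;  // 51.2 Tb/s switch
    const int overhead = 20;               // 8 B preamble + 12 B inter-packet gap
    const int frames[] = {64, 256, 1518};
    for (int f : frames) {
        double gpps = line_rate_bps / ((f + overhead) * 8.0) / 1e9;
        printf("%5d-byte frames: %5.1f Gpps\n", f, gpps);
    }
    return 0;
}
```

At 64-byte frames this works out to roughly 76 Gpps, consistent with the 77-billion-packets-per-second figure cited for Tomahawk Ultra; at 1518-byte frames the same line rate needs only about 4 Gpps, which is why HPC's small messages stress a switch far more than bulk transfers do.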
Nvidia was built on piles of money
半导体行业观察· 2025-03-31 01:43
Core Insights
- Nvidia is positioned as a leader in the GPU market, with significant contributions from its CEO Jensen Huang and Chief Scientist Bill Dally, who focus on advancing technology and research [1][2][3]
- The company's substantial R&D investments have been pivotal in establishing its dominance in high-performance computing (HPC), analytics, and AI workloads [3][4][5]

R&D Investment and Financial Performance
- Nvidia's R&D spending has historically been high, peaking at 34.2% of revenue in Q1 FY2015, reflecting its commitment to leveraging AI advancements [7][11]
- Despite a recent decline in R&D spending as a share of revenue, Nvidia's total R&D budget is projected to grow significantly, reaching $12.91 billion in FY2025, a 48.9% increase from FY2024 [12][14]
- The company has maintained a consistent R&D expenditure of 20% to 25% of revenue over the past 15 years, comparable to other tech giants like Meta and Google [11][12]

Technological Advancements and Market Position
- Nvidia's CUDA platform has been instrumental in creating a vast ecosystem of over 900 libraries and frameworks, solidifying its position in the AI and HPC markets [9][10]
- The scarcity and high cost of High Bandwidth Memory (HBM) have allowed Nvidia to maintain a competitive edge over AMD, as it can afford to pay premium prices for necessary components [10][11]
- Nvidia's research efforts are divided into supply-side and demand-side initiatives, focusing on enhancing GPU technology and expanding application areas for accelerated computing [16][18]

Future Outlook and Strategic Direction
- Nvidia is preparing for future waves of AI innovation, including what it terms "physical AI", indicating a proactive approach to emerging technologies [7][12]
- The company is also exploring quantum computing and has established dedicated research teams to assess its potential impact [16][18]
- Nvidia's strategy includes acquiring technologies from third parties and integrating them into its offerings, exemplified by its acquisition of Mellanox Technologies [28]
PCIe: Broadcom's new chip roadmap
半导体行业观察· 2025-02-28 03:08
Core Viewpoint
- The article traces the evolution and significance of PCI-Express, particularly the upcoming PCI-Express 6.0, which doubles bandwidth while holding latency steady for AI and HPC systems. The transition to the new standard is crucial for modern computing environments, especially AI server architectures [1][2][3]

Group 1: PCI-Express Evolution
- PCI-Express bandwidth doubles roughly every three years; the latest version, PCI-Express 6.0, doubles the data rate again while maintaining latency [1][2]
- PCI-Express 6.0 switches from NRZ to PAM-4 encoding, enabling the higher data rate but raising error rates, which demands stronger error correction (the sketch after this summary works through the encoding arithmetic) [3][4]
- Broadcom has been a key player in the development of PCI-Express switches and retimers, with its latest products supporting both the PCI-Express 5.0 and 6.0 standards [6][7]

Group 2: Market Demand and Applications
- Demand for PCI-Express switches and retimers has surged, driven by the bandwidth needs of AI servers, which often carry multiple GPUs and accelerators [7][8]
- A typical AI server equipped with eight GPUs uses four PCI-Express switches, underscoring the importance of high channel counts for performance [7]
- The complexity of AI server architectures requires robust PCI-Express solutions so components can communicate without routing everything through a central CPU [8][9]

Group 3: Future Prospects
- The introduction of PCI-Express 6.0 is seen as a pivotal step for the industry, with widespread adoption in AI server manufacturing expected by late 2024 [6][11]
- There is speculation about new architectures that could push bandwidth beyond current PCI-Express capabilities, possibly resembling Nvidia's NVLink technology [9][10][11]
- The article stresses the need for a coherent telemetry system across increasingly complex AI ecosystems, which Broadcom aims to address through its interoperability development platform [8]
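A sketch of the PAM-4 arithmetic behind PCI-Express 6.0's doubling. Gen5 signals NRZ at 32 GT/s (1 bit per symbol); Gen6 keeps the same 32 GBaud symbol rate but signals PAM-4 (2 bits per symbol) to reach 64 GT/s. The per-direction figures below ignore the small FLIT/FEC overhead Gen6 introduces.

```cpp
#include <cstdio>

// Data rate = symbol rate * bits per symbol; PAM-4 doubles bits per symbol.
// x16 per-direction bandwidth ~= GT/s * lanes / 8 (FLIT/FEC overhead in
// Gen6 trims a few percent and is ignored in this sketch).
int main() {
    struct Gen { const char* name; double gbaud; int bits_per_symbol; };
    const Gen gens[] = {
        {"PCIe 5.0 (NRZ)",   32.0, 1},
        {"PCIe 6.0 (PAM-4)", 32.0, 2},
    };
    const int lanes = 16;
    for (const Gen& g : gens) {
        double gtps = g.gbaud * g.bits_per_symbol;  // GT/s per lane
        double GBps = gtps * lanes / 8.0;           // GB/s per direction (x16)
        printf("%-18s %4.0f GT/s/lane  x16: %6.1f GB/s per direction\n",
               g.name, gtps, GBps);
    }
    return 0;
}
```

Holding the symbol rate at 32 GBaud is the point: the channel's analog bandwidth requirements stay roughly where Gen5 left them, and the doubling comes from denser signaling, at the cost of a tighter signal-to-noise margin that the FEC then has to cover.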