InfiniBand

CPO Is Unstoppable
半导体芯闻· 2025-06-23 10:23
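The 2025 OFC conference made one thing clear: the data center transition to CPO switches is inevitable, driven primarily by the power savings CPO delivers. From Jensen Huang showing CPO switches at GTC 2025 to the many vendors at OFC 2025 demonstrating optical engines integrated inside the ASIC package, co-packaged optics is everywhere. Notably, Andy Bechtolsheim, Arista co-founder and longtime visionary in data center networking, has not changed his position. At OFC 2025 he continued to argue that linear pluggable optics (LPO) is the better choice. LPO removes the onboard digital signal processor, which cuts power significantly compared with conventional pluggable optics, typically by 30-50%. More details can be found in my post. Andy's core argument is that, at least for the 1600G generation, LPO and CPO are roughly equal in power efficiency. So why accept the extra complexity of CPO? At these higher SerDes rates, however, LPO faces the challenge of insertion loss in the electrical channel between the ASIC and the front-panel optics. Andy believes that in the 1600G generation this can be mitigated with flyover cables that use near-package connectors. His concerns about CPO include ...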
CPO Is Unstoppable
半导体行业观察· 2025-06-22 03:23
Core Viewpoint
- The transition of data centers to Co-Packaged Optics (CPO) switches is inevitable, primarily driven by the power savings offered by CPO technology [1][2].
Group 1: CPO Technology and Market Trends
- CPO technology is gaining traction because it significantly reduces power consumption. Linear pluggable optics (LPO), the main alternative, typically cuts power by 30-50% compared to traditional pluggable optics by removing the onboard DSP [1][2].
- The industry has made substantial progress in CPO reliability over the past two years, making it a viable option for future high-speed data rates [2].
- In the upcoming 400G SerDes generation, CPO may be the only feasible choice because traditional PCB traces and cables incur excessive insertion loss [2].
Group 2: Technical Integration of CPO
- CPO solutions typically integrate electronic integrated circuits (EIC) and photonic integrated circuits (PIC) within the same package [3].
- The two main methods for integrating optical engines within ASIC packages are the silicon interposer approach and the organic substrate approach [4][5].
- The silicon interposer method allows for high-density connections but complicates thermal management due to the proximity of high-power EICs [6].
- The organic substrate method offers better thermal isolation and modularity, allowing optical engines to be tested independently before assembly [7][8].
Group 3: Bandwidth Density and Performance
- Bandwidth density, a critical metric for CPO solutions, measures the amount of data transmitted per millimeter along the optical interface and is typically expressed in Tbps/mm [9].
- Higher bandwidth density is essential to meet the explosive growth in bandwidth demand in data centers and high-performance computing systems [9].
Group 4: Competitive Landscape
- Broadcom's Bailly CPO switch integrates eight 6.4 Tbps optical engines, achieving a total external bandwidth of 51.2 Tbps [12].
- NVIDIA's Quantum-X InfiniBand switch aims for higher scalability, targeting over 100 Tbps with advanced optical engine integration [17][18].
- Broadcom's next-generation switches are expected to reach 102.4 Tbps, while NVIDIA's architecture is designed for future demands of 200G SerDes and beyond [16][19].
Group 5: Power Efficiency and Thermal Management
- Both Broadcom and NVIDIA report significant reductions in power consumption per bit with their CPO solutions, with Broadcom achieving approximately 5.5W per 800 Gb/s port compared to 15W for traditional modules [35].
- Effective cooling solutions, such as liquid cooling, are necessary to manage the heat generated by high-density ASIC packages [35][36].
Group 6: Future Directions and Challenges
- The industry is exploring advanced coupling methods, such as vertical coupling and multi-core fibers, to enhance optical connectivity and bandwidth density [38][40].
- Challenges in deploying CPO include ecosystem disruption, operational complexity, and the need for robust reliability validation [42][43].
- CPO's prospects appear brighter in scale-up (vertical scaling) applications, where integrated solutions from single vendors can simplify procurement and deployment [45].
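The bandwidth and power figures quoted in Groups 4 and 5 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration in Python: the function names are hypothetical, and the inputs (eight 6.4 Tbps engines, roughly 5.5W versus 15W per 800 Gb/s port) are taken from the summary above rather than from any vendor datasheet.

```python
def aggregate_bandwidth_tbps(engines: int, tbps_per_engine: float) -> float:
    """Total external bandwidth when every optical engine runs at full rate."""
    return engines * tbps_per_engine


def energy_per_bit_pj(port_power_w: float, port_rate_gbps: float) -> float:
    """Energy per bit in picojoules: W / (Gb/s) gives nJ/bit, times 1000 gives pJ/bit."""
    return port_power_w / port_rate_gbps * 1000.0


# Figures as quoted in the summary (assumed accurate for this sketch).
bailly_tbps = aggregate_bandwidth_tbps(engines=8, tbps_per_engine=6.4)
cpo_pj = energy_per_bit_pj(port_power_w=5.5, port_rate_gbps=800)
pluggable_pj = energy_per_bit_pj(port_power_w=15.0, port_rate_gbps=800)
saving_fraction = 1.0 - cpo_pj / pluggable_pj

print(f"Aggregate bandwidth: {bailly_tbps:.1f} Tbps")
print(f"CPO port:            {cpo_pj:.3f} pJ/bit")
print(f"Pluggable port:      {pluggable_pj:.3f} pJ/bit")
print(f"Per-port saving:     {saving_fraction:.0%}")
```

Under these inputs the per-port saving comes out at a little under two thirds, which is a per-port comparison only and says nothing about switch-level power once the ASIC, lasers, and cooling are included.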
The AI Networking Battle: How Performance Is Reshaping the Competitive Landscape
2025-06-19 09:46
TLDR · NVIDIA's Strategic Dominance Through Integration: NVIDIA's early recognition of AI's unique network ...
The Best Trillion-Dollar Stock to Buy Right Now? Wall Street Has a Clear Answer for Investors.
The Motley Fool· 2025-06-18 08:12
Ten public companies have achieved a market value exceeding $1 trillion as of June 16. They are listed below in descending order based on upside implied by the median target price set by Wall Street analysts. Nvidia more or less put the first concern to rest with impressive first-quarter financial results that exceeded expectations on the top and bottom lines. Revenue increased 69% to $44 billion due to what CEO Jensen Huang characterized as "incredibly strong" demand for Nvidia AI infrastructure. And non-G ...
A Look at Today's Mainstream AI Networking Solutions
傅里叶的猫· 2025-06-16 13:04
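In early 2020, researchers at a leading AI company ran into a problem that would have seemed absurd to any network engineer a few years earlier: their flagship language model (the kind that would eventually power conversational AI systems) had been training for three weeks when it suddenly stalled at 60% completion. Hundreds of GPUs sat idle across multiple server racks, burning through cloud resources at a rate that made CFOs wince. The engineering team's first reaction was to check the obvious suspects. Compute utilization? 99%. Memory usage? Optimal. Storage I/O? Well within limits. Yet the training run had effectively frozen, with GPUs waiting on tasks that should have completed instantly. The bottleneck turned out not to be any of the components that had defined computing performance for decades, but the network. More specifically, the network had been designed for a world in which computers occasionally talk to each other, not one in which thousands of processors must coordinate every computation in perfect synchrony. The traditional data center network stack (built for web servers answering user requests, databases serving applications, and storage systems moving files) simply could not handle the collective communication patterns that AI workloads demand, a requirement that no human-facing application had ever imposed. This is not just a technical problem but an architectural mismatch that will reshape an entire industry. AI workloads do not merely need more network resources; they need a fundamentally different network architecture. And that difference is the seed of a competitive revolution: it pushes a seemingly unlikely player toward ...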
UEC Is Finally Here. Can It Shake InfiniBand?
半导体行业观察· 2025-06-12 00:42
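Source: compiled from semianalysis. Today, the Ultra Ethernet Consortium (UEC) announced the release of UEC Specification 1.0, a comprehensive Ethernet-based communication stack designed to meet the demanding requirements of modern artificial intelligence (AI) and high-performance computing (HPC) workloads. The release marks a key step in our effort to redefine Ethernet for the next generation of data-intensive infrastructure. UEC Specification 1.0 provides a high-performance, scalable, and interoperable solution across all layers of the networking stack (including NICs, switches, optics, and cables), enabling seamless multi-vendor integration and accelerating innovation across the ecosystem. The UEC specification is driving industry-wide adoption by promoting open, interoperable standards that avoid vendor lock-in. With active implementation and compliance programs under way, UEC is paving the way for a unified and accessible ecosystem across the industry. Hugh Holbrook, chair of the Ultra Ethernet Consortium Technical Advisory Committee, added: "The Ultra Ethernet 1.0 specification is the result of collaboration among AI, high-performance computing (HPC), and networking experts, system and silicon vendors, and network operators. It brings together a wealth of knowledge, experience, and ideas related to applications, transport protocols, congestion control, direct memory access, Ethernet link and PHY technologies, and network security ...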
Nvidia(NVDA) - 2025 FY - Earnings Call Transcript
2025-06-10 15:00
Financial Data and Key Metrics Changes
- NVIDIA has a buy rating with a twelve-month target price of $200, driven by its leadership in AI and expansion into full rack scale deployments [2]
- The company reported significant advancements in networking capabilities, particularly in AI data centers, emphasizing the importance of networking as a critical component of computing infrastructure [8][9]
Business Line Data and Key Metrics Changes
- NVIDIA's networking infrastructure has evolved from supporting eight GPUs last year to 72 GPUs this year, with future plans to support up to 576 GPUs [19][20]
- The company is focusing on both scale-up and scale-out networking strategies to enhance performance and efficiency in AI workloads [15][16]
Market Data and Key Metrics Changes
- The demand for AI workloads is increasing, necessitating the design of data centers that can handle distributed computing and high throughput requirements [22][29]
- NVIDIA's networking solutions, including InfiniBand and Spectrum X, are positioned as the gold standard for AI applications, with a focus on lossless data transmission and low latency [36][38]
Company Strategy and Development Direction
- NVIDIA is committed to co-designing networks with compute elements to optimize performance for AI workloads, moving beyond traditional networking paradigms [22][28]
- The company aims to integrate Ethernet into AI applications, making it accessible for enterprises familiar with Ethernet infrastructure [40][42]
Management's Comments on Operating Environment and Future Outlook
- Management highlighted the critical role of infrastructure in determining the capabilities of data centers, emphasizing that the right networking solutions can transform standard compute engines into AI supercomputers [100][101]
- The company anticipates continued innovation in networking technologies to support the growing demands of AI and distributed computing [100]
Other Important Information
- NVIDIA's acquisition of Mellanox has enhanced its capabilities in both Ethernet and InfiniBand technologies, allowing for a broader range of solutions tailored to customer needs [32][38]
- The introduction of co-packaged silicon photonics is expected to improve optical network efficiency, reducing power consumption and increasing the number of GPUs that can be connected [84][85]
Q&A Session Summary
Question: What is the strategic importance of networking in AI data centers?
- Networking is now seen as the defining element of data centers, crucial for connecting computing elements and determining efficiency and return on investment [8][9]
Question: How does NVIDIA differentiate between scale-up and scale-out networking?
- Scale-up networking focuses on creating larger compute engines, while scale-out networking connects multiple compute engines to support diverse workloads [15][16]
Question: What are the advantages of NVLink over other networking solutions?
- NVLink provides high bandwidth and low latency, essential for connecting GPUs in a dense configuration, making it superior for AI workloads [59][60]
Question: How does the DPU enhance data center operations?
- The DPU separates the data center operating system from application domains, improving security and efficiency in managing data center resources [54][56]
Question: What is the future of optical networking in NVIDIA's infrastructure?
- Co-packaged silicon photonics will enhance optical network efficiency, allowing for greater GPU connectivity while reducing power consumption [84][85]
Nvidia's InfiniBand Gets a New Rival
半导体芯闻· 2025-06-10 09:52
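Source: compiled from theregister. Five years after Intel spun off its Omni-Path interconnect technology to Cornelis Networks, the company is finally ready to take on its old rival, Nvidia's InfiniBand, head-on with its 400Gbps CN5000 series of switches and network interface cards (NICs). This time, Cornelis is not targeting only supercomputers and high-performance computing (HPC) clusters; it also wants to ride the AI boom and compete with Nvidia on price-performance. For readers who have long since put Omni-Path out of mind, here is a quick recap: Omni-Path was originally developed by Intel in 2015 as a lossless interconnect technology that resembles Nvidia's InfiniBand networking in many respects and is aimed primarily at high-performance computing applications. First-generation Omni-Path switches offered 4.8Tbps of bandwidth across 48 100Gbps ports and were deployed in several supercomputing platforms, such as Los Alamos National Laboratory's Trinity system and the US Department of Energy's Cori system. But in 2019 Intel abandoned the project, and in September 2020 it spun off the business unit to form Cornelis Netw ...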
NVIDIA Powers Europe's Fastest Supercomputer
Globenewswire· 2025-06-10 09:00
Core Insights
- NVIDIA announced the JUPITER supercomputer as the fastest in Europe, achieving over 2x speedup for high-performance computing and AI workloads compared to the next-fastest system [1][2]
- JUPITER is expected to run at 1 quintillion FP64 operations per second, positioning it as Europe's first exascale supercomputer and facilitating advancements in various scientific fields [2][3]
- The supercomputer is recognized for its energy efficiency, delivering 60 gigaflops per watt, and is built on Eviden's BullSequana XH3000 liquid-cooled architecture [3][5]
Technological Advancements
- JUPITER comprises nearly 24,000 NVIDIA GH200 Grace Hopper Superchips and is expected to reach over 90 exaflops of AI performance [3][4]
- The system integrates NVIDIA's full software stack, enhancing performance across multiple applications, including climate modeling and quantum research [3][6]
- It is designed to support hybrid quantum-HPC computation, utilizing tools like the NVIDIA CUDA-Q platform and cuQuantum SDK [5][11]
Strategic Importance
- The supercomputer is hosted by the Jülich Supercomputing Centre and owned by the EuroHPC Joint Undertaking, marking a significant step for European scientific and technological sovereignty [4][5]
- JUPITER's capabilities are expected to catalyze foundational research in diverse fields such as climate modeling, energy systems, and biomedical innovation [5][6]
- Early testing with the Linpack benchmark confirms JUPITER's performance, contributing to its ranking among the top five systems on the TOP500 list [5]
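To put the efficiency and scale figures above in perspective, the following Python sketch derives two rough quantities from them. It is a back-of-the-envelope estimate, not a reported specification: it assumes the 60 gigaflops per watt figure would hold at the expected 1 exaflop FP64 rate (the summary does not say this), and it treats "nearly 24,000" superchips as exactly 24,000. All names are illustrative.

```python
EXA = 1e18   # operations per second in one exaflop
GIGA = 1e9
TERA = 1e12
MEGA = 1e6


def implied_power_mw(flops: float, gflops_per_watt: float) -> float:
    """Power in megawatts needed to sustain `flops` at the given efficiency."""
    watts = flops / (gflops_per_watt * GIGA)
    return watts / MEGA


def per_chip_tflops(total_flops: float, chips: int) -> float:
    """Average throughput per superchip, in teraflops, if work is spread evenly."""
    return total_flops / chips / TERA


# Assumption: efficiency measured at one operating point is applied to the 1 exaflop target.
power_mw = implied_power_mw(flops=1 * EXA, gflops_per_watt=60.0)
# Assumption: "nearly 24,000" is rounded up to 24,000 superchips.
chip_tflops = per_chip_tflops(total_flops=1 * EXA, chips=24_000)

print(f"Implied power at 1 exaflop FP64: {power_mw:.1f} MW")
print(f"Implied FP64 rate per superchip: {chip_tflops:.1f} TFLOPS")
```

Both results are order-of-magnitude checks only; measured Linpack power and per-chip throughput depend on the actual benchmark run.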
This Week's Highlights: Behind NVIDIA's Earnings Surge: The AI Factory Vision and Global Platformization
老徐抓AI趋势· 2025-06-06 09:34
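Views from: this Tuesday's livestream on June 3.
I. Financial Performance and Earnings Structure Breakdown
"NIMs" (NVIDIA Inference Microservices) have become a key turning point, marking the shift from the large-model stage to the agent stage. Going forward, AI will not be used only for chat interaction; it will take the form of agents with perception, reasoning, and execution capabilities that take on more responsibility in real business operations. NIM provides standardized, containerized AI microservices that speed up deployment and support the rollout of enterprise-grade AI applications.
The "AI factory" concept represents a fundamental change in the role of the data center: in the future it will no longer just process compute tasks but will mass-produce AI agents that run as enterprises' "virtual employees" across all kinds of scenarios. Behind this trend is the systematic expression of NVIDIA's integrated hardware-software ecosystem.
III. Impact on the China and US Markets and Global Delivery Logic
NVIDIA did not directly address the impact of export controls in detail, but said it is delivering new products to customers worldwide, including the Chinese market, at a very fast pace. This suggests the company is adjusting its product mix to sustain its China business in a compliant way.
NVIDIA's revenue for the first quarter of fiscal 2025 reached $26 billion, up 262% year over year and 18% quarter over quarter; gross margin was 78.9%, up 12.6 percentage points year over year and 1.8 percentage points quarter over quarter ...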
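The growth rates quoted above imply earlier-period baselines that the excerpt does not state. As a rough consistency check, the Python sketch below backs them out from the quoted figures ($26 billion revenue, 262% and 18% growth, 78.9% gross margin with changes of 12.6 and 1.8 percentage points). The helper name is hypothetical and the outputs are derived estimates, not reported results.

```python
def prior_value(current: float, growth_pct: float) -> float:
    """Value in the earlier period implied by a percentage growth rate."""
    return current / (1.0 + growth_pct / 100.0)


revenue_bn = 26.0        # quoted quarterly revenue, in billions of USD
gross_margin_pct = 78.9  # quoted gross margin, in percent

# Growth rates are relative changes, so divide to recover the baseline.
revenue_year_ago_bn = prior_value(revenue_bn, 262.0)
revenue_prev_quarter_bn = prior_value(revenue_bn, 18.0)

# Margin changes are quoted in percentage points, so subtract instead.
margin_year_ago_pct = gross_margin_pct - 12.6
margin_prev_quarter_pct = gross_margin_pct - 1.8

print(f"Implied revenue a year earlier:    ${revenue_year_ago_bn:.2f}B")
print(f"Implied revenue a quarter earlier: ${revenue_prev_quarter_bn:.2f}B")
print(f"Implied margin a year earlier:     {margin_year_ago_pct:.1f}%")
print(f"Implied margin a quarter earlier:  {margin_prev_quarter_pct:.1f}%")
```

The distinction the sketch relies on is that revenue growth is a ratio while the margin change is a difference in percentage points, so the two are inverted differently.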