From Chiplets to Racks: A Look at Open Interconnects Amid the Large-Model Wave
半导体行业观察 · 2025-12-02 01:37
Core Insights
- The article argues that the open interconnect standards UCIe, CXL, UAL (UALink), and UEC are central to AI infrastructure: they broaden the hardware ecosystem and address the challenges posed by large-model training and inference [2][10].

Group 1: Background and Evolution
- The CXL Consortium was established in March 2019 to tackle heterogeneous XPU programming and memory bandwidth expansion; Alibaba was a founding member [4].
- The UCIe Consortium was formed in March 2022 to create an open die-to-die interconnect standard; Alibaba is the only board member from mainland China [4].
- The Ultra Ethernet Consortium (UEC) was established in July 2023 to address the inefficiencies of traditional Ethernet in AI and HPC environments; Alibaba joined as a General member [4].
- The UALink (UAL) Consortium was formed in October 2024 to meet the growing demand for Scale-up networks driven by larger models and longer inference contexts; Alibaba joined as a board member [4].

Group 2: Scaling Laws in AI Models
- The article outlines three phases of scaling laws: Pre-training Scaling, Post-training Scaling, and Test-time Scaling. The focus is shifting towards Test-time Scaling as models move from development to application [5][8].
- Test-time Scaling places new demands on AI infrastructure, particularly on latency and throughput [8].

Group 3: UCIe and Chiplet Design
- UCIe is positioned as a critical standard for chiplet interconnects, addressing cost, performance, yield, and process node optimization in chip design [10][11].
- The article discusses the advantages of chiplet-based designs: improved yield, process node optimization, cross-product reuse, and market scalability [14][15][17].
- UCIe's protocol stack is designed for the specific needs of chiplet interconnects: low latency, high bandwidth density, and support for various packaging technologies [18][19][21].

Group 4: CXL and Server Architecture
- CXL aims to redefine server architectures by enabling memory pooling and by extending host memory capacity through CXL memory modules [29][34].
- Key features of CXL include memory pooling, a unified memory space, and host-to-host communication, all of which improve AI infrastructure efficiency [30][35].
- The article highlights the challenges CXL faces, such as latency caused by PCIe PHY limitations and the complexity of implementing CXL.cache [34][35].

Group 5: UAL and Scale-Up Networks
- UAL is designed for Scale-up networks, offering efficient memory semantics with reduced protocol overhead [37][43].
- The UAL protocol stack comprises protocol, transaction, data link, and physical layers, which together support high-speed communication and memory operations [43][45].
- UAL's architecture aims to provide a unified memory space across multiple nodes, addressing the communication needs specific to large AI models [50][51].
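
The yield advantage of chiplets noted under Group 3 can be made concrete with a small calculation. The sketch below is only illustrative: it assumes the textbook Poisson yield model, Y = exp(-A * D0), which the article does not specify, and the function names, die areas, and defect density are hypothetical values chosen for the example. It also ignores packaging cost, die-to-die interface area, and assembly yield, all of which reduce the benefit in practice.

```python
import math


def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of defect-free dies under the assumed Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * d0_per_cm2)


def silicon_per_good_product(total_area_mm2: float, n_dies: int, d0_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working product.

    The logic is split into n_dies equal dies, and each die is assumed to be
    tested before assembly so that only good dies are packaged.
    """
    die_area = total_area_mm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, d0_per_cm2)


if __name__ == "__main__":
    total_area = 800.0  # mm^2 of logic (hypothetical)
    d0 = 0.1            # defects per cm^2 (hypothetical)
    for n in (1, 2, 4, 8):
        die_yield = poisson_yield(total_area / n, d0)
        area_used = silicon_per_good_product(total_area, n, d0)
        print(f"{n} die(s): per-die yield {die_yield:.3f}, "
              f"silicon per good product {area_used:.1f} mm^2")
```

With these illustrative numbers, the model gives a yield of about 45% for a single 800 mm² die and about 82% for each of four 200 mm² chiplets, so less wafer area is spent per working product when the design is split.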