Scale Up
Back to the Technology: Scale Up's Fragmented Ecosystem
傅里叶的猫· 2025-10-18 16:01
Core Viewpoint
- The article compares Scale Up solutions for AI servers, focusing on the UALink technology promoted by Marvell and the current mainstream Scale Up approaches in the international market [1][3].

Comparison of Scale Up Solutions
- Scale Up refers to the high-speed communication network between GPUs within the same server or rack, allowing them to operate collaboratively as one large supercomputer [3].
- The market for Scale Up networks is projected to reach $4 billion in 2024 and, at a 34% compound annual growth rate (CAGR), could grow to $17 billion by 2029 (a quick compounding check follows this summary) [5][7].

Key Players and Technologies
- NVIDIA's NVLink currently dominates the Scale Up market, providing GPU interconnection and communication within server configurations [11][12].
- AMD is developing UALink on the basis of its Infinity Fabric technology and aims to transition to a complete UALink solution once native switches become available [12][17].
- Google uses inter-chip interconnect (ICI) technology for TPU Scale Up, while Amazon employs NeuronLink for its Trainium chips [13][14].

Challenges in the Ecosystem
- The current Scale Up ecosystem is fragmented: a patchwork of proprietary technologies creates compatibility problems among manufacturers [10][22].
- Domestic GPU manufacturers struggle to develop their own interconnect protocols because of system complexity and resource constraints [9].

Future Trends
- As the market matures, the article expects a shift from proprietary Scale Up networks to open solutions such as UALink and Scale-Up Ethernet (SUE), which could gain traction in 2027-2028 [22].
- The choice between copper and optical connections for Scale Up networks hinges on cost and performance; copper is currently preferred for short distances [20][21].
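As a sanity check on those figures, compounding the cited $4 billion 2024 base at the cited 34% CAGR for five years does land at roughly $17 billion; a minimal sketch, assuming uniform annual compounding (our simplification, not necessarily the source's model):

```python
# Back-of-the-envelope check of the Scale Up market projection,
# assuming uniform annual compounding (illustrative simplification).
base_2024_usd_bn = 4.0   # cited 2024 market size, $bn
cagr = 0.34              # cited compound annual growth rate
years = 2029 - 2024      # five compounding periods

projected_2029 = base_2024_usd_bn * (1 + cagr) ** years
print(f"Projected 2029 market: ~${projected_2029:.1f}B")  # ~$17.3B
```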
"More and more of our innovations are being adopted, at higher and higher value," says Corning CEO
YouTube· 2025-09-12 23:41
Core Viewpoint
- Corning's business is thriving, particularly in the specialty glass sector, with a notable 84% increase in stock value over the past year, driven by demand from data centers and mobile consumer electronics [2][15][16].

Company Performance
- Corning is recognized for its specialty glass products, including the fiber optic cable essential to data centers, which is contributing to its booming business [2][4].
- The company has experienced significant growth, particularly in its data center segment, currently its fastest-growing business [4][11].
- Corning's stock has risen 84% in the last 12 months, reflecting strong market performance [2].

Data Center Innovations
- The data center segment is expected to grow further as the industry shifts from copper to optical fiber, which is more efficient for connecting AI clusters [4][6].
- A notable example is Meta's Louisiana campus, which requires 8 million miles of fiber, enough to circle the Earth 320 times (verified in the sketch after this summary), highlighting the scale of fiber demand [7][8].
- The transition to glass in data centers could lower energy consumption and costs, although significant innovation is still needed [10][12].

New Business Ventures
- Corning is expanding into solar energy with plans for a large American-made ingot and wafer plant in Michigan, which could triple its current solar business run rate [14].
- The solar sector is becoming increasingly competitive, with government policies influencing energy costs [15].

Resilience Against Tariffs
- Corning's business has shown resilience against tariffs: 90% of its US revenue comes from domestically produced products, minimizing the impact of international trade policies [16].
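The Earth-laps comparison is easy to verify against Earth's equatorial circumference of roughly 24,901 miles; a minimal check:

```python
# Verify the fiber-mileage comparison for Meta's Louisiana campus.
fiber_miles = 8_000_000             # cited fiber requirement
earth_circumference_miles = 24_901  # Earth's equatorial circumference

laps = fiber_miles / earth_circumference_miles
print(f"Enough fiber to circle the Earth ~{laps:.0f} times")  # ~321
```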
Make 64 Cards Work Like One! Inspur Information Releases a New-Generation AI SuperPod That Runs Four Major Domestic Open-Source Models Simultaneously
量子位· 2025-08-11 07:48
Core Viewpoint
- The article highlights the advancements in domestic open-source AI models, emphasizing their performance improvements and the challenges posed by the growing demand for computational resources and low-latency communication in the era of Agentic AI [1][2][13].

Group 1: Model Performance and Infrastructure
- Domestic open-source models such as DeepSeek R1 and Kimi K2 are achieving significant milestones in inference capability and long-text handling, with parameter counts exceeding one trillion [1].
- The emergence of Agentic AI requires multi-model collaboration and complex reasoning chains, driving explosive growth in computational and communication demands [2][15].
- Inspur's "Yuan Nao SD200" super-node AI server is designed to support trillion-parameter models and enable real-time collaboration among multiple agents [3][5].

Group 2: Technical Specifications of Yuan Nao SD200
- Yuan Nao SD200 integrates 64 GPUs into a super-node with unified memory and a unified address space, extending the "machine domain" beyond a single host [7].
- The architecture employs a 3D Mesh design and proprietary Open Fabric Switch technology, allowing high-speed interconnection among GPUs across different hosts [8][19].
- The system achieves ultra-low-latency communication, with end-to-end delays outperforming mainstream solutions, which is crucial for inference scenarios dominated by small data packets [8][12].

Group 3: System Optimization and Compatibility
- Yuan Nao SD200 features Smart Fabric Manager, which selects globally optimal routing based on load characteristics to minimize communication cost [9].
- The system supports major computing frameworks such as PyTorch, enabling quick migration of existing models without extensive code rewriting (a minimal sketch follows this summary) [11][32].
- Performance tests show approximately 3.7x super-linear scaling for DeepSeek R1 and 1.7x for Kimi K2 during full-parameter inference [11].

Group 4: Open Architecture and Industry Strategy
- Yuan Nao SD200 is built on an open architecture, promoting collaboration among hardware vendors and giving users diverse computing options [25][30].
- The OCM and OAM standards enable compatibility and low-latency connections among different AI accelerators, enhancing performance for large-model training and inference [26][29].
- The strategic choice of an open architecture aims to lower migration costs and give more enterprises access to advanced AI technology, promoting "intelligent equity" [31][33].
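On the migration claim, the portability argument is that models written against torch.distributed's standard collectives run unchanged when the underlying fabric changes. A minimal sketch of such fabric-agnostic code, assuming a stock torch.distributed backend; the SD200's actual software stack and backend name are not given in the source:

```python
import os
import torch
import torch.distributed as dist

def main():
    # Standard torch.distributed setup; on a super-node the transport is
    # vendor-provided, but this user-facing API stays the same -- which is
    # the substance of the "no code rewriting" migration claim.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    rank = dist.get_rank()
    world = dist.get_world_size()  # e.g. 64 on an SD200-class node

    # All-reduce on a small tensor: the small-packet collective whose
    # latency Scale Up fabrics are designed to minimize.
    x = torch.full((1024,), float(rank), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if rank == 0:
        # Every element now holds sum(0 .. world-1).
        print(f"world={world}, x[0]={x[0].item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, e.g., `torchrun --nproc_per_node=64 script.py`; the 64-way world size mirrors the SD200's GPU count, but the script itself carries no fabric-specific code.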
Why Have SuperPods Become the "New Favorite" of AI Computing Power?
21 Shi Ji Jing Ji Bao Dao· 2025-07-31 00:38
Core Insights
- The rapid development of large models driven by the AI wave has created stringent demands for computing power, leading to the emergence of the "SuperPod" as a key solution in the industry [1][2].
- The transition from traditional computing architectures to SuperPod technology marks a shift toward high-performance, low-cost, and energy-efficient AI training solutions [1][2].

Industry Trends
- The SuperPod, proposed by NVIDIA, represents the optimal solution for Scale Up architecture, integrating GPU resources into a low-latency, high-bandwidth computing entity [2].
- Traditional air-cooled AI servers are reaching their power-density limits, prompting the adoption of advanced cooling technologies such as liquid cooling in SuperPod designs [2][5].
- The market outlook for SuperPods is optimistic, with many domestic and international server manufacturers adopting this next-generation solution [2][4].

Technological Developments
- Current mainstream SuperPod solutions include proprietary-protocol schemes (e.g., NVIDIA, Trainium, Huawei) and open-organization schemes, with copper connections becoming increasingly prevalent for internal communication [3][4].
- The ETH-X open SuperPod project, led by the Open Data Center Committee, exemplifies the integration of Scale Up and Scale Out networking strategies (a toy latency model follows this summary) [4].

Company Initiatives
- Chinese tech companies are investing actively in the SuperPod space, with Huawei showcasing its Ascend 384 SuperPod, featuring the largest scale of 384-card high-speed bus interconnection [5].
- Other companies such as Xizhi Technology and Muxi have introduced innovative solutions, including distributed optical interconnects and liquid-cooled GPU modules, enriching the SuperPod technology landscape [5][6].
- Moore Threads has built a comprehensive AI computing product line, aiming to create a new generation of AI training infrastructure described as a "super factory" for advanced model production [6].
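To make the Scale Up versus Scale Out trade-off concrete, here is a toy latency model with purely illustrative numbers (none of the link speeds or hop latencies below come from the article): small-message collectives are dominated by per-hop latency, which is why a flat super-node fabric wins inside the rack while Ethernet-style Scale Out carries the larger, bandwidth-bound traffic between racks.

```python
# Toy model (illustrative numbers only) of why small-message collectives
# favor a single Scale Up domain over a multi-hop Scale Out network.

def transfer_time_us(msg_bytes: int, link_gbps: float,
                     hop_latency_us: float, hops: int) -> float:
    """Crude estimate: serialization time plus cumulative per-hop latency."""
    serialization_us = msg_bytes * 8 / (link_gbps * 1e3)  # bits over Gbit/s, in us
    return serialization_us + hop_latency_us * hops

msg = 4096  # a small tensor-parallel packet, typical of inference traffic

# Hypothetical figures: one switch hop inside a super-node versus three
# hops (leaf-spine-leaf) across a Scale Out fabric.
scale_up = transfer_time_us(msg, link_gbps=400, hop_latency_us=0.3, hops=1)
scale_out = transfer_time_us(msg, link_gbps=400, hop_latency_us=1.5, hops=3)

print(f"Scale Up : {scale_up:.2f} us")   # stays sub-microsecond
print(f"Scale Out: {scale_out:.2f} us")  # hop latency dominates small messages
```

At this (hypothetical) message size, bandwidth barely matters and hop count does, which is the architectural case for keeping latency-critical GPU-to-GPU traffic inside the Scale Up domain.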