AI Training and Inference
Nvidia Employee Says Microsoft Data Center Cooling System Wastes Resources
Xin Lang Ke Ji· 2025-12-12 11:22
Core Viewpoint - Nvidia is supplying its latest-generation Blackwell chips to Microsoft's data centers, but there are concerns that Microsoft's cooling systems, while offering good resilience and fault tolerance, may waste resources [1].
Group 1: Nvidia and Microsoft Collaboration
- Nvidia is deploying GB200 Blackwell systems for Microsoft, a major partner of and investor in OpenAI [1].
- The installation includes two GB200 NVL72 racks, each holding 72 Nvidia GPUs; such high-density GPU arrays generate a great deal of heat [1].
Group 2: Cooling Systems and Efficiency
- Microsoft liquid-cools the servers themselves, but the building-level cooling system appears inefficient because of its large scale and its reliance on air cooling rather than water cooling [2].
- Air cooling consumes more energy but uses no water, which avoids the public concern over water resources that water-based cooling can raise [2].
Group 3: Performance and Infrastructure
- The Fairwater data center, built from interconnected Nvidia GB200 clusters, is designed to deliver ten times the performance of the fastest supercomputer currently available, supporting AI training and inference workloads at an unprecedented scale [3].
- Fairwater uses a closed-loop liquid cooling system that needs no water for operation once construction is complete, and all of its energy consumption is matched with renewable sources [4][5].
Group 4: Expansion and Community Engagement
- Fairwater is one of several similar sites being developed across more than 70 regions; multiple identical data centers are under construction in the US, and the AI infrastructure spans more than 100 data centers worldwide [6][7].
- Microsoft aims to integrate compute, networking, and storage into a massively scaled cluster and to design closed-loop energy systems that meet real-world computing needs; it is also committed to sustainable practices that create jobs and expand opportunities in local communities [8].
Current Market Conditions for the RTX 5090
傅里叶的猫 (Fourier's Cat)· 2025-06-08 12:28
Core Viewpoint - The article reviews the market for the NVIDIA RTX 5090 graphics card since its release in January 2025, covering price, the rental market, computing power, power consumption, performance, heat generation, and multi-card networking [1].
Pricing - The RTX 5090 was initially expected to sell for over 40,000 yuan, but within four months its price fell into the 20,000-yuan range, with some brands listed for as little as 23,000 yuan on platforms such as JD.com. The decline is attributed to worries about chip overheating, rumors of bottlenecks in multi-card setups, manufacturers' high initial pricing, and the continued appeal of the previous-generation RTX 4090 [2].
Rental Market - Because of the high initial price (over 30,000 yuan), the RTX 5090 rental market developed slowly; only in May, after prices fell, did some data centers begin offering RTX 5090 machines for rent. The payback period for an 8-card machine is currently about four years, which may be too long for AI companies given how quickly their computing needs change [3][6].
Computing Power - The RTX 5090 offers strong computing power, particularly for AI training and inference: a single card reaches 419 TFLOPS and an 8-card machine about 3.4 PFLOPS. A 300-card cluster would therefore provide on the order of 126 PFLOPS in aggregate (300 × 419 TFLOPS), which suits large language model training and other high-performance computing tasks [4].
Power Consumption - The RTX 5090 is rated at 575 W, with peaks of up to 900 W. An 8-card machine draws about 6 kW, and at 0.6 yuan per kWh the article puts its monthly electricity bill at around 3,600 yuan. This high power draw raises operating costs and demands robust cooling and power-supply systems [7].
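A back-of-envelope check of the computing-power and electricity figures, as a minimal Python sketch. It assumes ideal linear scaling of the quoted 419 TFLOPS per card, a machine running 24 hours a day, and a 30-day month; none of these assumptions comes from the article, and the function names are hypothetical.

```python
# Back-of-envelope figures for RTX 5090 machines, using only numbers quoted in the article.
# Assumptions (not from the source): ideal linear scaling across cards,
# 24-hour-a-day operation, and a 30-day month.

PER_CARD_TFLOPS = 419.0      # per-card figure quoted in the article
MACHINE_POWER_KW = 6.0       # approximate draw of an 8-card machine
PRICE_YUAN_PER_KWH = 0.6     # electricity rate quoted in the article


def aggregate_pflops(num_cards: int, per_card_tflops: float = PER_CARD_TFLOPS) -> float:
    """Ideal (linear-scaling) aggregate throughput in PFLOPS."""
    return num_cards * per_card_tflops / 1000.0


def monthly_energy_cost(power_kw: float = MACHINE_POWER_KW,
                        price_per_kwh: float = PRICE_YUAN_PER_KWH,
                        hours_per_day: float = 24.0,
                        days: int = 30) -> float:
    """Electricity cost in yuan for continuous operation at a constant draw."""
    kwh = power_kw * hours_per_day * days
    return kwh * price_per_kwh


def implied_kwh(monthly_cost_yuan: float, price_per_kwh: float = PRICE_YUAN_PER_KWH) -> float:
    """Energy (kWh) that a quoted monthly bill corresponds to at the given rate."""
    return monthly_cost_yuan / price_per_kwh


if __name__ == "__main__":
    print(f"8 cards:   {aggregate_pflops(8):.2f} PFLOPS")           # 3.35
    print(f"300 cards: {aggregate_pflops(300):.1f} PFLOPS")         # 125.7
    print(f"24/7 at 6 kW: {monthly_energy_cost():.0f} yuan/month")  # 2592
    print(f"3,600 yuan implies {implied_kwh(3600):.0f} kWh/month")  # 6000
```

Under these assumptions a constant 6 kW draw costs about 2,592 yuan a month, while the quoted ~3,600 yuan corresponds to about 6,000 kWh at 0.6 yuan/kWh, more energy than the 6 kW figure alone accounts for; the source does not explain the difference.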
Performance - For AI inference, the RTX 5090 supports low-precision formats (FP8 and FP4), which significantly improve efficiency, and it runs inference about 50% faster than the previous-generation RTX 4090. In gaming it outperforms the 4090 at 4K resolution, though getting the best performance requires targeted optimization, especially for low-precision inference [8].
Heat Generation - The RTX 5090's heat problems center on the chip and the power connectors, particularly the 12V-2x6 connector. Such overheating incidents are rare but still warrant attention. Mitigations include capping peak power through driver or BIOS settings, using liquid cooling or blower-style ("turbo") fans, and using the original power cables to avoid compatibility problems [9][10].
Networking - Early concerns about "card-locking" restrictions or performance bottlenecks in multi-card setups have not been borne out in practical tests, and many companies using the RTX 5090 report stable multi-card performance over PCIe (the RTX 5090, like the RTX 4090, has no NVLink connector), making it suitable for building high-performance AI clusters [11].
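The first heat mitigation listed, capping peak power in software, can be sketched in Python around the `nvidia-smi -pl` (power-limit) option. The helper names, the 450 W example cap, and the choice to default to a dry run are illustrative assumptions, not the article's procedure; only the 575 W rated figure comes from the text. Applying a limit usually needs administrator rights, and nvidia-smi rejects values outside the range the card supports (listed by `nvidia-smi -q -d POWER`).

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical helpers for capping GPU board power with nvidia-smi's power-limit option.
# RATED_POWER_W is the 575 W rating quoted in the article; the 450 W cap used below is an
# illustrative assumption. nvidia-smi itself rejects limits outside the card's supported range.
RATED_POWER_W = 575


def power_limit_cmd(gpu_index: int, watts: int) -> list[str]:
    """Build the argv for `nvidia-smi -i <gpu_index> -pl <watts>`."""
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    if not 0 < watts <= RATED_POWER_W:
        raise ValueError(f"{watts} W is not in (0, {RATED_POWER_W}] W")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(gpu_index: int, watts: int, dry_run: bool = True) -> str:
    """Return the command line (dry run) or run it and return nvidia-smi's output.

    Actually applying a limit normally requires root/administrator rights.
    """
    cmd = power_limit_cmd(gpu_index, watts)
    if dry_run:
        return shlex.join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Dry run: print the commands that would cap all eight cards of an 8-card machine at 450 W.
    for gpu in range(8):
        print(apply_power_limit(gpu, 450))
```

Capping all eight cards at 450 W instead of 575 W would lower the machine's rated board power by 8 × 125 W = 1 kW, at some cost in peak throughput.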