Nvidia Employee Says Microsoft's Data Center Cooling System Wastes Resources
Xin Lang Ke Ji· 2025-12-12 11:22
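[#Nvidia employee complains Microsoft's cooling system is too wasteful#] Nvidia is supplying its latest-generation Blackwell chips to Microsoft's data centers. During a deployment in early autumn this year, an Nvidia employee noticed that the cooling approach at one of Microsoft's facilities appeared excessively wasteful.

As compute demand for AI model training and inference surges, Nvidia is deploying GB200 Blackwell systems at scale for Microsoft and other tech giants.

In early autumn this year, an employee on Nvidia's infrastructure specialist team described in an internal email the on-site details of deploying Blackwell racks for an OpenAI cluster. As OpenAI's cloud partner and largest investor, Microsoft is responsible for such deployments.

According to the internal Nvidia email cited today by Business Insider, the installation comprised two GB200 NVL72 racks, each carrying 72 Nvidia GPUs. Because such a dense GPU array generates enormous heat, Microsoft uses liquid cooling to carry heat away from around the servers quickly.

The email also noted, however, that Microsoft's building-level cooling approach looked wasteful because of its oversized scale and because it does not use facility-level cooling water, although it does deliver good resilience and fault tolerance.

Ren Shaolei (name transliterated), an associate professor of electrical and computer engineering at the University of California, explained that data centers typically use a "two-tier cooling structure": liquid cooling inside the servers, while the building itself needs a separate system to expel the overall heat to the ...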
Current Market Conditions for the RTX 5090
傅里叶的猫· 2025-06-08 12:28
Core Viewpoint - The article reviews the market situation of the NVIDIA RTX 5090 graphics card since its release in January 2025, covering price, the rental market, computing power, power consumption, performance, heat generation, and networking [1].

Pricing - The RTX 5090 was initially expected to sell for over 40,000 yuan, but within four months the price fell to just over 20,000 yuan, with some brands listed as low as 23,000 yuan on platforms like JD.com. The decline is attributed to concerns over chip overheating, rumors of performance bottlenecks in multi-card setups, high initial pricing by manufacturers, and the continued appeal of the previous-generation RTX 4090 [2].

Rental Market - The high initial price of the RTX 5090 (over 30,000 yuan) slowed the development of the rental market. Only in May, after prices fell, did some data centers begin to offer RTX 5090 machines for rent. The investment payback period for an 8-card machine is currently about four years, which may be too long for AI companies given how quickly demand for computing power changes [3][6].

Computing Power - The RTX 5090 is strong in AI training and inference, with a single card rated at 419 TFLOPS and an 8-card machine reaching about 3.4 PFLOPS. A cluster of 300 RTX 5090 cards would, by the same per-card figure, reach a theoretical peak of roughly 126 PFLOPS, which makes it attractive for large language model training and high-performance computing tasks [4].

Power Consumption - The RTX 5090 has a rated power consumption of 575 W, with peaks of up to 900 W. An 8-card machine draws approximately 6 kW, leading to monthly electricity costs of around 3,600 yuan at a rate of 0.6 yuan per kWh. This level of power draw raises operating costs and requires robust cooling and power supply systems [7].
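The computing-power and electricity figures above can be checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in the summary; the function names are hypothetical, and the 30-day month at constant load is an assumption rather than something the source states.

```python
# Back-of-the-envelope check of the quoted figures.
# All inputs come from the summary; nothing here is measured.

TFLOPS_PER_CARD = 419.0      # per-card figure quoted in the summary (precision not stated)
MACHINE_POWER_KW = 6.0       # approximate draw of an 8-card machine
PRICE_YUAN_PER_KWH = 0.6     # electricity rate quoted in the summary
HOURS_PER_MONTH = 24 * 30    # assumption: 30-day month at constant load


def aggregate_pflops(cards: int, tflops_per_card: float = TFLOPS_PER_CARD) -> float:
    """Theoretical peak throughput in PFLOPS, ignoring all scaling losses."""
    return cards * tflops_per_card / 1000.0


def monthly_cost_yuan(power_kw: float,
                      price: float = PRICE_YUAN_PER_KWH,
                      hours: float = HOURS_PER_MONTH) -> float:
    """Electricity cost for one month at a constant power draw."""
    return power_kw * hours * price


def implied_power_kw(monthly_cost: float,
                     price: float = PRICE_YUAN_PER_KWH,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Average power draw that would produce a given monthly bill."""
    return monthly_cost / (hours * price)


if __name__ == "__main__":
    print(f"8 cards:   {aggregate_pflops(8):.3f} PFLOPS")
    print(f"300 cards: {aggregate_pflops(300):.1f} PFLOPS")
    print(f"6 kW for a month: {monthly_cost_yuan(MACHINE_POWER_KW):.0f} yuan")
    print(f"Draw implied by 3,600 yuan: {implied_power_kw(3600):.2f} kW")
```

Under these assumptions, eight cards give about 3.35 PFLOPS, which matches the quoted 3.4 PFLOPS, and 300 cards give about 125.7 PFLOPS. A constant 6 kW comes to about 2,592 yuan per month, while the quoted 3,600 yuan would correspond to an average draw of roughly 8.3 kW. The gap might reflect cooling overhead or draw above the rated figure, but the source does not say which.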
Performance - In AI inference, the RTX 5090 supports low-precision calculations (FP8 and FP4), which significantly improves efficiency. Its inference speed is about 50% faster than that of the previous-generation RTX 4090. In gaming, it outperforms the 4090 at 4K resolution, but getting the best results requires targeted optimization, especially for low-precision inference [8].

Heat Generation - The RTX 5090's heat issues mainly concern the chip and the power connectors, particularly the 12V-2x6 connectors. Overheating incidents are rare but still warrant attention. Mitigations include limiting peak power through driver or BIOS settings, using liquid cooling or turbo fans, and using the original power cables to avoid compatibility issues [9][10].

Networking - Early concerns about "lock card" restrictions or performance bottlenecks in multi-card setups have not been borne out in practical tests. No such problems appeared in testing, and many companies using the RTX 5090 reported stable performance in multi-card PCIe networking, making the card suitable for building high-performance AI clusters [11].