Breaking: The World's First GB300 Behemoth Comes to the Rescue; $7 Billion Burned in a Year as OpenAI's Internal GPU Battles Turn Brutal
36Kr · 2025-10-11 11:27
Core Insights
- OpenAI is facing intense internal competition for GPU resources, having poured roughly $7 billion into computing power over the past year, primarily for large-model development and inference [1][2][12]
- Microsoft has launched the world's first GB300 supercomputer, built specifically for OpenAI, which can cut the training time for trillion-parameter models from weeks to days [4][6][10]

Group 1: Investment and Resource Allocation
- OpenAI has spent $5 billion on large-model research and $2 billion on inference computing over the past year [1]
- The demand for computing power is described as a "bottomless pit," making supercomputing expansion and partnerships a critical need [2][21]
- OpenAI's leadership has established an explicit resource-allocation mechanism to arbitrate GPU distribution between research and application teams [15][19]

Group 2: Supercomputer Specifications
- The cluster comprises over 4,600 GB300 GPUs in NVL72 rack configurations, interconnected via next-generation InfiniBand, delivering high data-transfer rates and large memory capacity [6][8][10]
- The system uses a rack-level design for large-scale AI supercomputing, with 72 GPUs and 37 TB of high-speed memory per rack [7][10]
- Each rack supports up to 1,440 PFLOPS of FP4 Tensor Core performance, expanding the headroom for AI workloads [10]

Group 3: Internal Competition and Challenges
- OpenAI's internal GPU-allocation process is described as a "painful and exhausting" experience, with teams competing fiercely for limited resources [2][12][13]
- GPU allocation is critical to productivity: the number of GPUs a team holds directly determines what its AI applications can do [19][21]
- OpenAI's Chief Product Officer has emphasized that newly acquired GPUs are put to use immediately, underscoring the urgency of resource allocation [21]
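The rack-level figures in the article can be sanity-checked with some back-of-the-envelope arithmetic. This is only a sketch, and it assumes the 37 TB memory and 1,440 PFLOPS figures are per NVL72 rack (consistent with NVIDIA's rack-scale positioning of the GB300 NVL72), not cluster-wide totals:

```python
import math

# Figures taken from the article; per-rack interpretation is an assumption.
TOTAL_GPUS = 4600          # "over 4,600 GB300 GPUs"
GPUS_PER_RACK = 72         # NVL72 rack-scale design
MEM_PER_RACK_TB = 37       # high-speed memory per rack
RACK_FP4_PFLOPS = 1440     # peak FP4 Tensor Core performance per rack

# Number of NVL72 racks needed to house all GPUs
racks = math.ceil(TOTAL_GPUS / GPUS_PER_RACK)

# Aggregate memory (petabytes) and FP4 compute (exaFLOPS) across the cluster
total_mem_pb = racks * MEM_PER_RACK_TB / 1000
total_fp4_eflops = racks * RACK_FP4_PFLOPS / 1000

print(f"{racks} racks, ~{total_mem_pb:.2f} PB fast memory, "
      f"~{total_fp4_eflops:.1f} EFLOPS FP4 peak")
# → 64 racks, ~2.37 PB fast memory, ~92.2 EFLOPS FP4 peak
```

Under these assumptions the cluster works out to roughly 64 racks, a couple of petabytes of high-speed memory, and on the order of 90+ exaFLOPS of low-precision compute, which is the scale that makes "weeks to days" training-time claims plausible for trillion-parameter models.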