GB200 Grace Blackwell Superchip

New Breakthrough in the Cloud Computing Race: Record-Setting NVIDIA GB200 Joins MLPerf Testing with Over Twofold Performance Gains
硬AI· 2025-06-05 10:32
Core Viewpoint - The collaboration between CoreWeave, NVIDIA, and IBM produced the largest MLPerf Training v5.0 submission to date, showcasing significant advances in AI infrastructure capability [2][3]

Group 1: MLPerf Training v5.0 Test Results
- CoreWeave used 2,496 GB200 Grace Blackwell chips to build the largest NVIDIA GB200 NVL72 cluster in MLPerf history, 34 times larger than the previous submission [2][3]
- The GB200 NVL72 cluster completed training of the Llama 3.1 405B model in just 27.3 minutes, more than doubling the performance of similar-scale clusters [3]
- This performance leap highlights the capability of the GB200 NVL72 architecture and CoreWeave's infrastructure for handling demanding AI workloads [3]

Group 2: Industry Participation and Growth
- MLPerf Training v5.0 drew a record number of submissions, with 201 performance results from 20 different organizations, indicating a significant increase in industry participation [6]
- The introduction of Llama 3.1 405B as the largest model in the training suite attracted more submissions than the earlier GPT-3-based test, reflecting the growing importance of large-scale training [5][6]
- New participants in the MLPerf Training tests include AMD, IBM, and others, underscoring the expanding landscape of AI infrastructure providers [6]
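As a rough sanity check on the figures above, the cluster size and reported time can be combined into GPU-hours. The cluster size and 27.3-minute result come from the article; the linear-scaling baseline comparison is our assumption, not a reported figure:

```python
# Back-of-the-envelope look at the MLPerf Training v5.0 result above.
gpus = 2496          # GB200 Grace Blackwell chips in the CoreWeave cluster
minutes = 27.3       # reported Llama 3.1 405B training time

gpu_hours = gpus * minutes / 60
print(f"GPU-hours consumed: {gpu_hours:,.0f}")   # ≈ 1,136 GPU-hours

# A "twofold improvement over similar-scale clusters" implies the same run
# would take roughly twice as long elsewhere (assuming equal GPU count,
# which is our simplification):
baseline_minutes = minutes * 2
print(f"Implied baseline time: {baseline_minutes:.1f} min")
```

The GPU-hour figure is only an aggregate resource measure; it says nothing about per-chip efficiency, but it makes the scale of the submission easier to compare across clusters.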
Jensen Huang's Computex Keynote, Explained: Not a Product Launch but a "Mobilization Order for the AI Industrial Revolution"
Hua Er Jie Jian Wen· 2025-05-19 13:50
Core Viewpoint - Nvidia is transitioning from a technology company to an AI infrastructure company, marking the beginning of a new era of AI factories that take energy as input and produce tokens as output, a third infrastructure revolution following electricity and the internet [3][2]

Group 1: AI Infrastructure and Chip Platforms
- Nvidia introduced the Grace Blackwell GB200 chip and NVLink architecture, whose core interconnect module delivers 130 TB/s of bandwidth, exceeding the data throughput of the entire internet [4]
- The GB200 NVL72 system integrates 72 GPUs and is designed to perform at the level of the 2018 Sierra supercomputer [4]
- Nvidia plans to launch the GB300 chip in Q3, improving inference performance by 1.5x, increasing HBM memory by 1.5x, and doubling network bandwidth while maintaining physical compatibility with the previous generation [6]

Group 2: NVLink Fusion and Ecosystem
- The NVLink Fusion architecture allows CPUs/ASICs/TPUs from other manufacturers to integrate seamlessly with Nvidia GPUs, enabling "semi-custom infrastructure" [8]
- The technology addresses the communication bottleneck between GPUs and CPUs in AI servers, providing up to 14 times the bandwidth of standard PCIe interfaces and thereby improving scalability and energy efficiency [10]

Group 3: Personal Supercomputing and Enterprise AI
- The DGX Spark personal AI computer is set to launch soon, targeting AI researchers who want to own their supercomputers [12]
- Nvidia's RTX Pro enterprise AI server supports traditional IT workloads and can run graphical AI agents, signaling a shift toward integrating AI into the workforce [14]

Group 4: Robotics and AI Applications
- Nvidia is developing robotic systems alongside the automotive industry, using the Isaac GR00T platform powered by the Jetson Thor processor, for applications ranging from autonomous vehicles to human-machine systems [18]
- The company believes robotics will become a trillion-dollar industry, emphasizing the need for scalable solutions [22]
- Nvidia is collaborating with DeepMind and Disney Research on the advanced physics engine Newton, which will be open-sourced in July [23]
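The "up to 14 times the bandwidth of PCIe" claim can be made concrete with typical interconnect figures. The 900 GB/s NVLink-C2C and 64 GB/s PCIe Gen5 x16 numbers below are our assumptions for this sketch, not values stated in the article:

```python
# Rough illustration of the NVLink Fusion vs. PCIe bandwidth comparison.
# Both bandwidth figures are assumptions typical of current hardware,
# not numbers taken from the article.
nvlink_gbps = 900.0   # assumed NVLink-C2C aggregate bandwidth, GB/s
pcie_gbps = 64.0      # assumed PCIe Gen5 x16 bandwidth, GB/s

print(f"Bandwidth ratio: {nvlink_gbps / pcie_gbps:.1f}x")  # ≈ 14.1x

# Time to move the FP8 weights of a 405B-parameter model (~405 GB) once,
# ignoring protocol overhead:
model_gb = 405
for name, bw in [("NVLink", nvlink_gbps), ("PCIe", pcie_gbps)]:
    print(f"{name}: {model_gb / bw:.2f} s")
```

Under these assumed figures, a full weight transfer drops from several seconds over PCIe to well under a second over NVLink, which is where the scalability claim for CPU-GPU communication comes from.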
Live from Jensen Huang's Computex Keynote: Next-Generation GB300 Systems Due in Q3; Personal AI Computer DGX Spark Now in Full Production
Hua Er Jie Jian Wen· 2025-05-19 04:18
Group 1
- Jensen Huang, founder and CEO of Nvidia, delivered a keynote speech at Computex 2025 focused on the latest breakthroughs in AI and accelerated computing technology [1]
- Computex 2025 is themed "AI Next," featuring three main topics: "Smart Computing & Robotics," "Next-Generation Technology," and "Future Mobility," with nearly 1,400 exhibitors and an exhibition area of 80,000 square meters [1]
- Nvidia was expected to unveil a new CPU for the first time and provide updates on the RTX 50 series during the event [1]

Group 2
- The next-generation supercomputer's core technology achieves a full interconnect bandwidth of 14.4 TB per second, using NVLink Spine and custom blind-mate backplanes to reach 130 TB of aggregate bandwidth [5]
- The system integrates 72 Blackwell processors, or 144 GPU chips, connected into a vast GPU system with a total of 1.3 trillion transistors [5]
- Nvidia is building a giant AI supercomputer for Taiwan's AI infrastructure and ecosystem, and the Grace Blackwell system is now fully in production [5]

Group 3
- The company plans to upgrade to the Grace Blackwell GB300 version this quarter, featuring enhanced Blackwell chips with 1.5x improved inference performance and doubled network connectivity [6]
- Nvidia's AI computing power has increased roughly a million-fold every decade, aided by a new packaging process, CoWoS-L, developed in collaboration with TSMC [9]
- The CUDA library is designed to bring AI into 5G and even 6G networks, enabling advanced computing capabilities [11]

Group 4
- Huang emphasized that physical AI is foundational to the robotics revolution and requires significant computing power for training, leading to the introduction of Grace Blackwell, a "thinking machine" [13]
- The CUDA-X libraries accelerate computing beyond GPUs alone, with applications in weather analysis and deep learning [15]
- A new MSI laptop featuring the RTX 5060 GPU is set to launch in May, with Huang confirming its release [17]
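The keynote's "roughly a million-fold per decade" growth figure implies a more modest-sounding annual rate, which a one-line calculation makes concrete:

```python
# Converting the decade-scale growth claim into an annual compound rate.
decade_factor = 1_000_000            # ~1,000,000x per decade, per the keynote
annual = decade_factor ** (1 / 10)   # compound annual growth factor
print(f"Annual growth factor: {annual:.2f}x")   # ≈ 3.98x per year
```

In other words, a million-fold decade is equivalent to computing power roughly quadrupling every year, sustained for ten years.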
World's First: "NVIDIA's Favorite Son" CoreWeave Brings GB200 Servers Online at Scale
硬AI· 2025-04-16 09:52
Test results show that, compared with the previous-generation NVIDIA Hopper GPUs, GB200 NVL72 servers helped Cohere achieve up to a 3x performance improvement in training a 100-billion-parameter model. IBM and Mistral AI have also become the first users of CoreWeave's GB200 cloud service.

"Enterprises and organizations around the world are racing to turn reasoning models into agentic AI applications that will transform the way people work and play."

Author | Li Xiaoyin  Editor | 硬AI

CoreWeave has again seized the lead, becoming the first to deploy NVIDIA GB200 systems as AI giants rush to follow.

NVIDIA announced on its blog today that AI cloud provider CoreWeave has become one of the first cloud service providers to deploy NVIDIA GB200 NVL72 systems at scale, with Cohere, IBM, and Mistral AI as the first users.

According to the latest MLPerf benchmarks, these systems deliver 2-3x the performance of the previous-generation H100 chips, significantly accelerating large-model training and inference.

CoreWeave CEO Michael Intrator said the achievement demonstrates both the company's engineering strength and speed of execution, and its focus on next-generation AI:

"CoreWeave is designed to move faster: time and again we ...