AI Computing

New Breakthrough in the Cloud Compute Race: Record-Setting NVIDIA GB200 Cluster Enters MLPerf Testing, Performance More Than Doubles
硬AI · 2025-06-05 10:32
Core Viewpoint
- The collaboration between CoreWeave, NVIDIA, and IBM produced the largest MLPerf Training v5.0 submission to date, showcasing significant advances in AI infrastructure capabilities [2][3].

Group 1: MLPerf Training v5.0 Test Results
- CoreWeave used 2,496 GB200 Grace Blackwell chips to build the largest NVIDIA GB200 NVL72 cluster in MLPerf history, 34 times larger than previous submissions [2][3].
- The GB200 NVL72 cluster completed training of the Llama 3.1 405B model in just 27.3 minutes, more than a twofold performance improvement over clusters of similar scale [3].
- This performance leap highlights the capability of the GB200 NVL72 architecture and CoreWeave's infrastructure to handle demanding AI workloads [3].

Group 2: Industry Participation and Growth
- MLPerf Training v5.0 drew a record number of submissions, with 201 performance results from 20 different organizations, indicating a significant increase in industry participation [6].
- The introduction of Llama 3.1 405B as the largest model in the training suite attracted more submissions than previous rounds based on GPT-3, reflecting the growing importance of large-scale training [5][6].
- New participants in the MLPerf Training tests include AMD, IBM, and others, underscoring the expanding landscape of AI infrastructure providers [6].
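To put the 27.3-minute figure in rough perspective, a back-of-the-envelope calculation can translate it into implied aggregate throughput. The sketch below is illustrative only: the token budget (`ASSUMED_TOKENS`) is a hypothetical placeholder, not a figure from the MLPerf submission, and the 6·N·D FLOPs-per-token rule is only a common approximation.

```python
# Back-of-the-envelope estimate of the compute implied by the reported
# MLPerf Training v5.0 result: Llama 3.1 405B trained in 27.3 minutes
# on 2,496 GB200 GPUs. Inputs marked ASSUMED are illustrative
# placeholders, not values taken from the official submission.

PARAMS = 405e9            # Llama 3.1 405B parameter count
NUM_GPUS = 2_496          # GB200 GPUs reported in the CoreWeave cluster
WALL_CLOCK_S = 27.3 * 60  # reported training time, in seconds

ASSUMED_TOKENS = 1e9      # ASSUMED token budget for the benchmark run

# Common approximation: ~6 FLOPs per parameter per training token.
total_flops = 6 * PARAMS * ASSUMED_TOKENS

cluster_flops_per_s = total_flops / WALL_CLOCK_S
per_gpu_flops_per_s = cluster_flops_per_s / NUM_GPUS

print(f"Implied cluster throughput: {cluster_flops_per_s / 1e15:.1f} PFLOP/s")
print(f"Implied per-GPU throughput: {per_gpu_flops_per_s / 1e12:.1f} TFLOP/s")
print(f"Implied token throughput:   {ASSUMED_TOKENS / WALL_CLOCK_S / 1e6:.2f} M tokens/s")
```

Plugging in a different assumed token budget scales the implied throughput linearly, so the sketch is best read as a way to sanity-check claims of this kind rather than as a reconstruction of the actual run.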
Kunpeng and Ascend Accelerate the Build-Out of a Computing Industry Ecosystem
Zhong Guo Xin Wen Wang · 2025-05-26 02:34
Core Insights
- The article highlights Huawei's development of the Kunpeng and Ascend computing technologies, which aim to build an innovative ecosystem for global developers [1][2].
- The focus is on improving the usability, flexibility, and scalability of AI platforms and tools to accelerate industry application innovation [1].

Group 1: Ecosystem Development
- As of May 2025, Kunpeng and Ascend have attracted over 6.65 million developers and more than 8,800 partners, and have completed over 23,900 solution certifications [1].
- The ecosystem is seen as crucial for the breakthrough of domestic computing technologies [1].

Group 2: Technological Innovations
- Huawei introduced the Kunpeng AI+ solution for general computing and the CATLASS operator template library for AI computing, making operator development and application deployment easier [1].
- The Ascend super-node architecture aims to overcome cluster interconnection bottlenecks, significantly improving overall computing efficiency [1].

Group 3: Talent Development and Collaboration
- Huawei has trained over 400,000 students in Ascend technology and established excellence and incubation centers with top universities [2].
- Collaboration with iFlytek on large-scale model training has addressed multiple technical challenges, enabling the application of MoE models [2].
- The DeepSeek model, developed in partnership with over 100 collaborators, supports more than 500 clients across industries including internet, telecommunications, finance, education, and healthcare [2].
A Global First: "NVIDIA's Favorite Son" CoreWeave Brings GB200 Servers Online at Scale
硬AI · 2025-04-16 09:52
Test results show that, compared with the previous-generation NVIDIA Hopper GPUs, GB200 NVL72 servers helped Cohere achieve up to a 3x performance improvement in training 100-billion-parameter models; in addition, IBM and Mistral AI have become the first users of CoreWeave's GB200 cloud service.

"Enterprises and organizations around the world are racing to turn reasoning models into agentic AI applications, which will change how people work and play."

Author | Li Xiaoyin, Editor | 硬AI

CoreWeave has once again moved first, leading the deployment of NVIDIA's GB200 systems as AI heavyweights rush to sign on.

NVIDIA announced on its blog today that AI cloud provider CoreWeave has become one of the first cloud service providers to deploy NVIDIA GB200 NVL72 systems at scale, with Cohere, IBM, and Mistral AI as the first users.

According to the latest MLPerf benchmarks, these systems deliver a 2-3x performance improvement over the previous-generation H100 chips, significantly accelerating large-model training and inference.

CoreWeave CEO Michael Intrator said the milestone demonstrates both the company's engineering strength and speed of execution and its focus on next-generation AI:

"CoreWeave is built to move faster; time and again we ...
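To make the 2-3x figure reported above concrete, the short sketch below converts a generational speedup into wall-clock time and GPU-hours for a fixed training job. The baseline hours and GPU count are assumed placeholders, not numbers from the article or the MLPerf submission.

```python
# Illustrative only: translate a reported 2-3x generational speedup into
# wall-clock time and GPU-hours saved for a fixed training job.
# The baseline figures below are ASSUMED placeholders.

BASELINE_HOURS = 240.0  # ASSUMED wall-clock hours for the job on H100
NUM_GPUS = 1_024        # ASSUMED GPU count, held fixed across generations

for speedup in (2.0, 3.0):  # the 2-3x range reported for GB200 NVL72 vs H100
    new_hours = BASELINE_HOURS / speedup
    saved_gpu_hours = (BASELINE_HOURS - new_hours) * NUM_GPUS
    print(f"{speedup:.0f}x speedup: {new_hours:.0f} h wall clock, "
          f"{saved_gpu_hours:,.0f} GPU-hours saved")
```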