Nvidia Dynamo

Jensen Huang says this year's GTC is the "AI Super Bowl," but everyone can win
汽车商业评论· 2025-03-19 15:46
Written by Qian Yaguang / Designed by Zhao Haoran

At this GTC, Jensen Huang again expressed optimism about the growth of compute demand. While large language models can provide foundational knowledge, reasoning models can give more complex, more analytical answers. Huang said that the company's newly released open-source software, Nvidia Dynamo, together with Blackwell chips, will make DeepSeek R1 run 30 times faster.

In his keynote, he emphasized the breadth of AI applications supported by Nvidia systems. He detailed Nvidia's contributions to autonomous vehicles, better wireless networks, and advanced robotics, and unveiled the company's product roadmap for the next two years. He said demand for GPUs from the four major cloud service providers is soaring, adding that he expects Nvidia's data center infrastructure revenue to reach $1 trillion by 2028.

On the evening of March 19, Nvidia CEO Jensen Huang, wearing his signature black leather jacket, took center stage at the Nvidia GTC conference.

The event drew more than 25,000 people to the SAP Center in San Jose, California. Huang opened his keynote by tossing T-shirts printed with "AI Super Bowl" into the audience and declared this year's GTC (the global AI conference) the "AI Super Bowl."

"Last year, when we held GTC here, it was described as the 'AI Woodstock ...
The details Jensen Huang didn't tell us
半导体芯闻· 2025-03-19 10:34
Core Insights
- The rapid advancement of AI models is accelerating, with improvements in the last six months surpassing those of the previous six, driven by three overlapping scaling laws: pre-training scaling, post-training scaling, and inference-time scaling [1][3].

Group 1: AI Model Developments
- Claude 3.7 shows remarkable performance in software engineering, while DeepSeek V3 delivers a significant cost reduction relative to the previous generation of models, promoting further adoption [3].
- OpenAI's o1 and o3 models demonstrate that longer inference and search yield better answers, suggesting that the amount of computation that can usefully be added after training is virtually limitless [3].
- Nvidia aims to increase inference efficiency by 35 times to facilitate model training and deployment, shifting its pitch from "buy more, save more" to "save more, buy more" [3][4].

Group 2: Market Concerns and Demand
- The market worries that falling costs, driven by Nvidia's software optimizations and hardware improvements, could reduce demand for AI hardware and produce a token oversupply [4].
- As the cost of intelligence decreases, net consumption is expected to increase, much as cheaper fiber optics increased total internet usage rather than shrinking the market [4].
- Current AI capabilities are limited by cost, but as inference costs decline, demand for intelligence is anticipated to grow exponentially [4].

Group 3: Nvidia's Roadmap and Innovations
- Nvidia's roadmap includes the Blackwell Ultra B300, which will be sold not as a motherboard but as a GPU, with enhanced performance and memory capacity [11][12].
- The B300 NVL16 will replace the B200 HGX form factor, featuring 16 packages and improved communication capabilities [12].
- The CX-8 NIC will double network speed compared to the previous generation, enhancing overall system performance [13].
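The Group 1 claim that longer inference times and searches yield better answers can be illustrated with a toy best-of-n sketch. This is not any model's actual decoding strategy, just a minimal simulation assuming each extra "reasoning pass" is an independent draw of answer quality: spending more inference compute (larger n) raises the quality of the best answer kept.

```python
import random

random.seed(0)

def sample_answer():
    """Toy stand-in for one inference pass: returns an answer quality in [0, 1]."""
    return random.random()

def best_of_n(n):
    """Spend n inference passes and keep the best-scoring answer (test-time scaling)."""
    return max(sample_answer() for _ in range(n))

def expected_quality(n, trials=10_000):
    """Average best-of-n quality over many trials."""
    return sum(best_of_n(n) for _ in range(trials)) / trials

for n in (1, 4, 16):
    print(n, round(expected_quality(n), 2))
```

Quality climbs with n but with diminishing returns, which is why post-training compute is "virtually limitless" in the sense of always helping, not in the sense of helping linearly.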
Group 4: Jensen's Mathematical Rules
- Jensen's new counting rules complicate the interpretation of Nvidia's performance metrics: GPU counts are now based on the number of compute dies rather than the number of packages [6][7].
- The first two rules restate Nvidia's overall FLOPS and bandwidth figures in a more complex manner, changing how specifications should be read [6].

Group 5: Future Architecture and Performance
- The Rubin architecture is expected to deliver over 50 PFLOPS of dense FP4 compute, a significant step up from previous generations [16].
- Nvidia's move to larger tensor-core arrays in each generation aims to improve data reuse and reduce control complexity, although programming challenges remain [18].
- The Kyber rack architecture aims to increase density and scalability, allowing more efficient deployment of GPU resources [27][28].

Group 6: Inference Stack and Dynamo
- Nvidia's new inference stack, Dynamo, aims to raise throughput and interactivity in AI applications, with features such as intelligent routing and GPU scheduling to optimize resource utilization [39][40].
- Improvements to the NCCL collective communication library are expected to reduce latency and raise overall throughput for smaller message sizes [44].
- The NVMe KV-cache offload manager will improve prefill efficiency by retaining KV data from earlier conversation turns, reducing the need for recomputation [48][49].

Group 7: Cost Reduction and Competitive Edge
- Nvidia's advancements are projected to significantly lower the total cost of ownership for AI systems, with rental prices for H100 chips predicted to decline starting in mid-2024 [55].
- Co-packaged optics (CPO) solutions are expected to reduce power consumption and enhance network efficiency, allowing larger-scale deployments [57][58].
- Nvidia continues to lead the market with innovative technologies, maintaining a competitive edge over rivals by consistently advancing its architecture and algorithms [61].
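The KV-cache retention idea behind Group 6 can be sketched in a few lines. The class below is a hypothetical illustration, not Dynamo's actual API: it keys (pretend) KV tensors by a hash of the token prefix with LRU eviction, so a returning conversation only needs to prefill its new suffix. A real system would tier these blocks across HBM, host memory, and NVMe rather than a single dict.

```python
import hashlib
from collections import OrderedDict

class KVCacheStore:
    """Toy prefix KV-cache with LRU eviction (hypothetical, for illustration)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.store = OrderedDict()

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def lookup(self, tokens):
        """Return (length, kv) for the longest cached prefix of `tokens`."""
        for end in range(len(tokens), 0, -1):
            k = self._key(tokens[:end])
            if k in self.store:
                self.store.move_to_end(k)  # mark as recently used
                return end, self.store[k]
        return 0, None

    def insert(self, tokens, kv_state):
        k = self._key(tokens)
        self.store[k] = kv_state
        self.store.move_to_end(k)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least-recently-used entry

# A returning conversation re-sends its history; only the new suffix needs prefill.
cache = KVCacheStore()
history = ["sys", "hi", "how", "are", "you"]
cache.insert(history, kv_state="kv-for-5-tokens")

new_turn = history + ["fine", "thanks"]
reused, _ = cache.lookup(new_turn)
print(f"reused {reused} of {len(new_turn)} tokens")  # prints "reused 5 of 7 tokens"
```

This is why retaining prior conversation data cuts prefill work: the cost of a follow-up turn scales with the new tokens, not the full history.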