Core Insights
- The pace of AI model improvement is accelerating, with gains over the last six months surpassing those of the prior six months, driven by three overlapping scaling laws: pre-training scaling, post-training scaling, and inference-time scaling [1][3].

Group 1: AI Model Developments
- Claude 3.7 shows remarkable performance in software engineering, while Deepseek v3 demonstrates a significant cost reduction relative to the previous generation of models, encouraging further adoption [3].
- OpenAI's o1 and o3 models demonstrate that longer inference times and broader search yield better answers, suggesting that the amount of computation that can be added after training is virtually limitless [3] (a minimal best-of-n sketch follows Group 6).
- Nvidia aims to increase inference efficiency by 35 times to facilitate model training and deployment, signaling a shift in strategy from "buy more, save more" to "save more, buy more" [3][4].

Group 2: Market Concerns and Demand
- The market worries that the software optimizations and hardware improvements driven by Nvidia will lower costs to the point that demand for AI hardware weakens, creating a perceived oversupply [4].
- As the cost of intelligence falls, net consumption is expected to rise, much as fiber optics drove down the cost of internet connectivity while expanding total usage [4].
- Current AI capabilities are limited by cost; as inference costs decline, demand for intelligence is anticipated to grow exponentially [4].

Group 3: Nvidia's Roadmap and Innovations
- Nvidia's roadmap includes the Blackwell Ultra B300, which will be sold not as a complete board but as a standalone GPU, with higher performance and larger memory capacity [11][12].
- The B300 NVL16 will replace the B200 HGX form factor, featuring 16 packages and improved communication capabilities [12].
- The CX-8 NIC will double network speed compared to the previous generation, improving overall system performance [13].

Group 4: Jensen's Mathematical Rules
- Jensen's new mathematical rules complicate the reading of Nvidia's performance metrics, including GPU counts that are now based on the number of compute dies rather than the number of packages [6][7] (a worked counting example follows Group 6).
- The first two rules present Nvidia's aggregate FLOPs and bandwidth figures in a more convoluted way, affecting how the specifications should be interpreted [6].

Group 5: Future Architecture and Performance
- The Rubin architecture is expected to deliver over 50 PFLOPs of dense FP4 compute, a significant performance gain over previous generations [16].
- Nvidia keeps enlarging the tensor core array in each generation to improve data reuse and reduce control complexity, although programming challenges remain [18] (see the arithmetic-intensity sketch after Group 6).
- The Kyber rack architecture aims to increase density and scalability, allowing GPU resources to be deployed more efficiently [27][28].

Group 6: Inference Stack and Dynamo
- Nvidia's new inference stack and Dynamo aim to raise throughput and interactivity in AI applications, with features such as intelligent routing and GPU scheduling to optimize resource utilization [39][40] (a routing sketch follows below).
- Improvements to the NCCL collective communications library are expected to reduce latency and raise overall throughput for smaller message sizes [44].
- The NVMe KV-Cache offload manager will make prefill more efficient by retaining KV data from earlier conversation turns, reducing the need to recompute it [48][49].
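To make Group 1's inference-time scaling point concrete, here is a minimal best-of-n sketch. The `generate` sampler and `score` verifier below are hypothetical stand-ins (not any vendor's API); the point is only that spending more samples at inference time can never make the best answer found worse.

```python
import random

def generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled model completion."""
    return f"candidate-{seed} for {prompt!r}"

def score(prompt: str, answer: str) -> float:
    """Hypothetical verifier / reward model; higher is better (deterministic toy)."""
    return random.Random(hash((prompt, answer))).random()

def best_of_n(prompt: str, n: int) -> tuple[float, str]:
    """Spend more inference-time compute (larger n) searching for a better answer.

    Because the candidate pool for a larger n contains the pool for a smaller n,
    the best score is monotonically non-decreasing in n -- the essence of the
    "more test-time compute yields better answers" argument.
    """
    candidates = (generate(prompt, seed) for seed in range(n))
    return max((score(prompt, c), c) for c in candidates)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        best_score, best_answer = best_of_n("prove the lemma", n)
        print(f"n={n:3d}  best score={best_score:.3f}  answer={best_answer}")
```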
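Group 4's GPU-counting rule is easiest to see with numbers. The sketch below is a worked example under the convention described above (one "GPU" per compute die rather than per package); the 72-package, 2-die-per-package rack is illustrative, not an official specification.

```python
def marketed_gpu_count(packages_per_rack: int, dies_per_package: int,
                       count_by_die: bool) -> int:
    """Marketed 'GPU' count for a rack.

    The older convention counted one GPU per package; the newer convention
    described in the article counts one GPU per compute die.
    """
    return packages_per_rack * (dies_per_package if count_by_die else 1)

packages, dies = 72, 2  # illustrative rack: 72 packages, 2 compute dies each
print("counted by package:", marketed_gpu_count(packages, dies, count_by_die=False))  # -> 72
print("counted by die:    ", marketed_gpu_count(packages, dies, count_by_die=True))   # -> 144
```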
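The data-reuse argument behind Group 5's ever-larger tensor core arrays reduces to simple arithmetic: a D×D×D matrix-multiply tile performs on the order of D³ operations while moving only on the order of D² operand data, so doubling the tile dimension roughly doubles the work done per byte fetched. The sketch below just computes that ratio; the tile sizes and 1-byte element size are illustrative assumptions, not actual tensor core dimensions.

```python
def flops_per_byte(d: int, bytes_per_element: float = 1.0) -> float:
    """FLOPs per byte moved for a d x d x d matrix-multiply tile.

    Compute: 2 * d^3 (a multiply and an add per inner-product term).
    Traffic: three d x d tiles (A and B read, C written), ignoring caches.
    """
    flops = 2 * d ** 3
    traffic = 3 * d * d * bytes_per_element
    return flops / traffic

for d in (8, 16, 32, 64):
    print(f"tile {d:3d}: {flops_per_byte(d):6.1f} FLOPs/byte")
```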
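Group 6's intelligent routing and NVMe KV-Cache offload can be summarized in one small scheduler sketch. Everything here (the `Router` and `Worker` classes and the conversation-ID lookups) is hypothetical pseudocode illustrating the general idea of steering a request to a worker that already holds the conversation's KV-cache and reloading spilled KV blocks from NVMe instead of recomputing them; it is not Dynamo's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    cached_conversations: set[str] = field(default_factory=set)  # KV blocks resident in HBM
    queue_depth: int = 0

@dataclass
class Router:
    workers: list[Worker]
    nvme_cache: set[str] = field(default_factory=set)  # conversations spilled to NVMe

    def route(self, conversation_id: str) -> Worker:
        """Prefer a worker that already holds this conversation's KV-cache;
        otherwise pick the least-loaded worker."""
        hot = [w for w in self.workers if conversation_id in w.cached_conversations]
        target = min(hot or self.workers, key=lambda w: w.queue_depth)
        target.queue_depth += 1
        return target

    def prefill(self, worker: Worker, conversation_id: str) -> str:
        """Reload earlier turns' KV blocks from NVMe instead of recomputing them."""
        if conversation_id in worker.cached_conversations:
            return "skip prefill: KV already resident in HBM"
        if conversation_id in self.nvme_cache:
            worker.cached_conversations.add(conversation_id)
            return "load KV from NVMe, prefill only the new tokens"
        worker.cached_conversations.add(conversation_id)
        self.nvme_cache.add(conversation_id)
        return "full prefill, then spill KV to NVMe for reuse"

router = Router(workers=[Worker("gpu-0"), Worker("gpu-1")])
w = router.route("chat-42")
print(router.prefill(w, "chat-42"))  # -> full prefill, then spill KV to NVMe for reuse
w = router.route("chat-42")
print(router.prefill(w, "chat-42"))  # -> skip prefill: KV already resident in HBM
```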
Group 7: Cost Reduction and Competitive Edge
- Nvidia's advancements are projected to significantly lower the total cost of ownership of AI systems, with H100 rental prices reported to have begun declining in mid-2024 [55].
- Co-packaged optics (CPO) solutions are expected to reduce power consumption and improve network efficiency, enabling larger-scale deployments [57][58].
- Nvidia continues to lead the market, maintaining a competitive edge over rivals by consistently advancing its architecture and algorithms [61].
The Details Jensen Huang Did Not Tell Us