GB200 NVL72
Jensen Huang's New-Year Keynote Is Off the Charts on "China Content," Using DeepSeek and Kimi to Prove Out the Next-Gen Chip
36Kr · 2026-01-07 01:35
On the giant screen at CES, Jensen Huang's slides have become a hall of fame for Chinese AI. With DeepSeek and Kimi taking center stage, a new era of compute has arrived. At the much-anticipated CES 2026, a single slide instantly lit up the AI world. In Huang's keynote, the Chinese large models Kimi K2, DeepSeek V3.2, and Qwen appeared on screen, ranked among the world's leading open-source models, with performance closing in on closed-source models. It was a shining moment for Chinese AI. OpenAI's GPT-OSS and NVIDIA's own Nemotron were also called out on the slide. Moreover, DeepSeek-R1, Qwen3, and Kimi K2 represent the largest-scale attempts on the MoE route: only a small fraction of parameters is activated per token, sharply reducing compute and HBM memory-bandwidth pressure. In the core segment unveiling the next-generation Rubin architecture, Huang chose DeepSeek and Kimi K2 Thinking to show off performance. With Rubin's brute-force boost, Kimi K2 Thinking's inference throughput jumped 10x, and, even more strikingly, cost per token fell to one tenth of what it was. This kind of exponential cost reduction amounts to an announcement that AI inference is about to enter a genuinely affordable era. In addition, on the slide about surging compute demand, the 480B-parameter Qwen3 and the roughly 1T-parameter Kimi K2 served as the representative models, illustrating that parameter counts are growing roughly tenfold per year ...
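The MoE point above is essentially arithmetic: per-token compute scales with the parameters actually activated, not with total model size. A minimal Python sketch using the common ~2 × active-parameters FLOPs rule of thumb; the ~32B active-parameter figure for the 1T-class MoE model is an illustrative assumption, not a number from the article:

```python
# Rough per-token compute for a dense model vs. a mixture-of-experts (MoE) model.
# FLOPs per generated token is commonly approximated as ~2 * active_parameters.
# Parameter counts below are illustrative assumptions, not figures from the article.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one generated token."""
    return 2 * active_params

dense = {"name": "dense 480B", "total": 480e9, "active": 480e9}
moe = {"name": "MoE ~1T total, ~32B active", "total": 1e12, "active": 32e9}

for m in (dense, moe):
    share = m["active"] / m["total"]
    print(f"{m['name']}: ~{flops_per_token(m['active']):.2e} FLOPs/token "
          f"({share:.0%} of weights touched per step)")

# The same ratio applies to the weights that must be streamed from HBM each step,
# which is why sparse activation eases both compute and memory-bandwidth pressure.
```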
NVIDIA Still Reigns: GB200 Costs Twice as Much Yet Saves 15x, and AMD Loses Outright
36Kr · 2026-01-04 11:13
The rules of the AI inference game are quietly changing. A new report points to the key turning point: what decides the winner is no longer raw chip performance or GPU count, but how much intelligence each dollar can output. AI inference is no longer judged on hard compute specs alone. In a new Signal65 report, NVIDIA's GB200 NVL72 delivers 28x the throughput of AMD's MI350X. Moreover, in highly interactive scenarios, the per-token cost of serving DeepSeek R1 can be as much as 15x lower. The GB200 costs roughly twice as much per hour, but that hardly matters, because rack-scale NVLink interconnect plus software scheduling has fundamentally changed the cost structure. Top investor Ben Pouladian put it this way: "The key is no longer compute or GPU count, but how much intelligence output each dollar buys." Most notably, this does not even include the inference capability from the $20 billion acquisition of Groq. Worth marking down, once again, Huang's famous line: The more you buy, the more you save! The new center of gravity for AI inference: how much intelligence per dollar? This lengthy report explores some of the fundamentals behind inference as it shifts from dense models to mixture-of-experts (MoE) models. For now, NVIDIA still reigns; competitors simply cannot match this level of interactivity, and that is the moat. A traditional dense architecture requires activating every parameter in the model for each generated token ...
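The "intelligence per dollar" framing reduces to cost per token = hourly price ÷ token throughput. A hedged back-of-the-envelope using only the relative figures quoted above (roughly 2x price, 28x throughput); the absolute numbers are placeholders:

```python
# Relative cost-per-token comparison implied by the figures quoted above.
# Absolute prices/throughputs are placeholders; only the ratios (~2x price,
# ~28x throughput) come from this summary of the Signal65 report.

def cost_per_mtok(price_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per million output tokens for capacity billed by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1e6

amd_price, amd_tps = 100.0, 1.0              # normalized baseline
nv_price, nv_tps = amd_price * 2, amd_tps * 28

amd_cost = cost_per_mtok(amd_price, amd_tps)
nv_cost = cost_per_mtok(nv_price, nv_tps)
print(f"cost-per-token advantage: {amd_cost / nv_cost:.0f}x")   # 28 / 2 = 14x
```

The resulting ~14x is in the same ballpark as the 15x per-token cost figure the article cites for highly interactive DeepSeek R1 serving; the exact multiple depends on how large the throughput gap is for a given workload.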
Sina Finance Overnight News Roundup: January 3, 2026
Xin Lang Cai Jing· 2026-01-02 23:34
Source: 喜娜AI
1. Markets:
● January 3 close: U.S. stocks finished mixed; AI-related names lifted the Dow and S&P. In the early hours of January 3 Beijing time, U.S. stocks closed mixed on Friday: the Dow rose 319.10 points, the Nasdaq slipped slightly, and the S&P 500 edged higher. AI-related stocks such as NVIDIA, AMD, and Micron helped push the Dow and S&P higher; Micron gained about 10% to a record high, while other tech areas such as software fell, and Tesla dropped more than 2% on weak deliveries. Tech stocks were 2025's best performers, driving all three benchmark indexes to record highs, though the market saw considerable volatility during the year. Wall Street strategists expect U.S. stocks to rise further in 2026, with some arguing the market will grind higher in a more balanced rally, with investment themes beyond tech this year. [1]
● January 3 top 20 U.S. stocks by turnover: Micron hit a record high on an upbeat revenue outlook. Among Friday's 20 most-traded U.S. stocks, Tesla closed down 2.59%, its seventh straight daily decline, with 2025 deliveries down 8.6% year over year, a second consecutive annual drop. NVIDIA closed up 1.26%; its GB200 NVL72 inference performance exceeds the AMD MI355X by roughly 28x. Micron closed up 10.51% at a record high, as AI demand brought revenue certainty and a jump in profitability. Microsoft closed down 2.21% as it stepped up Windows 11 marketing. Palantir closed down 5.56% after "Big Short" investor Burry disclosed a short position. ...
How Extreme Hardware–Software Co-Design Is Driving the Future of AI Supercomputing
NVIDIA· 2025-11-24 16:41
Hi, I'm Jesse Clayton, product marketing manager for AI infrastructure at NVIDIA, here at Supercomputing 2025. More than ever, scientific discovery relies on converged high-performance computing and AI. But meeting the needs of today's workflows demands extreme co-design of hardware and software, built-in synergy, to deliver optimizations across the entire stack and accelerate breakthrough science at scale. NVIDIA's platform spans GPUs, CPUs, DPUs, NICs, scale-up networking, scale-out networking, and software ...
Up 5x After Its IPO, Then Cut in Half: When Will NVIDIA's "Favorite Son" CRWV Repeat Its Stock-Price Miracle?
RockFlow Universe· 2025-11-24 10:32
Key points ① Cloud computing is going through a brutal changing of the guard. While traditional giants like AWS are busy with their AI transitions, Neoclouds such as CoreWeave are becoming the "new infrastructure" of the AI era through extreme efficiency. The RockFlow research team believes Neoclouds are breaking the old order, filling the compute gap, and emerging as the most explosive alpha segment in the AI value chain after NVIDIA's chips. ② CoreWeave is the undisputed protagonist of this drama. From crypto miner to a unicorn valued at $37 billion, its revenue has surged 100x in two years. Behind this is NVIDIA's own game of thrones: Jensen Huang props up a "favorite son" to counterbalance the tech giants, granting it priority allocation and a capacity backstop worth tens of billions. CoreWeave is not just NVIDIA's super-distributor; it also holds some of the deepest moats of the AI era: scarce supply and deep ties to OpenAI and Microsoft. ③ Will CoreWeave end up as the next Amazon, or as Cisco did in the dot-com bubble? The market is euphoric, but the logic must stay sober. Investing in Neoclouds is, at its core, a bet on the compute supply-demand cycle. CoreWeave has the potential to become the "super utility" of the AI era, but it also faces a life-or-death test from high leverage. RockFlow. This article runs about 3,192 characters, roughly an 11-minute read. In March 2025, when CoreWeave (CRWV) rang ...
Meta's Chief AI Scientist Yann LeCun Plans to Leave and Start a Company; "Big Short" Burry: AI Giants Inflate Profits Through Accounting Maneuvers | Global Tech Morning Brief
Mei Ri Jing Ji Xin Wen· 2025-11-11 23:57
Group 1: AMD and AI Data Center Market - AMD CEO Lisa Su predicts that the AI data center market will exceed $1 trillion by 2030, highlighting significant growth potential in the industry [1] - AMD plans to launch the next-generation MI400 series AI chips in 2026, which will include various models for scientific computing and generative AI [1] - The company expects overall revenue to grow at a compound annual growth rate of approximately 35% over the next three to five years, primarily driven by its data center business [1] Group 2: Meta's AI Leadership Changes - Meta's Chief AI Scientist Yann LeCun plans to leave the company to start his own venture, indicating a significant shift in the AI landscape [2] - LeCun is reportedly in early discussions with potential investors to fund his startup, which will focus on "world models" research [2] - This departure follows other high-profile exits from Meta's AI division, including the departure of AI Research VP Joelle Pineau and recent layoffs affecting around 600 employees [2] Group 3: OpenAI and Copyright Issues - A German court ruled that OpenAI infringed copyright by using lyrics from a German musician without authorization, requiring compensation to a major music copyright association [3] - This case may set a significant precedent for copyright regulation of generative AI technologies in Europe [3] - The lawsuit was initiated by a major music copyright collective, representing around 100,000 songwriters and publishers [3] Group 4: Microsoft's AI Investment in Europe - Microsoft announced a $10 billion investment in AI infrastructure in Sintra, Portugal, marking one of the largest AI investment projects in Europe [4] - The project will involve collaboration with developers and chip manufacturers, including Nvidia, to deploy 12,600 next-generation GPUs [4] - This investment aims to position Portugal as a leading hub for responsible and scalable AI development in Europe [4] Group 5: Accounting Practices of Tech Giants - Investor Michael Burry criticized major tech companies for extending the useful life of assets to artificially inflate profits, labeling it a common form of fraud [5][6] - Burry highlighted that companies like Meta, Alphabet, Microsoft, Oracle, and Amazon are extending depreciation periods for equipment typically with a 2-3 year lifecycle [5][6] - He estimates that these practices could lead to an inflated profit of $176 billion for large tech companies from 2026 to 2028 due to underestimated depreciation [5][6]
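Burry's claim in Group 5 rests on straight-line depreciation arithmetic: stretching an asset's assumed useful life shrinks the annual expense and lifts reported pre-tax profit by the same amount. A minimal sketch with purely illustrative numbers (not any company's actual figures):

```python
# How extending the assumed useful life lowers annual depreciation expense
# under the straight-line method. Numbers are illustrative, not company figures.

def annual_depreciation(cost: float, useful_life_years: float) -> float:
    return cost / useful_life_years

gpu_fleet_cost = 60e9          # hypothetical accelerator capex, in dollars
old_life, new_life = 3, 6      # years: original vs. extended life assumption

old_expense = annual_depreciation(gpu_fleet_cost, old_life)
new_expense = annual_depreciation(gpu_fleet_cost, new_life)
print(f"reported pre-tax profit rises by ~${(old_expense - new_expense)/1e9:.0f}B/year")
```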
Compute in Both China and the U.S. Is Waiting for Power
Xi Niu Cai Jing· 2025-11-07 08:21
Core Insights - The token economy in both China and the U.S. is heavily reliant on electricity, with each country facing unique challenges in this regard [1][3] - The U.S. is experiencing a power shortage due to outdated generation and grid infrastructure, limiting token production [1][2] - In contrast, China faces high token production costs due to relatively low-efficiency hardware, impacting the overall cost of token generation [1][3] Group 1: U.S. Challenges - Microsoft CEO Satya Nadella emphasized that the real issue is not a shortage of GPUs but a lack of electricity, which restricts token production and monetization [1] - Major U.S. tech companies are in a race for AI infrastructure investment, which has turned into a competition for electricity supply [1][2] - The construction of large-scale data centers in the U.S. is progressing from 1GW to 10GW, with companies like Crusoe targeting significant capacity increases [1][2] Group 2: Infrastructure and Policy - Silicon Valley giants are urging the White House for support in developing infrastructure, particularly the power grid, to match the pace of AI innovation [3] - OpenAI has suggested that the U.S. needs to add 100GW of electricity capacity annually to compete effectively in AI against China [3] - The U.S. added 51GW of power capacity last year, while China added 429GW, highlighting a significant "power gap" [3] Group 3: China's Challenges - China's AI infrastructure is built on domestic chips, which currently have lower efficiency, leading to increased demand for computational power [3][4] - ByteDance's daily token calls have surged from 16.4 trillion in May to 30 trillion in September, indicating a rapid increase in computational needs [3] - The cost of electricity for a major cloud provider in China is estimated at 8-9 billion yuan for 1GW annually, reflecting the high operational costs associated with domestic chip usage [5] Group 4: Efficiency and Cost - The competition in the token economy involves not just hardware but also the software, tools, and the electricity and cooling systems required to operate them [4] - Huawei's CloudMatrix 384 has shown a significant increase in total computational power but at a much higher energy cost compared to NVIDIA's latest offerings [5][6] - The average industrial electricity cost in the U.S. is approximately 9.1 cents per kWh, while certain regions in China have reduced costs to below 4 cents per kWh, indicating a competitive advantage for Chinese data centers [6]
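The per-gigawatt electricity costs discussed above follow from a simple formula: annual energy = IT capacity × 8,760 hours × PUE, priced at the local tariff. A hedged sketch; the PUE value is an assumption for illustration, and only the two tariffs (about 9.1 and 4 U.S. cents per kWh) come from the summary above:

```python
# Annual electricity bill for a data center of a given IT capacity.
# The PUE below is an illustrative assumption, not a sourced figure.

HOURS_PER_YEAR = 8760

def annual_power_cost(it_capacity_gw: float, pue: float, price_per_kwh: float) -> float:
    """Total yearly electricity cost in the tariff's currency."""
    kwh = it_capacity_gw * 1e6 * HOURS_PER_YEAR * pue   # GW -> kW, then kWh over a year
    return kwh * price_per_kwh

us_cost_usd = annual_power_cost(1.0, 1.3, 0.091)   # ~9.1 US cents/kWh average tariff
cn_cost_usd = annual_power_cost(1.0, 1.3, 0.04)    # ~4 US cents/kWh in some Chinese regions
print(f"US:    ~${us_cost_usd / 1e9:.1f}B per GW-year")
print(f"China: ~${cn_cost_usd / 1e9:.1f}B per GW-year")
```

One way to reach the 8-9 billion yuan per GW-year estimate cited above is an all-in tariff of roughly 0.7 yuan/kWh at the same PUE; that tariff is an assumption for illustration, not a figure from the article.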
Back to the Technology: The Fragmented Scale-Up Ecosystem
傅里叶的猫· 2025-10-18 16:01
Core Viewpoint - The article discusses the comparison of Scale Up solutions in AI servers, focusing on the UALink technology promoted by Marvell and the current mainstream Scale Up approaches in the international market [1][3]. Comparison of Scale Up Solutions - Scale Up refers to high-speed communication networks between GPUs within the same server or rack, allowing them to operate collaboratively as a large supercomputer [3]. - The market for Scale Up networks is projected to reach $4 billion in 2024, with a compound annual growth rate (CAGR) of 34%, potentially growing to $17 billion by 2029 [5][7]. Key Players and Technologies - NVIDIA's NVLink technology is currently dominant in the Scale Up market, enabling GPU interconnection and communication within server configurations [11][12]. - AMD is developing UALink, which is based on its Infinity Fabric technology, and aims to transition to a complete UALink solution once native switches are available [12][17]. - Google utilizes inter-chip interconnect (ICI) technology for TPU Scale Up, while Amazon employs NeuronLink for its Trainium chips [13][14]. Challenges in the Ecosystem - The current ecosystem for Scale Up solutions is fragmented, with various proprietary technologies leading to compatibility issues among different manufacturers [10][22]. - Domestic GPU manufacturers face challenges in developing their own interconnect protocols due to system complexity and resource constraints [9]. Future Trends - The article suggests that as the market matures, there will be a shift from proprietary Scale Up networks to open solutions like UAL and SUE, which are expected to gain traction by 2027-2028 [22]. - The choice between copper and optical connections for Scale Up networks is influenced by cost and performance, with copper currently being the preferred option for short distances [20][21].
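The market sizing quoted above is internally consistent: $4 billion in 2024 compounding at a 34% CAGR lands at roughly $17 billion by 2029. A quick check:

```python
# Sanity check of the Scale-Up network market projection quoted above:
# $4B in 2024 compounding at 34% per year through 2029.

base_2024_usd_b = 4.0
cagr = 0.34

for year in range(2024, 2030):
    value = base_2024_usd_b * (1 + cagr) ** (year - 2024)
    print(f"{year}: ~${value:.1f}B")
# 2029 comes out to ~$17.3B, matching the ~$17B figure in the summary.
```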
X @郭明錤 (Ming-Chi Kuo)
Market Position & Competition - Oracle accounted for roughly 12% of global GB200 NVL72 shipments in 2025, trailing Microsoft (~30%), Google (~16%), and Dell (~14%) [1] - Oracle's GB200 NVL72 deliveries arrived later, around late second quarter 2025, compared to initial small-batch shipments starting in first quarter 2025 [1] GPU Rental Business Analysis (June-August 2025) - Oracle generated approximately $900 million from Nvidia chip rentals, with a gross profit of $125 million [1] - Oracle experienced losses on rentals of small quantities of both newer and older Nvidia GPUs [1] - During June-August 2025, Oracle was in the early stages of receiving and deploying GB200 NVL72 systems, resulting in limited Blackwell compute capacity and a focus on prior-generation Hopper rentals [2] Profitability Factors - Early GB200 NVL72 deployments are unlikely to be profitable due to increased AI server costs, front-loaded infrastructure retrofit expenses, and limited initial compute/service scale [3] - Losses largely reflect the early-phase costs of the Hopper-to-Blackwell transition, particularly when considering "small quantities of both newer and older Nvidia chips" [5] Key Takeaways from The Information Report - The "small quantities" mentioned in the report are attributed to the recent arrival and preparation of Blackwell/GB200 NVL72 for service, limiting available compute capacity [4]
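The rental figures in the thread imply a thin gross margin, which is the crux of the early-deployment-cost argument. The arithmetic, using only the two numbers quoted above:

```python
# Gross margin implied by the figures in the thread:
# ~$900M rental revenue and ~$125M gross profit for June-August 2025.

revenue_usd_m = 900
gross_profit_usd_m = 125

gross_margin = gross_profit_usd_m / revenue_usd_m
print(f"implied gross margin: {gross_margin:.1%}")   # ~13.9%
```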
X @郭明錤 (Ming-Chi Kuo)
Business Overview - The report analyzes an article by The Information regarding Oracle's Nvidia GPU rental business, focusing on the period of June-August 2025 [1] - The analysis aims to clarify the logic behind the report, without commenting on the motives or Oracle's stock price [1] Financial Performance - In June-August 2025, Oracle generated approximately $900 million from Nvidia chip server rentals, with a gross profit of $125 million [1] - Oracle experienced losses in some cases when renting out small quantities of both newer and older Nvidia chips [1] Industry Dynamics - Oracle's procurement of GB200 NVL72 accounted for approximately 12% of global shipments in 2025, ranking lower than Microsoft (~30%), Google (~16%), and Dell (~14%) [2] - Oracle's GB200 NVL72 acquisition occurred later, around the end of 2Q25, compared to the small-scale shipments that began in 1Q25 [2] - During June-August 2025, Oracle was in the process of acquiring and deploying GB200 NVL72, primarily offering Hopper GPU compute power for rental services [2] Cost Analysis - Initial deployment of GB200 NVL72 was not profitable due to increased AI server costs, infrastructure upgrades, and limited scale of compute power/services [2] - Losses were attributed to the high initial costs of transitioning from Hopper to Blackwell, specifically when considering "small quantities of both newer and older versions of Nvidia chips" [3]