NVIDIA H100 GPU

AI Spawns a Major Shift in Computing Power; Wuxi Offers a "Chip Solution"
21st Century Business Herald · 2025-09-06 14:08
Since 2023, more than a hundred domestic large models, including Wenxin Yiyan, Tongyi Qianwen, Zhipu, Doubao, and KIMI, have emerged in quick succession, with parameter counts ranging from hundreds of millions to over a trillion. They are being applied widely across cloud computing, data centers, edge computing, consumer electronics, intelligent manufacturing, intelligent driving, intelligent finance, and intelligent education, and the shortfall in intelligent computing power for AI training and inference grows by the day.

Take NVIDIA's H100 GPU as an example: at roughly 2,000 TeraFLOPS of compute per card, about 1.3 million cards would theoretically be needed per day; once load redundancy and peak demand in real deployments are factored in, actual deployment could reach 7 million cards.

The industry broadly views AI as a track of certainty: the "marathon" is already underway, and the eventual winner is not yet clear. What is certain is that investment in computing power is a necessary condition for entering the race, not a sufficient one. That is, while computing power is the foundational support for AI development, it alone cannot decide the outcome; a combined advantage must also be built across chips, algorithms, data, ecosystems, and application deployment.

Facing the enormous opportunity of the AI era, Wuxi has put forward its answer: chip-compute linkage.

At the 2025 Integrated Circuit (Wuxi) Innovation and Development Conference, which opened on September 4, Wuxi Party Secretary Du Xiaogang called for jointly building application scenarios for chip-compute linkage: "I suggest that all guests join us in seizing the strategic opportunity of the national AI Plus initiative, targeting scenarios in agriculture, industry, consumption, elderly care, disability assistance, and urban governance, and deploying a new generation of intelligent terminals ...
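As a sanity check, the short sketch below backs out what the article's card counts jointly imply. The printed demand and utilization values are derived quantities, not figures stated in the article.

```python
# Back-of-envelope check on the card-count figures above.
h100_tflops = 2000            # per-card peak compute cited in the article
cards_theoretical = 1_300_000 # article's "theoretical" daily requirement
cards_deployed = 7_000_000    # article's figure with redundancy and peaks

# Aggregate demand implied if every theoretical card ran at peak:
implied_eflops = cards_theoretical * h100_tflops / 1e6
print(f"Implied sustained demand: {implied_eflops:,.0f} EFLOPS")  # 2,600 EFLOPS

# Overprovisioning factor implied by the deployed figure:
factor = cards_deployed / cards_theoretical
print(f"Overprovisioning factor: {factor:.1f}x "
      f"(average utilization ~{100/factor:.0f}%)")                # ~5.4x, ~19%
```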
Huang Jing and Guo Haoning: U.S. High-Tech Competition with China Is Turning Toward Market Control
Huanqiu Wang Zixun · 2025-08-12 22:42
Source: Global Times

The U.S. government recently reached an unusual agreement with U.S. chipmakers NVIDIA and AMD: the two companies agreed to hand over 15% of their revenue from chips exported to China to the U.S. government in exchange for export licenses for the products concerned. This unprecedented arrangement has drawn international attention.

Before that, on July 23, U.S. President Trump unveiled this administration's "AI Action Plan," which includes binding directives to federal agencies along with some funding programs. Washington hopes the plan will loosen the policy environment, drive large infrastructure projects, and further cement its global lead. In May, the U.S. Department of Commerce announced the repeal of the previous administration's AI diffusion framework and issued a series of implementing rules. Behind this string of changes lie the differing logics and policy shifts with which the Biden and Trump administrations have sought to suppress China's technological development.

Blockade measures have failed to contain China's technological progress

[...] the company Humain, selling it 18,000 of NVIDIA's high-end Blackwell chips and assisting it in building AI training data centers. The U.S. also plans to let the UAE import up to 500,000 of NVIDIA's most advanced H100 GPUs per year, to be allocated to AI firms such as G42. These moves reveal Washington's strategic intent: by binding Middle Eastern capital, it seeks to restrict those countries' investment flows toward Chinese technology while exporting its own technology and cloud-platform services, ensuring its control over the AI industry chain is not weakened.

The future course of the technology competition will depend not only on ...
Capex and the "Big Beautiful Bill": Computing Power Tailwinds Accumulating
GOLDEN SUN SECURITIES · 2025-07-27 10:46
Investment Rating
- The report maintains a "Buy" rating for the computing power industry, indicating a positive outlook for related companies [6][23]

Core Insights
- The computing power industry is experiencing explosive growth, driven by unprecedented capital expenditure (Capex) from global tech giants riding the AI wave [19][20]
- The "One Big Beautiful Bill Act" signed by President Trump introduces significant tax cuts and incentives that stimulate growth in the computing power sector [5][20]
- The report argues that the sector sits at a critical intersection of surging demand and supportive policy, marking the start of a "computing power arms race" [6][23]

Summary by Sections

Investment Strategy
- Focus on companies in the computing power and optical communication sectors, including leaders such as Zhongji Xuchuang and New Yisheng, as well as other related firms [12][23]

Market Review
- The communication sector rose, with the optical communication index performing particularly well [15][18]

Demand-Side Analysis
- Major tech companies are sharply increasing Capex to build computing infrastructure; Google raised its 2025 Capex target from roughly $75 billion to $85 billion, a record high [21][23]
- Meta plans to invest hundreds of billions of dollars to develop superintelligent systems, with substantial increases to its Capex budget [21][23]

Policy Impact
- The "One Big Beautiful Bill Act" cuts the federal corporate tax rate from 35% to 21%, permanently easing the tax burden on companies and encouraging reinvestment [5][22]
- The act also restores full expensing for capital investments, raising after-tax investment returns and accelerating the industry's expansion [5][22]

Recommendations
- The report recommends focusing on key players across the computing power supply chain, including optical communication leaders and companies in liquid cooling and edge computing platforms [7][12][23]
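To see why full expensing matters for a Capex-heavy buildout, here is a minimal sketch comparing the present value of the tax shield under immediate deduction versus straight-line depreciation. The investment size, discount rate, and five-year depreciation life are illustrative assumptions; only the 21% rate comes from the report.

```python
# Illustrative comparison: tax shield from a $1B GPU-cluster investment
# under full expensing vs. straight-line depreciation. All inputs except
# the tax rate are assumptions for the sketch.
capex = 1_000_000_000   # hypothetical investment
tax_rate = 0.21         # federal corporate rate cited in the report
discount = 0.08         # assumed cost of capital
years = 5               # assumed straight-line depreciation life

# Full expensing: the entire deduction (and its tax shield) lands in year 0.
shield_full = capex * tax_rate

# Straight-line: the shield arrives in slices and is discounted.
annual = capex / years
shield_sl = sum(annual * tax_rate / (1 + discount) ** t for t in range(1, years + 1))

print(f"Tax shield, full expensing:  ${shield_full / 1e6:,.0f}M")   # $210M
print(f"Tax shield, straight-line:   ${shield_sl / 1e6:,.0f}M")     # ~$168M
print(f"PV gain from full expensing: ${(shield_full - shield_sl) / 1e6:,.0f}M")
```

Under these assumptions, immediate deduction is worth roughly $42M more in present value on a $1B outlay, which is the mechanism behind the "enhanced investment returns" claim.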
This Kind of Big Chip Has Great Potential
Semiconductor Industry Observation · 2025-07-02 01:50
Core Insights
- The article discusses the exponential growth of AI models to trillions of parameters, highlighting the limits of traditional single-chip GPU architectures in scalability, energy efficiency, and computational throughput [1][7][8]
- Wafer-scale computing has emerged as a transformative paradigm, integrating many small dies onto a single wafer to deliver unprecedented performance and efficiency [1][8]
- The Cerebras Wafer Scale Engine (WSE-3) and Tesla's Dojo represent significant advances in wafer-scale AI accelerators, showcasing their potential to meet the demands of large-scale AI workloads [1][9][10]

Wafer-Scale AI Accelerators vs. Single-Chip GPUs
- A comprehensive comparison of wafer-scale AI accelerators and single-chip GPUs focuses on their relative performance, energy efficiency, and cost-effectiveness in high-performance AI applications [1][2]
- The WSE-3 packs 4 trillion transistors and 900,000 cores, while Tesla's Dojo chip has 1.25 trillion transistors and 8,850 cores, demonstrating the capabilities of wafer-scale systems [1][9][10]
- Emerging technologies such as TSMC's CoWoS packaging are expected to raise computing density by up to 40x, further advancing wafer-scale computing [1][12]

Key Challenges and Emerging Trends
- The article discusses critical challenges for wafer-scale computing, including fault tolerance, software optimization, and economic feasibility [2]
- Emerging trends include 3D integration, photonic chips, and advanced semiconductor materials, which are expected to shape the future of AI hardware [2]
- The outlook anticipates significant advances over the next 5 to 10 years that will influence next-generation AI hardware [2]

Evolution of AI Hardware Platforms
- The article traces the chronological evolution of major AI hardware platforms, highlighting key releases from Cerebras, NVIDIA, Google, and Tesla [3][5]
- Notable milestones include Cerebras' WSE-1, WSE-2, and WSE-3, as well as NVIDIA's GeForce and H100 GPUs, showcasing rapid innovation in high-performance AI accelerators [3][5]

Performance Metrics and Comparisons
- AI training hardware is evaluated on key metrics such as FLOPS, memory bandwidth, latency, and power efficiency, which are crucial for large-scale AI workloads [23][24]
- The WSE-3 reaches a peak of 125 PFLOPS and supports training models of up to 24 trillion parameters, significantly outperforming traditional GPU systems in specific applications [25][29]
- NVIDIA's H100 GPU, while powerful, incurs communication overhead from its distributed architecture, which can slow training for large models [27][28]

Conclusion
- Wafer-scale systems like the WSE-3 and traditional GPU clusters are complementary, each offering distinct advantages for different AI applications [29][31]
- Ongoing advances in AI hardware are expected to drive further innovation and collaboration toward scalable, energy-efficient, high-performance computing [13]
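A toy model makes the communication-overhead point concrete: both machines can quote similar peak numbers, but a distributed cluster pays an interconnect tax on every training step. The per-H100 peak and both overhead fractions below are assumptions for illustration, not figures from the article.

```python
# Rough model: effective throughput = peak FLOPS x (1 - communication overhead).
def effective_pflops(peak_pflops: float, comm_overhead: float) -> float:
    """Throughput remaining after time spent on inter-chip communication."""
    return peak_pflops * (1.0 - comm_overhead)

wse3_peak = 125.0          # PFLOPS, single WSE-3 (article figure)
h100_peak = 2.0            # assumed ~2 PFLOPS per H100 (FP16, order of magnitude)
cluster_peak = 64 * h100_peak  # hypothetical 64-GPU cluster, 128 PFLOPS peak

print(f"WSE-3, ~5% on-wafer overhead:  {effective_pflops(wse3_peak, 0.05):6.1f} PFLOPS")
print(f"64x H100, ~30% comm overhead:  {effective_pflops(cluster_peak, 0.30):6.1f} PFLOPS")
# Comparable peaks, but the cluster's all-reduce traffic and pipeline
# bubbles consume a much larger share of each step.
```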
Five Reasons NVIDIA Cannot Be Replaced
Semiconductor Xinwen · 2025-06-06 10:20
In the increasingly heated global AI chip market, Huawei's Ascend 910C GPU, intended to help China shed its reliance on NVIDIA, has run into clear resistance.

Wccftech reports that major Chinese tech companies such as ByteDance, Alibaba, and Tencent have yet to place large orders for Huawei's AI chips, owing to NVIDIA's deeply entrenched ecosystem (notably CUDA software) and shortcomings in Huawei's products. Lacking orders from tech firms, the 910C GPU has pivoted toward procurement by large state-owned enterprises (SOEs) and local governments. This shift in market strategy underscores the serious challenge Huawei's AI chips face in capturing the mainstream market.

Source: Wccftech.

Huawei's AI chip push faces five obstacles, a tangle of factors that together create enormous resistance to the Ascend 910C GPU's market adoption. These obstacles not only limit Huawei's market penetration but also make Chinese tech giants wary of the product.

First, the entrenchment of NVIDIA's CUDA ecosystem. Many Chinese tech giants have invested heavily, in both money and time, in NVIDIA's CUDA ecosystem. CUDA is NVIDIA's parallel computing platform and programming model built for its GPUs, widely used in AI training and high-performance computing; its mature tools, libraries, and vast developer community form a "moat" that is hard to breach. For these companies, once they move away ...
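To make the switching cost concrete, here is a generic PyTorch training snippet (a sketch, not code from the article) in which CUDA is assumed at several layers at once; each of these touch points must be replaced or re-validated when moving to a non-NVIDIA backend.

```python
# Generic sketch of CUDA assumptions baked into everyday training code:
# device strings, CUDA loss scaling, and CUDA autocast each tie the
# script to NVIDIA's stack.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(1024, 1024).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()            # CUDA-specific loss scaling

x = torch.randn(32, 1024, device=device)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()             # mixed precision via CUDA autocast

scaler.scale(loss).backward()                   # relies on CUDA streams underneath
scaler.step(opt)
scaler.update()
```

Multiply these few lines across years of kernels, profilers, and tuned libraries, and the "moat" the article describes is the cost of porting and re-validating all of it.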
Facing a Ban Again Six Years Later, Huawei Cloud Has More Confidence
36Kr · 2025-05-16 09:21
Core Viewpoint
- The article examines the competitive landscape of AI computing power, highlighting Huawei's CloudMatrix 384 super-node technology as a significant advance in the face of U.S. export controls on advanced chips, particularly those targeting Huawei's Ascend AI chips [2][4][19]

Group 1: U.S. Export Controls and Market Dynamics
- On May 13, the U.S. Department of Commerce announced a global ban on Huawei's Ascend AI chips, extending the ban to all advanced computing ICs from China [2]
- Despite these restrictions, the U.S. tech industry, NVIDIA in particular, is still eager to tap the Chinese AI market, as shown by NVIDIA's announcement of a large Saudi Arabian order on the same day the ban was issued [2][3]
- The performance degradation of NVIDIA's H20 GPU, whose INT8 compute will be cut by more than 60%, raises questions about the viability of continued sales to China [3][4]

Group 2: Huawei's Technological Advancements
- Huawei's CloudMatrix 384 super-node technology aggregates 384 Ascend compute cards to reach 300 PFLOPS, rivaling the performance of NVIDIA H100-based systems [4][13]
- A new high-speed bus network raises inter-card bandwidth by more than 10x, enabling near-lossless data flow between cards and lifting training efficiency to nearly 90% of NVIDIA's single-card performance [13][14]
- The CloudMatrix 384 super node is designed for large-scale expert parallelism, making it compatible with mainstream models such as DeepSeek and GPT [14]

Group 3: Competitive Landscape and Industry Trends
- Super-node technology is emerging as a key answer to global AI computing challenges, with companies including NVIDIA and AMD developing their own super-node architectures [15][16]
- Huawei's CloudMatrix 384 is currently the only commercially available large-scale super-node cluster worldwide, already deployed in the Wuhu data center [17]
- The article stresses the importance of comprehensive AI infrastructure integrating hardware, software, and services, positioning Huawei as a leader in this domain [21][25]

Group 4: Broader Implications and Future Outlook
- The ongoing U.S. technology blockade has inadvertently accelerated China's advances in chip manufacturing and AI technologies, as noted by Bill Gates [19][21]
- The article concludes that modern AI competition is not just about individual chips or models; it demands a holistic approach spanning a complete hardware-and-software ecosystem [21][24]
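The quoted figures can be cross-checked with simple arithmetic. In the sketch below, the per-card number is backed out from the article's totals, and the 90% figure is read as a scaling-efficiency claim; neither derived value appears in the article itself.

```python
# Quick arithmetic on the CloudMatrix 384 figures quoted above.
cards = 384
total_pflops = 300.0
per_card = total_pflops / cards
print(f"Implied per-card compute: {per_card * 1000:.0f} TFLOPS")  # ~781 TFLOPS

# Reading "training efficiency ~90% of single-card performance" as a
# scaling-efficiency claim: effective aggregate = cards x per-card x eff.
efficiency = 0.90
print(f"Effective aggregate at 90% scaling: "
      f"{cards * per_card * efficiency:.0f} PFLOPS")              # 270 PFLOPS
```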
Beyond DeepSeek? The Hidden Technology War the Giants Dare Not Discuss
36Kr · 2025-04-29 00:15
Group 1: DeepSeek-R1 Model and MLA Technology
- The launch of the DeepSeek-R1 model marks a significant breakthrough for AI technology in China, delivering performance competitive with industry leaders such as OpenAI while cutting required computational resources by 30% relative to comparable products [1][3]
- The team's multi-head latent attention mechanism (MLA) cuts memory usage by 50%, but at the cost of added development complexity, extending average development cycles by 25% in manually optimized scenarios [2][3]
- DeepSeek's distributed training framework and dynamic quantization technology raise inference efficiency per unit of computing power by 40%, offering a case study in the co-evolution of algorithms and systems engineering [1][3]

Group 2: Challenges and Innovations in AI Infrastructure
- Traditional fixed architectures, especially GPU-based systems, struggle to adapt to the fast-evolving demands of modern AI and high-performance computing, often requiring substantial hardware modifications [6][7]
- Energy consumption in AI data centers is projected to rise sharply, with future power demand expected to reach 600 kW per cabinet, in sharp contrast to the capabilities of most enterprise data centers today [7][8]
- The industry is shifting toward intelligent, software-defined hardware platforms that integrate existing solutions while supporting future technological advances [6][8]

Group 3: Global AI Computing Power Trends
- Global AI computing power spending has surged from 9% in 2016 to 18% in 2022 and is expected to exceed 25% by 2025, signaling a shift of computing power from infrastructure support to core national strategy [9][11]
- Intelligent computing capacity grew 94.4% year on year, from 232 EFLOPS in 2021 to 451 EFLOPS in 2022, surpassing traditional computing power for the first time [10][11]
- Competition for computing power is intensifying, with major players such as the US and China investing heavily in infrastructure to secure an edge in AI technology [12][13]

Group 4: China's AI Computing Landscape
- China's AI computing demand is expected to exceed 280 EFLOPS by the end of 2024, with intelligent computing accounting for over 30%, driven by technological iteration and industrial upgrading [19][21]
- A shift from centralized computing pools to distributed computing networks is essential to meet growing real-time and concurrency demands across applications [20][21]
- The evolution of China's computing industry is not merely about scale; it involves strategic breakthroughs in technology sovereignty, industrial security, and economic resilience [21]
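The year-on-year growth figure follows directly from the two capacity numbers, as this one-line check shows.

```python
# Verifying the growth arithmetic quoted above.
eflops_2021 = 232
eflops_2022 = 451
growth = eflops_2022 / eflops_2021 - 1
print(f"Intelligent computing, YoY growth: {growth:.1%}")  # 94.4%, as cited
```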
Saying "Thank You" to ChatGPT May Be the Most Extravagant Thing You Do All Day
36Kr · 2025-04-22 10:28
Every "thank you" you say to an AI may be quietly going on the record.

Source | APPSO (ID: appsolution); cover image | Unsplash

Have you ever said "thank you" to ChatGPT? Recently, an X user asked OpenAI CEO Sam Altman: "I wonder how much money OpenAI spends on electricity because people keep saying 'please' and 'thank you' when talking to the models?" Although there are no precise statistics, Altman half-jokingly offered an estimate: tens of millions of dollars. He added that the money was, in the end, "well spent."

Late last year, Baidu released its 2024 AI prompt words of the year. The data show that on the Wenxiaoyan app, "answer" was the hottest prompt word, appearing more than 100 million times in total. The words most often typed into the dialog box also included "why," "what is," "help me," and "how," along with tens of millions of instances of "thank you."

But have you ever wondered how many resources each "thank you" to an AI actually consumes? Kate Crawford points out in her book Atlas of AI that AI is far from intangible ...
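For a sense of how such an estimate could be constructed, here is a back-of-envelope sketch. Every input is an assumption chosen for illustration (Altman gave only the joking total), tuned to land in the "tens of millions" range; none of these figures comes from OpenAI or the article.

```python
# Order-of-magnitude sketch of what polite filler could cost at scale.
queries_per_day = 1e9     # assumed daily messages across a large service
polite_fraction = 0.3     # assumed share of turns carrying a please/thanks
extra_tokens = 40         # assumed extra tokens processed per polite turn
usd_per_mtok = 5.0        # assumed all-in serving cost per million tokens

extra_mtok_per_day = queries_per_day * polite_fraction * extra_tokens / 1e6
annual_cost = extra_mtok_per_day * usd_per_mtok * 365
print(f"Extra tokens: {extra_mtok_per_day:,.0f}M per day")
print(f"Implied annual cost: ${annual_cost / 1e6:.0f}M")   # ~$22M/year
```

The point of the sketch is not the exact total but that a few extra tokens per message, multiplied by billions of messages, plausibly reaches eight figures a year.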