GPU Direct-Connect Technology Draws Attention as US-Listed Storage Giants Surge
Xuan Gu Bao· 2026-01-06 23:31
Overnight, US-listed storage companies surged collectively once again: SanDisk rose more than 20%, while Western Digital and Seagate Technology each gained more than 10%. Beyond price hikes, analysts note that NVIDIA is exploring a new technology for connecting GPUs directly to SSDs.

GF Securities believes that RAG vector databases for AI inference will drive growth in SSD demand. Vector-database storage media must hold large-scale vector data and index structures while supporting high throughput and low latency, in order to meet similarity-retrieval needs under high concurrency. Vector-database storage is currently moving from "memory-assisted retrieval" toward an "all-SSD storage architecture".

Domestically, according to the Volcano Engine developer-community official account, TOS has launched Vector Bucket. The architecture pairs Kiwi, ByteDance's in-house cloud-native vector index library, with a multi-tier local-cache design spanning DRAM, SSD, and remote object storage. For large-scale, long-duration storage with infrequent queries, this design not only satisfies tiering needs for hot and cold data but also markedly lowers the barrier for enterprises to use vector data at scale.

Overall, the RAG architecture gives large models long-term memory, and enterprise and personalization needs are driving growth in RAG storage demand. The storage media behind RAG vector databases in AI inference are transitioning from "memory-assisted retrieval" to an "all-SSD storage architecture", and demand for high-bandwidth, high-capacity SSDs will keep rising.
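The workload driving this storage demand can be illustrated with a toy example. The sketch below is illustrative only and uses no vendor API (not Kiwi, TOS, or any real vector database): it shows the brute-force top-k cosine-similarity search that a vector database must serve at high concurrency. In production, the vector store and its index can far exceed DRAM, which is what motivates tiered DRAM/SSD/object-storage designs.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_similar(query, vectors, k=3):
    """Brute-force top-k similarity search over a small in-memory store.

    A real vector database serves this operation concurrently over data
    held on tiered DRAM/SSD/object storage, with an approximate index
    replacing the exhaustive scan.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: -cosine(query, vectors[i]))
    return ranked[:k]

random.seed(0)
store = [[random.gauss(0, 1) for _ in range(64)] for _ in range(1000)]
# A query that is a slightly perturbed copy of stored vector 42:
query = [x + 0.01 * random.gauss(0, 1) for x in store[42]]
best = top_k_similar(query, store, k=3)
print(best[0])  # vector 42 should rank first
```

The exhaustive scan here is O(n) per query; production systems trade exactness for speed with approximate indexes, but the storage-bandwidth pressure the article describes remains.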
As the AI Race Turns to Inference, How Will the International Tech Competition Landscape Change?
Zhou Chengxiong (researcher at the Institutes of Science and Development, Chinese Academy of Sciences; deputy director of the Center for Digital Intelligence Innovation and Governance)

On January 5, 2026, at CES in Las Vegas, NVIDIA CEO Jensen Huang unexpectedly unveiled the next-generation AI chip platform "Rubin" ahead of schedule, breaking the company's tradition of concentrating new-product launches at its GTC conference in March. The move sends a key signal: the global AI race is shifting wholesale from "training-led" to "inference-driven". This is not merely an evolution of the technology roadmap but a major turning point for the entire AI industry ecosystem, infrastructure planning, and even the landscape of technological competition among nations.

For the past several years, large-model training was the central focus of AI development. Large language models (LLMs) such as GPT, Llama, and Claude kept pushing parameter counts higher, demand for compute grew exponentially, and a wave of high-performance GPU cluster build-outs arose around NVIDIA's H100 and Blackwell. Yet training is only one stage of the AI life cycle. What truly determines whether AI can be deployed and create economic value is inference: the model's ability to respond to user input in real time in actual application scenarios.

Inference workloads are high-frequency, latency-sensitive, highly concurrent, and cost-sensitive. For example, an intelligent customer-service system may handle millions of user queries a day, each of which must be answered within milliseconds; an autonomous vehicle must continuously perform multimodal inference in complex environments to stay safe. These demands put hardware efficiency, energy efficiency, ...
Nasdaq Opens Up 0.22%; NVIDIA Up 1.3%; Hesai Up Nearly 8%
Ge Long Hui· 2026-01-06 14:37
NVIDIA, selected among Gelonghui's 2026 "Global Vision" ten core assets, rose 1.3% as the Vera Rubin platform entered full production, with AI inference performance up 5x and costs cut to one-tenth. Novo Nordisk rose 4.3%, having just officially launched in the US the world's first oral GLP-1 drug for adult weight loss. Hesai rose nearly 8%; it plans to double annual capacity to 4 million units in 2026 and has been selected by NVIDIA as a lidar partner. NIO rose 2.3% as its one-millionth vehicle rolled off the line, with William Li saying annual sales should grow 40% to 50% going forward. (Ge Long Hui)

At the open, the three major US indexes were mixed: the Nasdaq rose 0.22%, the S&P 500 gained 0.1%, and the Dow slipped 0.03%. ...
Jensen Huang Makes a Rare Early Announcement: Next-Generation GPU in Full Production
As early as the GTC conference in March 2025, Jensen Huang had previewed the superchip code-named "Vera Rubin" and confirmed it would enter volume production in 2026. On January 5 local time, at CES in the US, Huang unexpectedly released the next-generation AI chip platform "Rubin" ahead of schedule, breaking NVIDIA's usual practice of unveiling each new architecture at the March GTC conference. With the AI race entering the inference era, NVIDIA has decided to accelerate its offensive.

Vera Rubin already in production

At CES, Huang gave a systematic presentation of the Rubin platform; Rubin is the code name for NVIDIA's newest GPU. "Rubin arrives at exactly the right moment. AI's demand for compute is climbing sharply, for both training and inference," Huang said. "We are committed to delivering a new generation of AI supercomputers every year, and through the extreme co-design of six brand-new chips, Rubin takes a giant step toward AI's next frontier."

The Rubin platform follows an extreme co-design philosophy, integrating six chips: the NVIDIA Vera CPU, the Rubin GPU, the NVLink 6 switch chip, the ConnectX-9 SuperNIC, the BlueField-4 DPU, and the Spectrum-6 Ethernet switch chip, covering everything from compute and networking to storage and security ...
AI Race Turns to Inference as NVIDIA Announces Full Production of the Rubin Chip Platform
Core Insights - NVIDIA has accelerated its AI chip platform release schedule by unveiling the next-generation AI chip platform "Rubin" earlier than usual at CES on January 5, 2026, breaking its traditional March GTC announcement pattern [1][2] Group 1: Rubin Platform Overview - The Rubin platform, which includes six new chips, follows an extreme co-design approach and aims to meet the increasing computational demands of AI for both training and inference [4] - Compared to the previous Blackwell architecture, Rubin accelerators improve AI training performance by 3.5 times and inference performance by 5 times, featuring a new CPU with 88 cores [4] - Rubin can reduce inference token costs by up to 90% and decrease the number of GPUs required for training mixture of experts (MoE) models by 75% compared to the Blackwell platform [4] Group 2: Ecosystem and Market Response - The NVL72 system, which includes 72 GPU packaging units, was also announced, with each unit containing two Rubin dies, totaling 144 Rubin dies in the system [5] - Major cloud providers and model companies, including AWS, Microsoft, Google, OpenAI, and Meta, have responded positively to Rubin, indicating strong market interest [5] - NVIDIA aims to provide engineering samples to ecosystem partners early to prepare for subsequent deployment and scaling applications [5] Group 3: AI Strategy and Product Launches - NVIDIA's focus is shifting from "training scale" to "inference systems," as demonstrated by the introduction of the Inference Context Memory Storage Platform, designed specifically for inference scenarios [6] - The company is also advancing its long-term strategy in physical AI, releasing open-source models and frameworks that extend AI capabilities to robotics, autonomous driving, and industrial edge scenarios [6] - The launch of the Cosmos and GR00T series models aims to enhance robotic learning, reasoning, and action planning, marking a significant step in the evolution of physical AI [7]
Group 4: Autonomous Driving Developments - NVIDIA introduced the Alpamayo open-source model family for autonomous driving, targeting "long-tail scenarios," along with a high-fidelity simulation framework and an open dataset for training [9] - The first autonomous vehicle from NVIDIA is set to launch in the U.S. in the first quarter, with plans for expansion to other regions [9] - The overall strategy emphasizes that the competition in AI infrastructure is moving towards "system engineering capabilities," where the complete delivery from architecture to ecosystem is crucial [9]
The Deeper Meaning Behind NVIDIA's $20 Billion "Bet"
美股研究社· 2026-01-05 12:54
The following article is from Xindongxi (芯东西), by ZeR0.

Source | Xindongxi

$20 billion. That is the first number the market remembered after NVIDIA bought the team of AI-chip unicorn Groq along with a non-exclusive technology license. It exceeds the value of any previous NVIDIA acquisition.

Groq's specialty is its distinctive LPU chip technology, a reconfigurable dataflow architecture in which software defines the hardware. Because Groq was founded by members of the original Google TPU development team, some in the industry call it an "advanced TPU".

After several days of fermentation, the core focus of the story has shifted. NVIDIA's choice has thrown the spotlight on new "non-GPU" technology paths. Representative companies on similar routes include SambaNova in the US, which Intel is in the process of acquiring, and Tsingmicro (清微智能) in China, which has just closed a multi-billion funding round.

Given that the company itself was not acquired, the sky-high $20 billion figure bears chewing over again and again: exactly what kind of technical capability is NVIDIA paying for? Spending nearly a third of its cash reserves, how big a game is NVIDIA playing? The answer is AI inference. While keeping its public statements restrained, NVIDIA CEO Jensen Huang sent an email to employees ...
Hanbo Semiconductor (瀚博半导体): Striving to Be a Global Leader in AI Inference Chips
Xin Lang Cai Jing· 2026-01-04 12:25
Source: Shanghai Securities News · cnstock.com

Shanghai Securities News (reporter Li Xingcai): On the eve of Christmas 2025, an overseas M&A report stirred ripples through the semiconductor industry. According to public information, NVIDIA plans to pay roughly $20 billion to acquire part of the core technology assets of Groq, a high-performance AI accelerator chip startup, and to bring in its key engineering team, in order to strengthen its position in AI inference and cover a broader range of AI inference and real-time workloads. The move is widely read in the industry as a clear signal: the center of gravity of AI compute is shifting from "training first" to "inference first". As large models move toward large-scale deployment, real-time, low-cost, deployable inference capability is becoming the new competitive focus.

Qian Jun, founder and CEO of Hanbo Semiconductor

In China, where the industry is changing and developing just as quickly, one team reached a similar insight even earlier, in 2018, and founded Hanbo Semiconductor (瀚博半导体) on that basis. Qian Jun stresses that in the AGI (artificial general intelligence) era, AI large-model applications will generate massive demand for "cloud AI inference plus cloud rendering", a market that is not only huge but also a blue ocean. "Back then we concluded that while the AI training chip market is certainly large, cloud AI inference has far greater explosive potential," Qian Jun, founder and CEO of Hanbo Semiconductor, told the Shanghai Securities News in a recent interview, laying out his logic for betting on the AI inference market and its broad prospects. Qian Jun and Zhang Lei (the company's co-founder and CTO) ...
NVIDIA Is Still King: GB200 Costs Twice as Much but Saves 15x; AMD Loses Decisively
36Ke· 2026-01-04 11:13
Core Insights - The report highlights a significant shift in AI inference economics, where the focus has moved from raw chip performance to the intelligence output per dollar spent [1][4][46] - NVIDIA continues to dominate the market, with its GB200 NVL72 outperforming AMD's MI355X by a factor of 28 in throughput [1][5][18] AI Inference Economics - The key metric for evaluating AI infrastructure has transitioned to "how much intelligence can be obtained for each dollar" [4][6][46] - In high-interaction scenarios, the cost per token for DeepSeek R1 can be reduced to 1/15th of other solutions [2][20] Model Architecture - The report discusses the evolution from dense models to mixture of experts (MoE) models, which activate only the most relevant parameters for each token, improving efficiency [9][11][46] - MoE models are becoming the standard for top open-source large language models (LLMs), with 12 out of the top 16 models utilizing this architecture [11][14] Performance Comparison - In terms of performance, the GB200 NVL72 shows a significant advantage over AMD's MI355X, achieving up to 28 times the performance in certain scenarios [18][24][30] - The report indicates that as interaction rates increase, the performance gap between NVIDIA and AMD platforms widens, with NVIDIA's solutions becoming increasingly efficient [30][37] Cost Efficiency - Despite the higher hourly cost of the GB200 NVL72, its advanced architecture and software capabilities lead to a lower cost per token, making it more economical in the long run [20][41][45] - The analysis shows that the GB200 NVL72 can achieve a performance per dollar advantage of approximately 12 times compared to its competitors [42][44] Future Trends - The future of AI models is expected to lean towards larger and more complex MoE architectures, with platform-level design becoming a critical factor for success [46][47] - Companies like OpenAI, Meta, and Anthropic are likely to continue evolving their flagship models in the
direction of MoE and inference, maintaining NVIDIA's competitive edge [46]
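The "intelligence per dollar" framing above reduces to simple arithmetic: cost per token is hourly system cost divided by token throughput, so a system with a higher rental price can still be far cheaper per token when its throughput advantage is larger. The numbers below are illustrative placeholders, not figures measured in the report.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD per 1M tokens = hourly system cost / tokens produced per hour, scaled."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Illustrative placeholders (NOT the report's measured figures):
# system A costs twice as much per hour but delivers 28x the throughput.
a = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=28_000)
b = cost_per_million_tokens(hourly_cost_usd=50.0, tokens_per_second=1_000)
print(round(b / a, 1))  # system A ends up 14x cheaper per token
```

With these toy inputs, a 2x price premium combined with a 28x throughput edge nets out to a 14x advantage in cost per token, which is the shape of the argument the report makes for the GB200 NVL72.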
The Anxiety Behind the Big Spend: NVIDIA Pays $20 Billion for a Groq Technology License
Sou Hu Cai Jing· 2026-01-01 10:19
By Wuyan (无言)

On Christmas Eve 2025, NVIDIA dropped a bombshell: it spent $20 billion to license technology from AI-chip startup Groq and poached the company's core executives, including its chief executive. This is the largest deal in NVIDIA's history, roughly equal in value to all of its past acquisitions combined. At first glance the money might seem wasted, since Groq, founded only nine years ago, counts as a junior player in the industry, but a closer look reveals plenty of method in it.

The $20 billion deal is no simple matter: not acquiring is the clever move

The structure of the deal is intriguing. It is not a full acquisition but a non-exclusive technology license plus a talent raid. Some media described it as an asset purchase, but most reports carefully stressed "technology license" as the core. Why structure it this way? Evidently, to sidestep antitrust review. NVIDIA's market capitalization is approaching $3.5 trillion; at that scale, regulators scrutinize its every major move. An outright acquisition would very likely trip review red lines and end up delaying matters.

The $20 billion buys not just technology but the entire team's experience and patents. Groq's founder, notably, was one of the creators of Google's TPU; few engineers in Silicon Valley can match his understanding of AI chip architecture. Recruiting him effectively pried a key figure away from the Google camp. The maneuver secures the core technology, nets top talent, and avoids regulatory risk all at once; one has to admit it is thoroughly thought out.

Why is the LPU worth a fortune? The technology hits the key pressure point

Groq's core product is the LPU, that is, ...
Electronics Industry Weekly: Lingyi Zhi Zao Acquires Limin Da; Continued Focus on On-Device AI (2025-12-31)
East Money Securities· 2025-12-31 08:24
Investment Rating - The report maintains a rating of "Outperform" for the industry, indicating an expected performance that exceeds the market average [2]. Core Insights - The report emphasizes the dominance of AI inference in driving innovation, particularly in areas related to operational expenditure (Opex) such as storage, power, ASIC, and supernodes [31]. - The acquisition of 35% of Limin Da by Lingyi Zhi Zao for 875 million RMB is highlighted, positioning the company to leverage advanced thermal management technologies in the AI sector [25]. - The report identifies significant growth opportunities in the domestic storage industry, particularly with the anticipated expansion of NAND and DRAM production in the coming year [32]. Summary by Sections Market Review - The Shanghai Composite Index rose by 1.88%, while the Shenzhen Component Index increased by 3.53%, and the ChiNext Index saw a rise of 3.9%. The Shenwan Electronics Index increased by 4.96%, ranking 4th among 31 sectors, with a year-to-date increase of 48.12% [12][18]. Weekly Focus - Lingyi Zhi Zao's acquisition of Limin Da is noted for its strategic alignment with AI computing and thermal management solutions [25]. - NVIDIA's non-exclusive licensing agreement with Groq is discussed, highlighting its potential to enhance NVIDIA's position in high-performance computing and AI chips [26]. Weekly Insights - The report forecasts a significant increase in demand for storage solutions driven by advancements in products from Yangtze Memory Technologies and Changxin Memory Technologies, suggesting a focus on the domestic storage supply chain [31]. - The report also highlights the importance of power supply innovations, recommending attention to both generation and consumption technologies [33]. - ASIC technology is expected to gain market share, with a focus on key domestic and international cloud service providers [33]. 
- The report anticipates growth in supernode technologies, including high-speed interconnects and liquid cooling solutions [33].