Large Model Inference
Up Nearly 28%! One Remark from Jensen Huang Ignites Memory Stocks; Institutions Say the Memory Super-Cycle Will Run Through 2027
Jin Rong Jie· 2026-01-07 00:49
The memory chip market is being swept by an unprecedented wave of enthusiasm. In overnight US trading, SanDisk shares surged nearly 28%, hard-drive makers Seagate and Western Digital both rose more than 14%, and Micron Technology jumped 10%, with the whole sector erupting. Nomura says the memory super-cycle that began in the second half of this year will run at least through 2027, and that the first meaningful new supply will not arrive until early 2028 at the earliest. Nomura's analysts argue that investors should stay overweight the leading memory names in 2026, making the price-profit-valuation "triple play" in memory chips the main investment theme for 2026 rather than treating memory as a single HBM story; the firm expects the three major memory chip makers (Samsung Electronics, SK Hynix, Micron Technology) to post record earnings. These remarks are hardly groundless: behind them is the explosive growth in data-storage demand from AI, and from large-model inference in particular. At CES, Jensen Huang detailed a key innovation, a "context memory architecture" that integrates high-speed storage (the KV cache) directly into the GPU rack. As model context lengths climb from the hundreds of thousands of tokens toward the hundreds of millions, the storage required balloons accordingly. Legacy storage architectures can no longer bear the load, and pulling storage close to the GPU is meant to relieve the network congestion caused by constantly shuttling massive amounts of data. Huang called the idea "completely revolutionary"; it first ...
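To see why context length drives storage demand, the KV cache's footprint can be estimated directly: it stores one key and one value vector per layer, per KV head, per token. A minimal sketch in Python, assuming illustrative parameters for a 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dim 128, FP16); none of these figures come from the article:

```python
def kv_cache_bytes(context_len, n_layers=80, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Total KV-cache size: K and V tensors for every layer, head, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

for ctx in (100_000, 1_000_000, 100_000_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>11,} tokens -> {gib:,.1f} GiB")
```

At 100K tokens this is already about 30 GiB, beyond a single GPU's spare HBM, and at 100M tokens it reaches tens of TiB, which is the pressure a rack-level "context memory" storage tier would be meant to absorb.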
Tian Yuandong's 2025 Year-End Review: On Being Laid Off and Research Directions for 2026
自动驾驶之心· 2026-01-06 00:28
Author | Tian Yuandong @ Zhihu  Editor | 大模型之心Tech  Original link: https://zhuanlan.zhihu.com/p/1990809161458540818

I have been too busy lately and could only push the year-end review past January 1; in any case, getting started at all is a good thing.

On being laid off

When I was asked to join the Llama 4 firefighting effort at the end of January 2025, as someone who has always worked on reinforcement learning, I first drew up a 2x2 reward matrix and worked through the four possibilities below (although at the time, given the enormous pressure from above, refusing was practically impossible):

| | Agree to help | Refuse to help |
| --- | --- | --- |
| Llama 4 succeeds | Become a hero | Get marginalized |
| Llama 4 fails | Did my best for the company | Blamed for not stepping up when the company needed it |

My thinking was that if we went to help, then even if the project ultimately failed, we would at least have tried our best and have a clear conscience. Unfortunately, what actually happened was a fifth possibility outside the matrix, which also left me ...
Tian Yuandong's 2025 Year-End Review: Firefighting Llama 4 but Laid Off, Now Co-Founder of a Stealth Startup
机器之心· 2026-01-04 08:05
Reported by 机器之心

Last October, layoffs in Meta's AI division swept up a large group of people, including the well-known Chinese scientist Tian Yuandong and members of his team. In the past couple of days, Tian shared his 2025 year-end review. He first described his experience "firefighting" the Llama 4 project, the layoff that followed, and his future work plans; he then reviewed his main research directions in 2025, including large-model reasoning and opening up the black box of models; finally, he discussed AI-driven social change, the restructuring of productivity, and how individual value persists. What follows is Tian's original Zhihu post.

2025 Year-End Review (Part 1)

On being laid off

When I was asked to join the Llama 4 firefighting effort at the end of January 2025, as someone who has always worked on reinforcement learning, I first drew up a 2x2 reward matrix and worked through the four possibilities below (although at the time, given the enormous pressure from above, refusing was practically impossible):

| | Agree to help | Refuse to help |
| --- | --- | --- |
| Llama 4 succeeds | Become a hero | Get marginalized |
| Llama 4 fails | Did my best for the company | Blamed for not stepping up when the company needed it |

My thinking was that if we went to help, then even if the project ultimately failed, we would at least have tried our best and have a clear conscience. Unfortunately, what actually happened ...
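Read as a decision problem, the 2x2 matrix in the post has a notable property: whatever the project's outcome, the "help" row is never worse than the "refuse" row. A toy calculation with invented numeric payoffs (the original matrix is purely qualitative; these numbers are not from the post) makes the dominance explicit:

```python
# Hypothetical payoffs mirroring the qualitative ordering in the post
# ("hero" > "did my best" > "marginalized" > "blamed"); magnitudes invented.
payoff = {
    "help":   {"success": 2.0, "failure": 0.0},
    "refuse": {"success": -1.0, "failure": -2.0},
}

def expected_payoff(decision, p_success):
    """Expected value of a decision given the probability the project succeeds."""
    p = payoff[decision]
    return p_success * p["success"] + (1 - p_success) * p["failure"]

# "help" dominates: it is the better choice at every success probability.
for p_s in (0.1, 0.5, 0.9):
    best = max(payoff, key=lambda d: expected_payoff(d, p_s))
    print(f"P(success)={p_s}: choose {best}")
```

Under any payoffs that preserve the ordering of the four cells, helping dominates, which matches the post's conclusion that joining was the defensible choice even if the project failed.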
LeCun Says Meta Gamed the Benchmarks; Tian Yuandong: I Didn't Expect This Ending
量子位· 2026-01-04 05:21
Core Viewpoint - The article discusses the fallout from the release of Meta's Llama 4, highlighting internal conflicts and the departure of key figures like LeCun and Tian Yuandong, who are now pursuing entrepreneurial ventures out of dissatisfaction with Meta's direction in AI development [1][3][22].

Group 1: Llama 4 and Internal Conflicts
- Llama 4 faced significant criticism and allegations of cheating on benchmark tests, leading to a loss of confidence from Meta's leadership [1][10].
- The release of DeepSeek, a competing AI model, pressured Meta to accelerate its AI investments, resulting in internal turmoil and a shift in team dynamics [4][6].
- The communication breakdown within the team was exacerbated by differing priorities: LeCun's team wanted to innovate, while leadership preferred proven technologies [7][8].

Group 2: Departures and New Ventures
- LeCun and Tian Yuandong both announced plans to start new companies after leaving Meta, with LeCun focusing on world models and Tian Yuandong on new AI initiatives [27][33].
- LeCun's new venture, Advanced Machine Intelligence (AMI), aims to explore advanced machine intelligence through open-source projects, with LeCun serving as executive chairman [27][30].
- Tian Yuandong expressed a desire to co-found a startup, part of a trend of former Meta employees seeking opportunities outside the company [33].

Group 3: Future Directions in AI
- LeCun's focus on the V-JEPA architecture aims to improve AI's understanding of the physical world through video and spatial data, with significant progress expected within 12 months [32].
- The article emphasizes the need for AI to move beyond language limitations, echoing LeCun's critique of the current focus on large language models [25][26].
首都在线 (Capital Online) 20251230
2025-12-31 16:02
首都在线 20251230 Summary

- Capital Online benefits from the growth of large-model inference applications, which is lifting edge-cloud demand; the company is building a platform compatible with both domestic chips and NVIDIA hardware, supporting large-model companies such as MiniMax going overseas, with growing customer reliance.
- New business built on MaaS services and the ComfyUI platform is growing 20%-30% per month and is expected to significantly lift the company's gross margin and revenue.
- Capital Online transformed early, has built deep partnerships with companies such as Zhipu, has accumulated advantages in technical investment and user understanding, and can match resources globally to meet customers' domestic and overseas deployment needs.
- Through its "iron triangle" strategy (sales, product/technical solutions, key-account service), the company cultivates large customers and provides domestic-hardware adaptation services to ensure compliance and business continuity.
- The company is positioning proactively in frontier areas, including space-computing business: it has built a computing center in Wenchang, set up a computing center in Qingyang to support Jiuquan satellite launches, and is exploring industry applications such as AI-generated short dramas.
- The company expands assets and partnerships step by step according to customer demand; domestically it focuses on eight major nodes including Qingyang, Huailai, Wuhu, and Hainan, with overseas sites including Dallas and Singapore.
- Capital Online partners with model vendors through MaaS services and ComfyUI to promote its business at relatively high gross margins, and achieves strong regional profitability with the help of government subsidies and policy support.

Q&A What impact have recent large-model launches and AI application developments had on 首都在 ...
2025 Industry Insight Report on Large-Model Inference Optimization and Deployment Practice - Cloud Computing Open Source Industry Alliance
Sou Hu Cai Jing· 2025-12-25 02:34
The report's core finding is that the large-model industry has moved from "model innovation" into the critical phase of "scaled deployment": inference optimization and efficient deployment have become core competitive strengths, the market is growing rapidly, and diverse deployment forms together with full-stack optimization techniques are driving the industry forward, even as cost, standardization, and other challenges remain.

The market is expanding rapidly, with inference-driven demand prominent. The global AI inference computing market grew nearly tenfold from 2021 to 2024, reaching $13.958 billion in 2024 and a projected $18.355 billion in 2025; the Chinese market is growing even faster, reaching RMB 43.83 billion in 2025 at a 66.3% CAGR, and inference workloads are expected to account for 70.5% of AI server workloads in 2026. The competitive landscape is diverse: 天翼云 (China Telecom Cloud), Alibaba Cloud, and Huawei Cloud lead the domestic market, while Amazon, Google, and Microsoft dominate abroad; token-based billing has become mainstream, and the Model-as-a-Service (MaaS) business model is spreading rapidly.

Deployment forms are diversifying to fit different scenarios. Four mainstream modes have emerged: MaaS, with elastic billing and a low barrier to entry, is the first choice for SMEs; all-in-one large-model inference appliances, integrating hardware and software for out-of-the-box use, are favored by central SOEs and government agencies, with 2025 shipments expected to exceed 100,000 units; private deployment platforms meet the data-security and customization needs of finance, healthcare, and similar industries, with 81% of enterprises choosing cloud-native deployment; cloud- ...
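The report's headline figures imply growth rates worth making explicit. A quick check in Python, using only the numbers quoted above (USD billions for the global market):

```python
def yoy_growth(prev, curr):
    """Simple year-over-year growth rate."""
    return curr / prev - 1

def cagr(start, end, years):
    """Compound annual growth rate over `years` years."""
    return (end / start) ** (1 / years) - 1

# Global AI inference market, USD billions (figures quoted in the report).
print(f"2024 -> 2025 growth: {yoy_growth(13.958, 18.355):.1%}")

# "grew nearly tenfold from 2021 to 2024" corresponds roughly to:
print(f"implied 2021-2024 CAGR: {cagr(1.0, 10.0, 3):.0%}")
```

So the projected 2025 global growth (about 31.5%) is well below China's quoted 66.3% CAGR, consistent with the report's claim that the Chinese market is growing faster.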
Domestic Computing Power Enters the "10,000-Card" Era: Moore Threads Unveils a New-Generation GPU Architecture, Sugon (中科曙光) Launches a 10,000-Card Supercluster
Jing Ji Guan Cha Wang· 2025-12-20 06:47
Core Insights - The article discusses advances in the domestic GPU industry, highlighting the launch of the "Huagang" architecture by Moore Threads and the "scaleX" supercluster system by Sugon, marking a shift in focus from individual GPU performance to building scalable systems capable of handling massive computational tasks [2][6].

Group 1: Moore Threads Developments
- Moore Threads unveiled its latest "Huagang" architecture, which boasts a 50% increase in computing density and a 10-fold improvement in efficiency over the previous generation [3].
- The "Huagang" architecture supports full-precision calculation from FP4 to FP64 and adds support for MTFP6, MTFP4, and mixed low precision [3].
- Future chip plans include "Huashan," aimed at AI training and inference, and "Lushan," focused on high-performance graphics rendering, with "Lushan" showing a 64-fold increase in AI computing performance and a 50% improvement in ray-tracing performance [4].

Group 2: Sugon Developments
- Sugon's "scaleX" supercluster system, making its public debut, consists of 16 scaleX640 supernodes interconnected via the scaleFabric high-speed network and can deploy 10,240 AI accelerator cards [10].
- The scaleX system uses immersion phase-change liquid cooling to address heat dissipation, achieving a 20-fold increase in per-rack computing density and a PUE (Power Usage Effectiveness) of 1.04 [11][12].
- The system supports accelerator cards from multiple vendors and has optimized compatibility with over 400 mainstream large models, reflecting a strategy of providing a versatile platform for diverse domestic computing resources [14].

Group 3: Industry Challenges and Solutions
- The industry faces challenges in scaling up computational power, particularly in managing heat, power supply, and physical space when deploying thousands of high-power chips in data centers [8][9].
- Both companies are addressing communication delays in distributed computing: Moore Threads is integrating a new asynchronous programming model and its self-developed MTLink interconnect to support clusters exceeding 100,000 cards, while Sugon's scaleFabric network achieves 400 Gb/s bandwidth and sub-microsecond communication latency [12][13].

Group 4: Software Ecosystem and Compatibility
- As hardware specifications approach international standards, the focus is shifting to the software stack, with Moore Threads announcing an upgrade to its MUSA unified architecture and achieving over 98% efficiency in core computing libraries [13].
- Sugon emphasizes compatibility with accelerator cards from multiple brands, promoting an open-architecture strategy that allows multiple chips to coexist [14].
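PUE is a simple ratio, total facility power over IT equipment power, so the 1.04 figure above can be sanity-checked in one line. A minimal sketch with invented power draws (only the 1.04 target comes from the article):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power.

    1.0 is the theoretical ideal (zero cooling and distribution overhead).
    """
    return total_facility_kw / it_equipment_kw

# Invented example: 1000 kW of IT load plus 40 kW of cooling and power
# distribution overhead yields the 1.04 PUE claimed for the scaleX racks.
print(pue(1040.0, 1000.0))
```

For comparison, conventional air-cooled data centers are often quoted around a PUE of 1.5, which is why immersion liquid cooling matters at 10,000-card scale.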
Is It Time to Say Goodbye to Affordable Consumer Electronics?
虎嗅APP· 2025-12-15 10:26
Produced by | Huxiu Tech Group  Author | 丸都山  Editor | 苗正卿  Header image | Visual China

The duration and intensity of the memory price surge may have been underestimated by everyone.

Yet a recent SK Hynix report makes clear that even once new DRAM capacity is released, it will be nowhere near enough to offset market demand.

So where does the root of this severe supply-demand mismatch in the memory market lie, and what lasting effects will it bring for ordinary consumers?

Who is snapping up DRAM?

Take one of the most representative products, the PC memory module, to see how much DRAM prices rose this year. At the start of the year, a DDR5 16GB (5600MHz) module was quoted at around RMB 300 on e-commerce platforms; today the lowest e-commerce price for the same spec is RMB 899, a roughly 200% increase in under a year.

According to tech outlet Wccftech, a document from an internal SK Hynix meeting says the global DRAM shortage is expected to persist through the end of 2028.

That judgment far exceeds the industry's earlier expectations. At last month's Xiaomi Q3 earnings call, for example, Lu Weibing said that handset makers are all under pressure from memory chip costs, that retail prices will rise substantially next year, and that the supply-demand imbalance is unlikely to reverse before 2027. The "2027" Lu mentioned is in fact also what the industry broadly ...
Building a Production-Grade, Cloud-Native Large-Model Inference Platform on SGLang RBG + Mooncake
AI前线· 2025-12-12 00:40
Core Insights
- The article emphasizes the rapid evolution of large language model (LLM) inference services into core enterprise infrastructure, focusing on balancing performance, stability, and cost when building high-performance inference systems [2]
- It discusses the transition from monolithic to distributed architectures in LLM inference, highlighting the need for an external KVCache to relieve memory pressure and improve performance in high-demand scenarios [2][4]

Distributed KVCache and Mooncake
- Mooncake is introduced as a leading distributed KVCache storage engine designed to provide high throughput and low latency for inference frameworks like SGLang [3]
- The article outlines the challenges of managing distributed KVCache systems in production environments, which motivated the development of RoleBasedGroup (RBG) for unified management of caching and inference nodes [4]

RoleBasedGroup (RBG) Design and Challenges
- RBG is presented as a Kubernetes-native API aimed at AI inference, providing multi-role orchestration to ensure stable, high-performance operation [4][12]
- The article identifies five fundamental challenges in deploying large-model inference services, including the need for strong state management and performance optimization [12][15]

SCOPE Framework
- The SCOPE framework centers on five core capabilities, Stability, Coordination, Orchestration, Performance, and Extensibility, which are essential for managing LLM inference services [16][18]
- RBG's design allows rapid architecture iteration and performance-sensitive operations, addressing the complexity of multi-role dependencies and operational efficiency [15][24]

Benchmark Testing and Performance Metrics
- Benchmark tests show significant improvements in KVCache hit rates and inference performance, with the L3 Mooncake cache achieving a 64.67% hit rate and cutting average TTFT (time to first token) to 2.58 seconds [32][48]
- The article highlights the importance of a multi-tier caching architecture in improving performance for applications such as multi-turn dialogue and AI agents [44]

Conclusion and Future Outlook
- The integration of RBG and Mooncake is positioned as a transformative approach to building production-grade LLM inference services, emphasizing deep integration of high-performance design with cloud-native operational capabilities [43][44]
- The article concludes with a call for community collaboration to advance this paradigm and lay the foundation for the next generation of AI infrastructure [43]
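The hit-rate and TTFT figures above come from a multi-tier lookup path: check the fastest cache first, fall back to slower tiers, and only recompute the prefill on a full miss. A toy sketch of that lookup order and its hit-rate accounting; the class, keys, and tier names are hypothetical and are not Mooncake's or SGLang's actual API:

```python
import collections

class TieredKVCache:
    """Toy multi-tier KV-cache lookup (L1 GPU, L2 host RAM, L3 distributed store).

    Illustrative only: real systems key on token-prefix hashes and move whole
    KV blocks between tiers, rather than storing single opaque entries.
    """

    def __init__(self):
        # Insertion order defines lookup order: fastest tier first.
        self.tiers = {"L1": {}, "L2": {}, "L3": {}}
        self.stats = collections.Counter()

    def get(self, prefix_hash):
        for name, tier in self.tiers.items():
            if prefix_hash in tier:
                self.stats[f"{name}_hit"] += 1
                return tier[prefix_hash]
        self.stats["miss"] += 1  # full miss: prefill must be recomputed
        return None

    def hit_rate(self):
        hits = sum(v for k, v in self.stats.items() if k.endswith("_hit"))
        total = hits + self.stats["miss"]
        return hits / total if total else 0.0

cache = TieredKVCache()
cache.tiers["L3"]["shared-system-prompt"] = b"kv-block"
cache.get("shared-system-prompt")  # L3 hit: reuse instead of recompute
cache.get("novel-prefix")          # miss: full prefill required
print(f"hit rate: {cache.hit_rate():.0%}")  # -> hit rate: 50%
```

The design point this illustrates is that a hit in any tier, even a remote L3 store, replaces a full prefill recomputation, which is why raising the aggregate hit rate directly lowers average TTFT.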
When Compute Can't Keep Up with Intelligence: The AI Industry's Gaps and Breakouts in 2026 (86-page deck attached)
材料汇· 2025-12-10 15:51
Core Insights - The rapid evolution of AI is outpacing the development of computing infrastructure, leading to a significant gap in computing power that is expected to widen by 2026. This gap will manifest in two key areas: a growing demand for core computing capabilities across chips, storage, packaging, and cooling, and a shift towards edge computing to reduce cloud latency and costs, resulting in an explosion of applications from AI smartphones to integrated robots [1]. Industry Overview - The electronic sector has reached a record high in Q3 2025, driven by AI, with the electronic index rising by 44.5% year-to-date, outperforming the CSI 300 index by 26.6% [12][13]. - The semiconductor sector has shown significant growth, with various sub-sectors experiencing substantial increases: PCB (+114%), consumer electronics (+51%), and semiconductors (+40%) year-to-date [12][13]. - The overall electronic industry reported a revenue increase of 19% and a net profit increase of 35% in Q1-Q3 2025, with all major segments showing positive growth [18][24]. Performance Metrics - The electronic sector's inventory levels have risen, particularly in consumer electronics and PCBs, indicating strong demand and recovery in terminal markets [22][25]. - The semiconductor sector's monthly sales growth has rebounded since June 2023, with a notable increase in demand for digital, storage, and equipment segments [34][41]. AI Impact on Semiconductor Cycle - The semiconductor market is entering an upward cycle, with significant growth in capital expenditures from both domestic and international cloud service providers, driven by AI demand [41][42]. - Major cloud providers are expected to increase their capital expenditures significantly, with projections indicating a 50%-60% growth in 2026 [43]. Consumer Electronics Trends - Global smartphone sales are projected to recover, with a forecast of 1.29 billion units in 2024, reflecting a 6.1% year-on-year increase [26][27]. 
- The PC market is also expected to grow, with global sales reaching 263 million units in 2024, a 1.0% increase year-on-year [27][29]. Automotive Sector Insights - The automotive market is experiencing a weak recovery, with global sales expected to reach 92.23 million units in 2025, reflecting a 1.8% year-on-year increase [39]. - The penetration rate of electric vehicles is projected to rise, with expectations of 20% in 2025 for global sales [39]. AI Narrative Acceleration - The competition among AI model developers has intensified, with significant advancements in model capabilities and applications across various sectors [47][50]. - The demand for AI-related spending is expected to reach $3-4 trillion by 2030, driven by the need for enhanced computing power and applications [58]. Edge Computing and Hardware Development - The shift towards edge computing is becoming crucial, with predictions indicating that the global edge AI market will grow to ¥1.2 trillion by 2029, with a CAGR of 39.6% [69]. - Major AI companies are actively entering the edge hardware market to enhance user experience and profitability [69].