HBF (High Bandwidth Flash)
Four Approaches to Inference Chips, Written by David Patterson
半导体行业观察· 2026-01-19 01:54
Core Insights
- The article discusses the challenges and research directions for large language model (LLM) inference hardware, emphasizing the need for innovative solutions to address memory and interconnect limitations rather than computational power [1][3]

Group 1: Challenges in LLM Inference
- LLM inference is fundamentally different from training due to the autoregressive decoding phase, whose bottlenecks lie in memory and interconnect rather than computational capacity [3][5]
- The rapid growth in LLM usage has driven up the cost of serving state-of-the-art models, making the economic feasibility of inference a central concern [5][6]
- The emergence of mixture-of-experts (MoE) models, which selectively invoke a subset of experts, further increases memory and communication demands during inference [5][6]

Group 2: Current Limitations of LLM Inference Hardware
- Existing GPU/TPU systems for inference are often scaled-down versions of training systems, leading to inefficiencies, particularly in the decoding phase [10][11]
- Memory bandwidth improvements have not kept pace with growth in floating-point operations per second (FLOPS): NVIDIA 64-bit GPU performance increased 80-fold from 2012 to 2022, while memory bandwidth grew only 17-fold [12][14]
- The cost of high-bandwidth memory (HBM) has risen significantly, with prices increasing 1.35-fold from 2023 to 2025 due to manufacturing complexities [16][18]

Group 3: Research Directions for LLM Inference Hardware
- Four promising research directions are proposed to address the challenges of LLM inference:
  1. High Bandwidth Flash (HBF), which can provide 10 times the memory capacity [28]
  2. Processing-Near-Memory (PNM) technologies that enhance memory bandwidth [33]
  3. 3D memory-logic stacking to achieve high bandwidth with lower power consumption [37]
  4. Low-latency interconnect solutions to improve communication efficiency [38][40]
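Why autoregressive decoding is limited by memory bandwidth rather than FLOPS can be illustrated with a back-of-the-envelope roofline estimate. All of the numbers below (model size, bandwidth, peak compute) are illustrative assumptions, not figures from the article:

```python
# Back-of-the-envelope estimate of why LLM decoding is memory-bound:
# each generated token must stream (roughly) all model weights from
# memory, so tokens/sec is capped by bandwidth / model size long
# before the compute ceiling is reached.
# All numbers below are illustrative assumptions, not article data.

def decode_roofline(model_bytes: float, mem_bw_bytes_s: float,
                    flops_per_token: float, peak_flops: float) -> dict:
    """Return the bandwidth-bound and compute-bound token rates."""
    bw_bound = mem_bw_bytes_s / model_bytes       # tokens/s if memory-limited
    compute_bound = peak_flops / flops_per_token  # tokens/s if compute-limited
    return {
        "bandwidth_bound_tok_s": bw_bound,
        "compute_bound_tok_s": compute_bound,
        "memory_bound": bw_bound < compute_bound,
    }

# Hypothetical 70B-parameter model in 2-byte weights on a GPU with
# 3 TB/s of HBM bandwidth and 1 PFLOP/s of peak compute.
est = decode_roofline(model_bytes=70e9 * 2,
                      mem_bw_bytes_s=3e12,
                      flops_per_token=2 * 70e9,   # ~2 FLOPs per weight
                      peak_flops=1e15)
print(est)
```

Under these made-up numbers the bandwidth-bound rate is a few dozen tokens per second while the compute-bound rate is in the thousands, which is the gap the four research directions above aim to close.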
Group 4: Performance and Cost Metrics
- New performance/cost metrics are emphasized, focusing on total cost of ownership (TCO), average power consumption, and carbon emissions, which provide new targets for system design [25][26]
- Efficient scaling of memory bandwidth and capacity, together with optimized interconnect speed, is highlighted as critical for LLM decoding performance [26][42]

Group 5: Future Implications
- Advancements in LLM inference hardware are expected to foster collaboration across the industry, driving the innovations essential for cost-effective AI inference [43]
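A TCO-style metric of the kind described above can be sketched as a simple cost-per-token calculation that folds hardware capital cost and power draw into one number. Every price, lifetime, and throughput figure below is a made-up placeholder, not a value from the article:

```python
# Sketch of a cost-per-million-tokens metric combining capital cost,
# average power, and throughput, in the spirit of the TCO metrics
# discussed above. All inputs are hypothetical placeholders.

def cost_per_million_tokens(capex_usd: float, lifetime_years: float,
                            avg_power_w: float, usd_per_kwh: float,
                            tokens_per_s: float) -> float:
    seconds = lifetime_years * 365 * 24 * 3600
    capex_per_s = capex_usd / seconds                     # amortized hardware $/s
    energy_per_s = (avg_power_w / 1000) * usd_per_kwh / 3600  # electricity $/s
    return (capex_per_s + energy_per_s) / tokens_per_s * 1e6

# Hypothetical accelerator: $30k, 4-year life, 700 W average,
# $0.10/kWh electricity, serving 50 tokens/s.
print(round(cost_per_million_tokens(
    capex_usd=30_000, lifetime_years=4,
    avg_power_w=700, usd_per_kwh=0.10,
    tokens_per_s=50), 2))
```

A metric like this makes the trade-off explicit: raising tokens/s via cheaper, higher-capacity memory (e.g. HBF) lowers the cost per token even if peak FLOPS is unchanged.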
Amid AI Bubble Doubts, How Many Investment Opportunities Remain?
Guo Ji Jin Rong Bao· 2025-11-25 12:29
Group 1
- The core viewpoint of the articles is the resurgence of AI concept stocks in the A-share market, driven by a focus on computing power and applications, particularly storage solutions such as HBM and HBF, which are essential for GPU and AI-server upgrades [1]
- AI investment differs from internet investment: AI centers on "to Task" models that raise task efficiency through multi-model collaboration, whereas the internet's "to B" and "to C" models prioritize user engagement and traffic [1]
- The long-term strategies highlighted include trend investing, which capitalizes on short-term trends, and value investing, which focuses on companies' long-term growth [1]

Group 2
- For ordinary investors, a shift in mindset and method is crucial: away from guessing trends and toward a value-investment perspective that emphasizes long-term company growth [2]
- A value-investment formula is proposed: Value Investment ≈ Good Asset + Good Price, indicating that both components are essential for successful investment [2]
Memory Shortage, the First in 30 Years
半导体芯闻· 2025-11-07 10:24
Core Insights
- The storage module industry is experiencing a significant shortage driven by AI demand, with NAND Flash prices up roughly 50% this month and DRAM prices for DDR4 and DDR5 also rising [2]
- NAND Flash adoption in AI applications has surged, prompting suppliers to shift capacity toward AI, with expectations of a substantial market opportunity due to HDD shortages [2]
- Major DRAM manufacturers have no plans to increase DDR4 production and anticipate a gradual exit from the DDR4 market by 2026, which limits supply and drives prices up [2]

Group 1: Company Performance
- In Q3, the company reported revenue of 4.109 billion, up 27.2% quarter-over-quarter and 63.2% year-over-year, with net profit of 1.511 billion, up 259% quarter-over-quarter and 334% year-over-year [3]
- The company remains focused on industrial control, with DDR4 still dominating product shipments, while DDR5 shipments are expected to rise as customers transition [2]

Group 2: Market Trends
- SanDisk predicts the NAND Flash supply shortage will persist until at least the end of 2026, with indications that the tight supply situation may extend into 2027 [4]
- NAND Flash demand is driven by long-term trends, capital investment, and industry transitions, with data centers expected to become the largest NAND Flash segment by 2026 [4]
- SanDisk's wafer fabs are operating at full capacity to replenish sharply reduced inventories, and the company is optimistic about the long-term growth of the data center market [4][6]

Group 3: Future Outlook
- SanDisk's revenue guidance for Q2 FY2026 is 2.55 billion to 2.65 billion, exceeding market expectations, with adjusted earnings per share forecast between 3.00 and 3.40 [6]
- The company reported 26% quarter-over-quarter growth in data center revenue, with ongoing collaborations with major data center clients [7]
- BiCS8 technology is expected to dominate production by the end of FY2026, currently accounting for 15% of total shipments [7]
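The quarter-over-quarter and year-over-year percentages quoted throughout these summaries follow the standard growth-rate formula. The base-period revenue below is a hypothetical value chosen only to illustrate the arithmetic, not the company's actual prior-quarter figure:

```python
# Growth-rate helper for the QoQ/YoY percentages quoted above.
# The base-period figure is hypothetical, not reported data.

def growth_pct(current: float, base: float) -> float:
    """Percentage change from the base period to the current period."""
    return (current - base) / base * 100

# A rise from a hypothetical base of 3.230 to the reported 4.109
# works out to roughly the quoted 27.2% QoQ growth.
qoq = growth_pct(4.109, 3.230)
print(f"{qoq:.1f}%")  # → 27.2%
```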