Compute-Storage Convergence
Breaking Through the "Memory Wall": Advancing on Three Fronts
36Kr · 2025-12-31 03:35
Preface

In recent years, the explosive growth of AI and high-performance computing has driven computational demand up an exponential curve. From the debut of ChatGPT to the visual impact of Sora, large-scale AI models have not only ballooned exponentially in parameter count; their demand for compute has followed an even more startling trajectory. Behind this boom, however, an increasingly severe challenge is surfacing: the "memory wall". From hundred-billion-parameter large language models to intelligent edge devices, applications of every kind now place unprecedented demands on memory performance, power, and area (PPA). The memory "bandwidth wall" has become the core bottleneck constraining AI compute throughput and latency; traditional memory technologies can no longer satisfy system-level energy-efficiency targets, and the resulting performance gap keeps AI chips from realizing their full potential. As the global leader in semiconductor manufacturing, TSMC has a keen grasp of this fundamental tension. In its tutorial at the 2025 IEDM (International Electron Devices Meeting), TSMC stated it plainly: the coming competition in AI and high-performance computing chips will be decided not just by transistor density and frequency, but by the combined performance, energy efficiency, and integration innovation of the memory subsystem.

Under AI's compute sprint, the memory "bandwidth wall" becomes the core pain point

The evolution of AI models reads as a relentless squeeze on compute and memory. From the early AlexNet to today's GPT-4, Llama2, and PaLM, parameter counts have jumped from the millions to the trillions, and this expansion in model scale directly drives the computation required for training and inference ( ...
Breaking Through the "Memory Wall": Advancing on Three Fronts
半导体行业观察 · 2025-12-31 01:40
Core Viewpoint
- The article discusses the exponential growth of AI and high-performance computing, highlighting the emerging "memory wall" that limits AI chip performance due to inadequate memory bandwidth and efficiency [1][2].

Group 1: AI and Memory Demand
- The evolution of AI models has driven a dramatic increase in computational demand, with model parameters rising from millions to trillions and training computation growing by more than 10^18 times over the past 70 years [2].
- The performance of any computing system is bounded jointly by its peak compute and its memory bandwidth, and the two have diverged sharply: hardware peak floating-point performance has grown 60,000-fold over the past 20 years, while DRAM bandwidth has grown only 100-fold [5][8] (see the roofline sketch after this summary).

Group 2: Memory Technology Challenges
- Because memory bandwidth has not kept pace with compute performance, a "bandwidth wall" now restricts overall system performance [5][8].
- AI inference is hit particularly hard: memory bandwidth becomes the dominant bottleneck, leaving compute units idle while they wait for data [8] (the decode-latency estimate below makes this concrete).

Group 3: Future Directions in Memory Technology
- TSMC emphasizes that memory technology in the AI/HPC era must be optimized jointly across materials, processes, architectures, and packaging [12].
- Memory architecture will move toward "memory-compute synergy," evolving from traditional on-chip caches to tightly integrated memory solutions that raise both performance and efficiency [12][10].

Group 4: SRAM as a Key Technology
- SRAM remains the workhorse of high-performance embedded memory thanks to its low latency, high bandwidth, and energy efficiency, and is used across a wide range of high-performance chips [13][20].
- TSMC's SRAM technology has evolved across successive process nodes, with ongoing innovations aimed at improving density and efficiency [14][22].

Group 5: Computing-in-Memory (CIM) Innovations
- CIM is a revolutionary architecture that integrates compute capability directly into memory arrays, sharply reducing data movement and its energy cost [23][26].
- TSMC sees greater potential in Digital Computing-in-Memory (DCiM) than in Analog Computing-in-Memory (ACiM), owing to DCiM's compatibility with advanced logic processes and its flexibility in precision control [28][30] (a bit-serial MAC sketch follows below).

Group 6: MRAM Developments
- MRAM is emerging as a viable successor to traditional embedded flash, offering non-volatility, high reliability, and endurance suited to automotive electronics and edge AI [35][38].
- TSMC's MRAM technology meets stringent automotive-grade requirements for robustness and longevity [41][43].

Group 7: System-Level Integration
- TSMC advocates a system-level approach to memory-compute integration, using advanced packaging technologies such as 2.5D/3D integration to raise bandwidth and reduce latency [50][52].
- In future AI chips the boundary between memory and compute may blur, with tightly integrated architectures optimized jointly for energy efficiency and performance [58][60].
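To make Group 1's compute-versus-bandwidth claim concrete, here is a minimal roofline-model sketch in Python. The peak-FLOPS and bandwidth numbers are illustrative assumptions for a modern accelerator, not figures from the article or from TSMC.

```python
def attainable_flops(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """Roofline model: sustained throughput is capped either by raw
    compute (peak_flops) or by how fast memory can feed the ALUs
    (bytes/s times FLOPs-per-byte)."""
    return min(peak_flops, mem_bw * intensity)

# Illustrative accelerator, not vendor data: 1 PFLOP/s peak, 3 TB/s HBM.
PEAK, BW = 1e15, 3e12

# Memory-bound case: GEMV in LLM decode, roughly 1 FLOP per byte of weights.
print(attainable_flops(PEAK, BW, 1.0) / PEAK)    # 0.003 -> 0.3% utilization
# Compute-bound case: large GEMM with heavy reuse, ~500 FLOPs per byte.
print(attainable_flops(PEAK, BW, 500.0) / PEAK)  # 1.0 -> compute-limited
```

The asymmetry is the point: once arithmetic intensity drops below the machine's FLOPs-per-byte ratio, extra compute buys nothing; only more bandwidth helps.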
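The bandwidth wall of Group 2 can also be stated as a latency floor: in small-batch LLM decoding, every weight must stream from memory once per generated token. The model size and bandwidth below are assumptions chosen for illustration.

```python
# Lower bound on per-token decode latency for a memory-bound LLM:
# at batch size 1, every parameter streams from memory once per token.
params = 70e9          # assumed 70B-parameter model
bytes_per_param = 2.0  # FP16 weights
hbm_bw = 3e12          # assumed 3 TB/s aggregate HBM bandwidth

t_token = params * bytes_per_param / hbm_bw
print(f"{t_token * 1e3:.1f} ms/token, ceiling {1 / t_token:.0f} tokens/s")
# -> 46.7 ms/token, a ceiling of ~21 tokens/s; the ALUs sit idle
#    unless batching or caching raises data reuse.
```

This is why the directions the article describes (higher-bandwidth stacks, tighter memory-compute integration, CIM) translate directly into tokens per second rather than just peak-FLOPS headlines.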
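Group 5 contrasts digital and analog CIM. A common digital CiM scheme is bit-serial multiply-accumulate: weights sit in SRAM columns, one activation bit is broadcast per cycle, and a digital adder tree plus shift-and-add reconstructs the full-precision dot product. The sketch below is a functional model of that generic scheme under assumed unsigned 8-bit activations; it is not TSMC's macro design.

```python
def dcim_dot(weights, activations, act_bits=8):
    """Functional model of a bit-serial digital CiM dot product:
    each cycle broadcasts one activation bit, every column ANDs it
    with its stored weight, an adder tree sums the column outputs,
    and shift-and-add across cycles rebuilds full precision."""
    acc = 0
    for b in range(act_bits):                 # one "cycle" per activation bit
        bit_slice = [(a >> b) & 1 for a in activations]
        partial = sum(w * s for w, s in zip(weights, bit_slice))  # adder tree
        acc += partial << b                   # digital shift-and-add
    return acc

# Sanity check against an ordinary dot product (unsigned 8-bit activations).
w = [3, -1, 4, 2]
a = [10, 200, 7, 55]
assert dcim_dot(w, a) == sum(wi * ai for wi, ai in zip(w, a))
```

Because every step here is digital (AND gates, adder trees, shifts), the result is exact and the circuitry scales with logic nodes, which is the compatibility and precision-control argument the article attributes to TSMC's preference for DCiM over ACiM.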