3D DRAM Compute-in-Memory Architecture Released by Tsinghua Team
半导体行业观察 (Semiconductor Industry Observation) · 2024-08-10 02:21

Core Viewpoints
- The article discusses a 3D DRAM-computing integrated architecture developed by Tsinghua University's Integrated Circuit Institute, which is reported to substantially ease the "memory wall" bottleneck and improve the computational efficiency of AI models [1][5][6]
- The architecture uses hybrid bonding to stack computing units and DRAM cells vertically, enabling high-density data pathways and independent manufacturing of DRAM arrays and logic circuits [5]
- The design introduces a similarity-aware computing mechanism that exploits data similarity within DRAM banks to accelerate AI computations, achieving substantial improvements in throughput, energy efficiency, and area efficiency [6][13][14][15]

Research Background
- The rapid growth of AI models has introduced significant memory overhead, leading to the "memory wall" bottleneck: frequent data transfers between off-chip memory and on-chip computing units are constrained by memory bandwidth [2]
- Traditional near-memory computing solutions, such as HBM (High Bandwidth Memory), improve bandwidth but are limited by the "beachfront problem," which restricts the number of I/O channels due to signal interference [3]

2D In-Memory Computing and Process Bottlenecks
- 2D in-memory computing integrates computing units within DRAM arrays. This addresses the beachfront problem but introduces a process bottleneck: the computing circuits must be built in a DRAM process, which is less efficient than an advanced logic process [4]
- For example, Samsung's HBM-PIM reduces DRAM storage capacity by 50% when integrating in-memory computing units [4]

3D Computing-Memory Fusion Architecture
- The 3D computing-memory fusion architecture stacks computing units and DRAM cells vertically and uses copper pillars for data interconnection. This effectively solves the beachfront problem and allows data I/O channels to be placed arbitrarily [5]
- The architecture enables independent manufacturing of DRAM arrays and logic circuits, avoiding the process limitations of DRAM technology and preserving storage capacity [5]

Similarity-Aware 3D Computing-Memory Architecture
- The architecture exploits the "cluster similarity effect" within DRAM banks, where data in a local region tends to be similar, enabling parallel and independent similarity-aware computations within each computing bank [6]
- Key challenges addressed include efficient search for similar data, effective use of that similar data, and load balancing across computing banks [7][9][11]

Key Technologies
1. Hotspot-based Similarity Data Search: A hotspot mechanism quickly identifies representative data within DRAM banks, reducing power and time overhead [7]
2. Progressive Sparse Computing Unit: A sparse detection mechanism skips redundant data blocks, improving computational efficiency by focusing on non-zero data [9]
3. Load Balancing Mechanism: Task loads are dynamically redistributed across computing banks based on data similarity, ensuring balanced performance [11]

Performance Improvements
- The 3D computing-memory architecture achieves a 3.34x to 9.61x improvement in computational throughput and a 5.69x to 28.13x improvement in energy efficiency compared to existing high-performance AI chips [13][14][15]
- The architecture also improves area efficiency by 3.82x to 10.98x [13]
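
To make the similarity-aware idea listed under Key Technologies more concrete, the following Python sketch shows one way the principle could work in software. It is an illustration under stated assumptions, not the Tsinghua team's design: the function names, the block width `BLOCK`, and the choice of "row closest to the bank mean" as the representative (hotspot) are all hypothetical. The sketch computes one dot product for the representative row, then obtains every other row's result by adding only the contribution of blocks where that row differs from the representative, skipping all-zero difference blocks.

```python
from typing import List, Sequence, Tuple

BLOCK = 4  # hypothetical block width used for zero-block detection


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pick_hotspot(bank: List[List[float]]) -> int:
    """Pick a representative row: the one closest to the bank's mean.

    This heuristic is an assumption standing in for the article's
    hotspot search, whose details the summary does not give.
    """
    n = len(bank)
    mean = [sum(col) / n for col in zip(*bank)]
    return min(
        range(n),
        key=lambda i: sum((x - m) ** 2 for x, m in zip(bank[i], mean)),
    )


def similarity_aware_dots(
    bank: List[List[float]], weights: Sequence[float]
) -> Tuple[List[float], int, int]:
    """Return (dot products per row, blocks computed, blocks total)."""
    rep = bank[pick_hotspot(bank)]
    base = dot(rep, weights)  # full computation happens once, for the hotspot
    results, computed, total = [], 0, 0
    for row in bank:
        acc = base
        for start in range(0, len(row), BLOCK):
            end = min(start + BLOCK, len(row))
            delta = [row[j] - rep[j] for j in range(start, end)]
            total += 1
            if all(d == 0 for d in delta):
                continue  # identical block: nothing to add, skip it
            computed += 1
            acc += dot(delta, weights[start:end])
        results.append(acc)
    return results, computed, total


bank = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 3, 4, 5, 6, 7, 9],
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 2, 3, 4, 5, 6, 7, 8],
]
weights = [1, 1, 1, 1, 2, 2, 2, 2]
results, computed, total = similarity_aware_dots(bank, weights)
print(results)          # [62, 64, 62, 63]
print(computed, total)  # 2 8
```

In this toy bank only 2 of the 8 difference blocks need any arithmetic beyond the single full dot product, and the results match a direct computation. How much work is saved in practice depends entirely on how similar the data in a bank actually is.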