Physically Composable Disaggregated Architecture

SRAM Has Stopped Scaling: What Now?
半导体行业观察 · 2025-09-17 01:30
Core Viewpoint
- Memory latency, bandwidth, capacity, and energy consumption are increasingly the bottlenecks for performance improvement. The article proposes shifting from large, shared, homogeneous memory systems to smaller, tightly coupled memory segments that provide private local memory, significantly reducing access costs [2][4].

Group 1: Challenges in Memory Architecture
- A single large, distributed memory address space is appealing, but it faces scalability and signaling challenges that are physical limits of modern engineering [4][5].
- Scaling has effectively hit its limit: the cost per byte of SRAM and DRAM is no longer falling, making ever-larger memory capacity economically unviable [4][6].
- Signal transmission cost grows with distance, so remote memory access is expensive and inefficient [5][12].

Group 2: Limitations of 2D Scaling
- Traditional 2D scaling of SRAM and DRAM has reached its end; cost per byte has been essentially flat for over a decade, driving up system cost [7][9].
- On-chip cache growth cannot keep pace with the expansion of chip area, so storage resources must be used more efficiently [9][14].

Group 3: Integration and Efficiency
- Tighter integration improves data-transfer bandwidth and energy efficiency, as the performance of the L1, L2, and L3 cache hierarchy demonstrates [10][12].
- A modern DDR5 channel delivers roughly 358 Gbps, but core counts per socket have grown faster than memory bandwidth, so bandwidth per core keeps shrinking (a rough calculation follows below) [10][12].

Group 4: Proposed Solutions
- The article calls for a fundamental redesign of the memory hierarchy around locality, bandwidth, and energy efficiency rather than raw capacity [14][15].
- "Compute-memory nodes" pair computing capability with private local memory, allowing data locality and sharing to be managed explicitly (see the sketch after this summary) [14][15].
- DRAM is redefined as a capacity-driven tier for large working sets and cold data, while performance-critical accesses are served by faster on-package memory [15].
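
To make the Group 3 point concrete, here is a back-of-the-envelope calculation of bandwidth per core. The DDR5 data rate (5600 MT/s), the channel count, and the core counts are illustrative assumptions rather than figures from the article; only the ~358 Gbps per-channel number ties back to the text.

```python
# Back-of-the-envelope: peak memory bandwidth per core.
# Assumptions (not from the article): DDR5-5600, 8 channels per socket,
# and socket core counts growing from 28 to 128.

def ddr5_channel_gbps(data_rate_mts: float, bus_width_bits: int = 64) -> float:
    """Peak bandwidth of one DDR5 channel in Gbit/s."""
    return data_rate_mts * bus_width_bits / 1000.0  # MT/s * bits -> Gbit/s

channel_gbps = ddr5_channel_gbps(5600)       # ~358 Gbit/s, the figure cited above
channels_per_socket = 8                      # assumed server configuration
socket_gb_per_s = channel_gbps * channels_per_socket / 8  # Gbit/s -> GB/s

for cores in (28, 64, 128):                  # assumed generational core counts
    print(f"{cores:4d} cores -> {socket_gb_per_s / cores:5.1f} GB/s per core")
```

Even using peak numbers and ignoring efficiency losses, per-core bandwidth falls from roughly 12.8 GB/s at 28 cores to about 2.8 GB/s at 128 cores, which is the imbalance the article highlights.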
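
The article does not define an API for its "compute-memory node" concept, so the following is a minimal, hypothetical Python sketch of what explicitly managed locality could look like; the class and method names (ComputeMemoryNode, CapacityTierDRAM, place, put) are invented for illustration.

```python
# Hypothetical sketch: compute paired with a small private local memory,
# with data placement and spill-to-DRAM handled explicitly by software.
from dataclasses import dataclass, field

@dataclass
class ComputeMemoryNode:
    node_id: int
    local_capacity_bytes: int                  # small, fast, private on-package memory
    local: dict = field(default_factory=dict)  # explicitly placed hot data
    used_bytes: int = 0

    def place(self, key: str, payload: bytes) -> bool:
        """Pin hot data into this node's private memory; report failure on overflow."""
        if self.used_bytes + len(payload) > self.local_capacity_bytes:
            return False                       # caller spills to the DRAM capacity tier
        self.local[key] = payload
        self.used_bytes += len(payload)
        return True

@dataclass
class CapacityTierDRAM:
    """Capacity-driven layer for large working sets and cold data."""
    store: dict = field(default_factory=dict)

    def put(self, key: str, payload: bytes) -> None:
        self.store[key] = payload

# Usage: hot data is placed in the node's private memory; whatever does not fit
# falls back to the DRAM capacity tier.
node = ComputeMemoryNode(node_id=0, local_capacity_bytes=64 * 1024)
dram = CapacityTierDRAM()
for key, blob in {"hot_tile": b"x" * 32_000, "cold_log": b"y" * 1_000_000}.items():
    if not node.place(key, blob):
        dram.put(key, blob)
```

The point of the sketch is the division of roles the article proposes: the node's private memory holds performance-critical data under explicit software control, while DRAM serves purely as the capacity tier.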