A Deep Dive into Nvidia's Blackwell
Semiconductor Industry Observation (半导体行业观察) · 2025-06-30 01:52
Core Insights
- The largest chip in Nvidia's latest GPU architecture, Blackwell, is GB202, which has a 750 mm² die and 92.2 billion transistors and is designed for high-performance graphics processing [1][62]
- The RTX PRO 6000 Blackwell is the most powerful configuration in Nvidia's lineup; it is comparable to the RTX 5090 but has more streaming multiprocessors (SMs) enabled [1][2]

Architecture and Performance
- GB202 has 192 SMs, the fundamental building blocks of Nvidia GPUs, and relies on a large memory subsystem to keep them fed [1][4]
- Blackwell's GPC-to-SM ratio is 1:16 (16 SMs per Graphics Processing Cluster), which lets Nvidia scale SM count cost-effectively without adding GPC-level hardware [5]
- Compared with AMD's RDNA4 architecture, which has a 1:8 SE:WGP (Shader Engine to Workgroup Processor) ratio, Blackwell's design allows for higher clock speeds and potentially greater throughput [6][18]

Instruction and Execution
- Blackwell uses fixed-length 128-bit instructions and a two-level instruction cache, which improves instruction bandwidth and performance [7][10]
- The architecture can overlap different types of workloads from the same queue, which improves shader array utilization [8][23]

Memory Subsystem
- Each Blackwell SM has a 128 KB memory block split between L1 cache and shared memory, which keeps latency low and throughput high [25][35]
- L2 cache latency is slightly higher than in previous generations, but overall memory bandwidth remains higher than that of AMD's offerings [49][53]

Competitive Landscape
- Nvidia's RTX PRO 6000 Blackwell outperforms AMD's RX 9070 in various benchmarks, particularly in memory bandwidth and compute throughput [58][61]
- Competition in the GPU market is intensifying: Intel's Battlemage and AMD's RDNA4 target the midrange, while Nvidia continues to dominate the high end [61][64]
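
The headline figures in the summary above imply a few derived numbers that are easy to check by hand. The following is a minimal sketch of that arithmetic, not anything taken from the original article: the constant names and the `derived_figures` helper are hypothetical, and it assumes that the 1:16 ratio means 16 SMs per GPC and that the 128 KB L1/shared memory block is per SM.

```python
# Figures quoted in the summary; the names below are illustrative only.
DIE_AREA_MM2 = 750        # GB202 die size in mm^2
TRANSISTORS = 92.2e9      # GB202 transistor count
TOTAL_SMS = 192           # SMs on a full GB202 die
SMS_PER_GPC = 16          # assumption: the 1:16 ratio means 16 SMs per GPC
INSTRUCTION_BITS = 128    # fixed instruction length
SM_L1_SHARED_KB = 128     # assumption: the L1/shared memory block is per SM


def derived_figures():
    """Return quantities implied by the quoted figures."""
    return {
        # 192 SMs grouped 16 per GPC
        "gpc_count": TOTAL_SMS // SMS_PER_GPC,
        # transistor density in millions of transistors per mm^2
        "transistors_per_mm2_millions": TRANSISTORS / DIE_AREA_MM2 / 1e6,
        # bytes fetched per instruction
        "instruction_bytes": INSTRUCTION_BITS // 8,
        # combined L1/shared memory capacity across all SMs, in MiB
        "total_l1_shared_mib": TOTAL_SMS * SM_L1_SHARED_KB / 1024,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions, the full die works out to 12 GPCs, roughly 123 million transistors per mm², 16 bytes per instruction, and 24 MiB of combined L1/shared memory across all SMs.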