
Wafer-Level Chips Are the Future
36Ke · 2025-06-29 23:49
Group 1: Industry Overview
- The computational power required to train large AI models has grown roughly 1,000-fold in just two years, far outpacing the pace of hardware iteration [1] (a rough growth-rate sketch follows the summary)
- Current AI training hardware falls into two main camps: dedicated accelerators built on wafer-level integration technology, and traditional GPU clusters [1][2]

Group 2: Wafer-Level Chips
- Wafer-level chips are seen as a breakthrough: integrating multiple dies on a single wafer raises inter-die bandwidth and reduces latency [3][4]
- The maximum size of a single conventional die is constrained by the lithography exposure window (the reticle) to approximately 858 mm² [2][3] (see the reticle arithmetic below)

Group 3: Key Players
- Cerebras's WSE-3 wafer-level chip is built on TSMC's 5nm process and integrates 4 trillion transistors and 900,000 AI cores [5][6]
- Tesla's Dojo takes a different approach, integrating 25 proprietary D1 chips on a wafer to deliver 9 PFLOPS of computing power per training tile [10][11] (see the per-tile check below)

Group 4: Performance Comparison
- The WSE-3 can train models 10 times larger than GPT-4 and Gemini, with a peak performance of 125 PFLOPS [8][14]
- The WSE-3 offers 880 times the on-chip memory capacity and 7,000 times the memory bandwidth of the NVIDIA H100 [8][13] (the ratio check below reproduces these figures)

Group 5: Cost and Scalability
- Tesla's Dojo system is estimated to cost between $300 million and $500 million, while Cerebras WSE systems range from $2 million to $3 million [18][19]
- NVIDIA GPUs are cheaper up front but face long-term operational cost pressure from high energy consumption and performance bottlenecks [18][19] (a toy cost model follows below)

Group 6: Future Outlook
- The wafer-level chip architecture offers the highest integration density achievable for a single computing node, indicating significant potential for future AI training hardware [20]
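
To put Group 1's 1,000x figure in perspective, here is a minimal back-of-envelope sketch of the implied doubling time. The 1,000x-over-two-years figure is from the article; the ~2-year Moore's-law doubling cadence used as the hardware baseline is a conventional reference point, not a number from the source.

```python
import math

compute_growth = 1000.0   # demand growth over the window (from the article)
window_months = 24.0      # two years

# Implied doubling time T of AI compute demand: 2**(window/T) = growth
demand_doubling_months = window_months * math.log(2) / math.log(compute_growth)

moore_doubling_months = 24.0  # classic Moore's-law cadence (assumption)
hardware_growth = 2 ** (window_months / moore_doubling_months)

print(f"AI compute demand doubles every ~{demand_doubling_months:.1f} months")
print(f"Transistor-driven gains over the same window: ~{hardware_growth:.0f}x")
print(f"Gap between demand and supply: ~{compute_growth / hardware_growth:.0f}x")
```

Under these assumptions demand doubles roughly every 2.4 months, which is the gap driving the search for architectures beyond single-die GPUs.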
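
A quick arithmetic check of Group 2's reticle limit. The ~858 mm² window matches the standard 26 mm × 33 mm lithography exposure field; the H100 die area (~814 mm²) and the WSE-3 dimensions (~215 mm × 215 mm) are widely published figures, used here as assumptions.

```python
reticle_w_mm, reticle_h_mm = 26.0, 33.0
reticle_area = reticle_w_mm * reticle_h_mm   # 858 mm^2 exposure window

h100_die_area = 814.0        # mm^2, close to the reticle limit (assumed spec)
wse3_area = 215.0 * 215.0    # mm^2, near-whole-wafer square (assumed spec)

print(f"Reticle limit: {reticle_area:.0f} mm^2")
print(f"H100 die uses {h100_die_area / reticle_area:.0%} of the reticle")
print(f"WSE-3 spans ~{wse3_area / reticle_area:.0f} reticle fields,")
print(f"or ~{wse3_area / h100_die_area:.0f} H100-sized dies of silicon")
```

The H100 is already near the ~858 mm² ceiling, which is why scaling a single node further requires stitching dies across the wafer rather than growing the die.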
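
A one-line sanity check of Group 3's Dojo figure: 25 D1 dies at Tesla's published ~362 TFLOPS (BF16) each land on the article's 9 PFLOPS per training tile. The per-die throughput is an assumption taken from public specs, not from the article.

```python
d1_per_tile = 25           # D1 dies per training tile (from the article)
d1_bf16_tflops = 362.0     # per D1 die (assumed from Tesla's public specs)

tile_pflops = d1_per_tile * d1_bf16_tflops / 1000.0
print(f"25 x D1 -> ~{tile_pflops:.1f} PFLOPS per tile (article: 9 PFLOPS)")
```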
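
Group 4's 880x and 7,000x ratios can be reproduced from headline specs. The WSE-3 values (44 GB of on-chip SRAM, 21 PB/s) and the H100 values (~50 MB of on-chip SRAM, ~3 TB/s of HBM bandwidth) are public spec-sheet numbers, quoted here as assumptions; the ratios themselves are what the article cites.

```python
wse3_sram_gb = 44.0     # on-chip SRAM (assumed from Cerebras specs)
wse3_bw_pbps = 21.0     # on-chip memory bandwidth, PB/s (assumed)

h100_sram_gb = 0.050    # ~50 MB of on-chip SRAM (assumed)
h100_bw_pbps = 0.003    # ~3 TB/s of HBM bandwidth (assumed)

print(f"On-chip memory: {wse3_sram_gb / h100_sram_gb:,.0f}x (article: 880x)")
print(f"Memory bandwidth: {wse3_bw_pbps / h100_bw_pbps:,.0f}x (article: 7,000x)")
```

Note these compare on-chip SRAM to on-chip SRAM; the H100's 80 GB of HBM sits off-die, which is exactly the bandwidth gap wafer-scale integration is meant to close.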
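
Finally, a toy total-cost-of-ownership model for Group 5's claim that a lower GPU sticker price can be eroded by energy costs. Only the purchase-price ranges come from the article; the power draws, electricity price, and cluster sizing below are hypothetical placeholders chosen purely to illustrate the model's shape.

```python
HOURS_PER_YEAR = 24 * 365

def tco(purchase_usd: float, power_kw: float, years: float,
        usd_per_kwh: float = 0.10) -> float:
    """Purchase price plus energy cost over the period (toy model)."""
    return purchase_usd + power_kw * HOURS_PER_YEAR * years * usd_per_kwh

# Hypothetical comparison: one WSE system vs. a GPU cluster of similar
# nominal throughput. Power draws and cluster price are placeholders.
wse = tco(purchase_usd=2.5e6, power_kw=23.0, years=5)
gpus = tco(purchase_usd=3.8e6, power_kw=130.0, years=5)

print(f"WSE system, 5-yr toy TCO:  ${wse / 1e6:.2f}M")
print(f"GPU cluster, 5-yr toy TCO: ${gpus / 1e6:.2f}M")
```

The takeaway is the structure of the comparison, not the specific outputs: which side wins depends entirely on the assumed cluster size, power draw, and utilization.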