NVIDIA Invests in Synopsys, Reshaping the Chip Landscape
半导体行业观察 · 2025-12-02 01:37
Two giants of the semiconductor ecosystem, NVIDIA and Synopsys, have announced a landmark multi-year strategic partnership under which NVIDIA will invest $2 billion in Synopsys-led projects. The collaboration aims to fuse NVIDIA's GPU-accelerated computing platform with Synopsys's industry-leading electronic design automation (EDA) and semiconductor IP portfolio, dramatically shortening chip design cycles, substantially cutting power consumption, and accelerating the development of next-generation AI, automotive, and high-performance computing chips.

At the core of the partnership is a unified, cloud-native design environment that integrates Synopsys's TestMAX, Verdi, and VC Formal tools, NVIDIA's cuLitho computational lithography platform, and the broader Grace-Blackwell software stack. For the first time, designers will be able to run full-chip place-and-route, design rule checking, and electromagnetic simulation at GPU-accelerated speeds, 10 to 50 times faster than traditional CPU-based flows. Early benchmarks shared at the launch showed that a signoff flow for a 3nm AI training chip, which previously required 12 weeks of compute time, finished in under 60 hours on NVIDIA DGX Cloud instances running Synopsys tools.

NVIDIA CEO Jensen Huang described the partnership as "...
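As a quick sanity check, the benchmark figures quoted above (12 weeks of CPU compute versus under 60 hours on GPU) can be converted into a speedup factor and compared against the claimed 10-50x range:

```python
# Sanity check on the benchmark figures quoted above: a 12-week CPU-based
# signoff run versus under 60 hours on GPU-accelerated instances.
cpu_hours = 12 * 7 * 24  # 12 weeks of compute time, in hours
gpu_hours = 60           # reported GPU-accelerated runtime (upper bound)

speedup = cpu_hours / gpu_hours
print(f"CPU hours: {cpu_hours}, speedup: {speedup:.1f}x")
# → CPU hours: 2016, speedup: 33.6x
```

The implied ~33.6x speedup is a lower bound (the GPU run took "under 60 hours") and sits comfortably inside the 10-50x range the article cites for GPU-accelerated flows.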
WAIC Dispatch | How Can the "Efficiency Bottleneck" in AI Training Be Eased? Moore Threads' Zhang Jianzhong: Building an AGI "Super Factory"
Xin Lang Ke Ji · 2025-07-27 04:12
Core Insights
- The 2025 World Artificial Intelligence Conference (WAIC 2025) is being held in Shanghai from July 26 to 28, where Moore Threads introduced the concept of an "AI Factory" [1][3]
- Moore Threads CEO Zhang Jianzhong emphasized the need for innovative engineering solutions to address the efficiency bottlenecks in large-model training caused by the explosive growth of generative AI [1][3]

Group 1: AI Factory Concept
- The "AI Factory" is likened to process upgrades in chip wafer fabs, requiring innovations in chip architecture, overall cluster-architecture optimization, software and algorithm tuning, and resource-scheduling system upgrades [3]
- The efficiency of the AI Factory is determined by five core elements, summarized in the formula: AI Factory Production Efficiency = Accelerated Computing Generality × Single Chip Effective Computing Power × Single Node Efficiency × Cluster Efficiency × Cluster Stability [3]

Group 2: Technological Innovations
- Moore Threads' GPU single chip, based on the MUSA architecture, integrates AI compute acceleration, graphics rendering, physical simulation, and ultra-high-definition video encoding, supporting a full precision spectrum from FP64 to INT8 [3]
- Applying FP8 mixed-precision technology to mainstream large-model training has yielded a performance increase of 20% to 30% [3]

Group 3: Memory and Communication Efficiency
- Moore Threads' memory system achieves 50% bandwidth savings and a 60% reduction in latency through technologies including multi-precision near-memory reduction engines and low-latency Scale-Up [4]
- The ACE asynchronous communication engine reduces computational-resource loss by 15%, while MTLink 2.0 interconnect technology provides 60% higher bandwidth than the domestic industry average, laying a solid foundation for large-scale cluster deployment [4]

Group 4: Reliability and Fault Tolerance
- Zero-interruption fault-tolerance technology isolates affected node groups during hardware failures, allowing the remaining nodes to continue training uninterrupted [4]
- This innovation yields an effective training time ratio exceeding 99% for the KUAE cluster, significantly reducing recovery costs [4]
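The multiplicative structure of the production-efficiency formula above can be sketched numerically. The five factor values below are hypothetical, chosen only to illustrate how the product behaves; they are not Moore Threads figures:

```python
# Illustrative sketch of the multiplicative "AI Factory" efficiency formula.
# All five factor values are hypothetical placeholders, not vendor data.
factors = {
    "accelerated_computing_generality": 0.95,
    "single_chip_effective_compute":    0.90,
    "single_node_efficiency":           0.85,
    "cluster_efficiency":               0.80,
    "cluster_stability":                0.99,  # e.g. a >99% effective training time ratio
}

production_efficiency = 1.0
for name, value in factors.items():
    production_efficiency *= value

print(f"overall production efficiency: {production_efficiency:.3f}")
# → overall production efficiency: 0.576
```

Because the terms multiply rather than add, one weak factor dominates the result: dropping cluster_efficiency from 0.80 to 0.40 would halve the whole factory's output even if every other factor were perfect, which is why the article stresses cluster-level stability and fault tolerance alongside single-chip performance.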