The God of CUDA Kernels and the World's Best GPU Programmer? Who Is OpenAI's Behind-the-Scenes Legend?
机器之心 · 2025-09-30 23:49
Core Insights
- A great AI team consists of both star figures and behind-the-scenes key contributors; the article highlights the importance of the latter [1][2]

Group 1: Scott Gray's Role and Skills
- Scott Gray, a senior engineer at OpenAI, drew attention for writing critical CUDA kernels that support trillions of computations daily [3][5]
- Writing high-performance CUDA kernels demands combined expertise in parallel computing, GPU architecture, and deep learning algorithms, making such engineers rare [7]
- Gray's career has centered on performance engineering and low-level optimization rather than the typical "genius scientist" path [7][8]

Group 2: Achievements at Nervana
- Gray built his reputation at Nervana Systems, where he tackled the efficiency gap between software frameworks and hardware during the deep learning boom [14]
- He developed maxas, an assembler that exposes the GPU hardware directly, enabling hand-tuned, highly optimized computational kernels [17][18]
- Using maxas, Gray wrote an SGEMM kernel that reached 98% of theoretical peak efficiency on the GM204 GPU, outperforming NVIDIA's cuBLAS by 4.8% [20]

Group 3: Innovations in Deep Learning
- Building on maxas, Gray created maxDNN, applying the same low-level optimization techniques to convolution and significantly surpassing NVIDIA's cuDNN [21]
- On AlexNet's convolution layers, maxDNN sustained 93-95% computational efficiency, while cuDNN fluctuated between 32% and 57% [21]

Group 4: Contributions at OpenAI
- After joining OpenAI, Gray shifted focus to tools for efficient sparse model architectures, becoming a key figure in putting Scaling Laws into practice [22]
- He co-developed block-sparse GPU kernels that gain efficiency by skipping zero-valued blocks entirely during computation [24][25]
- These kernels let researchers build larger neural networks within a fixed compute budget, achieving state-of-the-art results on a range of tasks [26][27]
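The "98% of theoretical peak" figure is an efficiency ratio: achieved FLOPS divided by the GPU's theoretical peak. A minimal sketch of that arithmetic (the GM204/GTX 980 core count and clock are public specs, but the 4520 GFLOPS "achieved" figure below is a hypothetical placeholder, not a maxas benchmark number):

```python
# Illustrative arithmetic only: GPU specs are public GM204 (GTX 980)
# figures; the achieved-GFLOPS value is a hypothetical placeholder.

def peak_sgemm_gflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical single-precision peak: each CUDA core retires one
    fused multiply-add (2 FLOPs) per clock."""
    return cuda_cores * clock_ghz * 2

def efficiency(achieved_gflops: float, peak_gflops: float) -> float:
    """Fraction of the theoretical peak actually sustained."""
    return achieved_gflops / peak_gflops

peak = peak_sgemm_gflops(2048, 1.126)  # GM204: ~4612 GFLOPS
print(f"peak: {peak:.0f} GFLOPS")
print(f"kernel at 4520 GFLOPS -> {efficiency(4520, peak):.1%} of peak")
```

At this scale even a few percent of peak is hundreds of GFLOPS, which is why beating cuBLAS by 4.8% was notable.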
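One reason GEMM expertise transfers to convolution, as it did from maxas to maxDNN, is that a convolution layer can be lowered to a single matrix multiply via the standard im2col transform. A minimal NumPy sketch of that lowering (not maxDNN's actual direct-convolution kernel; `im2col_conv2d` is an illustrative name):

```python
import numpy as np

def im2col_conv2d(img, kernels):
    """Valid 2-D cross-correlation of a single-channel image with n
    filters, lowered to one GEMM. img: (H, W); kernels: (n, kh, kw)."""
    n, kh, kw = kernels.shape
    H, W = img.shape
    oh, ow = H - kh + 1, W - kw + 1
    # im2col: unroll each receptive field into one row -> (oh*ow, kh*kw)
    cols = np.array([img[r:r + kh, c:c + kw].ravel()
                     for r in range(oh) for c in range(ow)])
    # One (oh*ow, kh*kw) @ (kh*kw, n) multiply applies every filter at once
    return (cols @ kernels.reshape(n, -1).T).T.reshape(n, oh, ow)
```

Once convolution is phrased this way, a highly tuned GEMM kernel does most of the work, which helps explain the large efficiency gap maxDNN opened over cuDNN at the time.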
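The block-sparse idea described above can be sketched in a few lines: store the weight matrix as fixed-size blocks plus a binary layout mask, and multiply through only the nonzero blocks, so compute scales with the number of active blocks rather than the full matrix size. This is a NumPy illustration of the concept, not OpenAI's CUDA implementation; the function name and block-dictionary format are hypothetical:

```python
import numpy as np

def block_sparse_matmul(x, blocks, layout, block_size):
    """x: (batch, K) input; layout: (K//bs, N//bs) 0/1 mask of which
    weight blocks exist; blocks: dict (i, j) -> (bs, bs) weight block.
    Zero blocks are never touched, which is where the savings come from."""
    batch = x.shape[0]
    n_out = layout.shape[1] * block_size
    out = np.zeros((batch, n_out))
    for i, j in zip(*np.nonzero(layout)):  # visit only active blocks
        xs = x[:, i * block_size:(i + 1) * block_size]
        out[:, j * block_size:(j + 1) * block_size] += xs @ blocks[(i, j)]
    return out
```

With a mostly-zero layout, the loop body runs for only a small fraction of the blocks, which is why a fixed compute budget can host a much wider or deeper sparse network.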