Just now, NVIDIA CUDA received its biggest update ever!

Core Insights
- NVIDIA has officially released CUDA Toolkit 13.1, the largest update in the 20 years since the CUDA platform's inception in 2006 [2]
- The update introduces CUDA Tile, a new programming model that lets developers write algorithms at a higher level of abstraction, simplifying the use of specialized hardware such as Tensor Cores [4][5]

Summary by Sections

CUDA Tile
- CUDA Tile is the central update in NVIDIA CUDA Toolkit 13.1. It abstracts away specialized hardware details so that developers can write GPU kernel functions at a higher level than the traditional SIMT (Single Instruction, Multiple Threads) model [4][6]
- In the Tile model, developers specify blocks of data called "Tiles" and the mathematical operations to perform on them, and the compiler automatically distributes the workload across threads [7][8]
- CUDA 13.1 includes two components for Tile programming: CUDA Tile IR, a new virtual instruction set architecture, and cuTile Python, a domain-specific language for writing array-based and Tile-based kernel functions in Python [9]

Software Updates
- The update introduces support for Green Contexts, lightweight contexts that allow finer-grained GPU resource allocation and management [19][20]
- CUDA 13.1 also features a customizable split() API for building SM partitions and reducing false dependencies between different Green Contexts [21]
- The Multi-Process Service (MPS) has been enhanced with memory locality optimization partitions (MLOPart) and static SM partitioning for improved resource allocation and isolation [23][28]

Developer Tools
- New developer tools include performance analysis tools for CUDA Tile kernel functions and enhancements to Nsight Compute for better analysis of Tile statistics [32]
- The NVIDIA Compute Sanitizer has been updated to support compile-time patching for improved memory error detection [33]

Mathematical Libraries
- The core CUDA toolkit's mathematical libraries have received performance updates for the new Blackwell architecture, including enhancements to cuBLAS and cuSOLVER for better matrix operations [37][41]
- New APIs have been introduced for cuBLAS and cuSPARSE, providing improved performance for specific operations [40][46]
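
To make the Tile idea from the CUDA Tile section more concrete, here is a minimal sketch in plain Python. It is not cuTile Python and uses no NVIDIA API; the names `tiled_matmul`, `naive_matmul`, and the `tile` parameter are hypothetical and exist only for illustration. The sketch assumes a matrix multiplication expressed per output tile, with ordinary loops standing in for the work distribution that, according to the summary, the compiler handles automatically.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), processing one output tile at a time.

    Illustration only: on a GPU the per-tile work would be mapped to
    threads by the toolchain; here sequential loops play that role.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Each (i0, j0) pair names one output tile, the unit of work a
    # tile-style program describes.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # Walk the shared dimension in tile-sized steps and
            # accumulate partial products into the output tile.
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


def naive_matmul(a, b):
    """Reference implementation used to check the tiled version."""
    return [
        [sum(a[i][kk] * b[kk][j] for kk in range(len(b)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b, tile=2))
```

The point of the sketch is the shape of the program: the caller reasons about tiles and the operation applied to them, while the inner index arithmetic is the kind of detail a tile-based toolchain is meant to take over. The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size.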