Just now, NVIDIA CUDA received its biggest update ever!
具身智能之心· 2025-12-08 01:11
Core Insights
- NVIDIA has officially released CUDA Toolkit 13.1, billed as the largest CUDA update in 20 years [2][4].

Group 1: CUDA Tile
- CUDA Tile is the most significant addition in CUDA Toolkit 13.1. It introduces a tile-based programming model that lets developers write algorithms at a higher level of abstraction [4][5].
- In the CUDA Tile model, developers specify blocks of data called "Tiles" and define the mathematical operations to run on them. The compiler and runtime then decide how to distribute the work across threads [8][15].
- The model abstracts away the details of specialized hardware such as Tensor Cores, so Tile code remains compatible with future GPU architectures [9][15].
- CUDA 13.1 ships two components for Tile programming: CUDA Tile IR, a new virtual instruction set architecture, and cuTile Python, a domain-specific language for writing array-based and Tile-based kernels in Python [10].

Group 2: Green Context Support
- The update adds runtime support for Green Contexts, lightweight contexts that allow finer-grained allocation of GPU resources [20][21].
- With Green Contexts, users can define and manage independent partitions of GPU resources, which makes it easier to prioritize latency-sensitive tasks [21].

Group 3: Multi-Process Service (MPS) Updates
- CUDA 13.1 brings several new features to MPS, including Memory Locality Optimization Partition (MLOPart), which lets users create CUDA devices optimized for memory locality [24][25].
- MLOPart devices are derived from the same physical GPU but appear as multiple independent devices, each with reduced compute resources [25][26].
- Static Streaming Multiprocessor (SM) partitioning is introduced as an alternative to dynamic resource provisioning, giving MPS clients deterministic resource allocation [29].
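
To make the tile model from Group 1 more concrete, here is a minimal pure-Python sketch of tile-style decomposition applied to matrix multiplication. It is not cuTile Python and uses no NVIDIA API: the names `load_tile`, `tile_matmul_acc`, `tiled_matmul`, and the tile size `TILE` are all hypothetical, chosen only for illustration. The point is the shape of the code: the algorithm is written in terms of whole tiles, and how each whole-tile operation is spread across threads is left to a lower layer (in CUDA Tile, the compiler and runtime).

```python
# Pure-Python illustration of tile-style decomposition.
# NOT cuTile Python; every name here is hypothetical.
TILE = 2  # assumed tile edge length, for illustration only


def load_tile(mat, row, col, size=TILE):
    """Copy the size x size tile whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in mat[row:row + size]]


def tile_matmul_acc(acc, a_tile, b_tile):
    """acc += a_tile @ b_tile, treated as a single whole-tile operation."""
    inner = len(b_tile)
    for i in range(len(a_tile)):
        for j in range(len(b_tile[0])):
            acc[i][j] += sum(a_tile[i][k] * b_tile[k][j] for k in range(inner))


def tiled_matmul(a, b, size=TILE):
    """Multiply two square matrices whose dimension is a multiple of size."""
    n = len(a)
    assert n % size == 0, "dimension must be a multiple of the tile size"
    c = [[0] * n for _ in range(n)]
    # Each (ti, tj) pair names one output tile; the output tiles are
    # independent of each other, so a runtime could schedule them in parallel.
    for ti in range(0, n, size):
        for tj in range(0, n, size):
            acc = [[0] * size for _ in range(size)]
            for tk in range(0, n, size):
                tile_matmul_acc(acc,
                                load_tile(a, ti, tk, size),
                                load_tile(b, tk, tj, size))
            # Store the finished tile back into the result matrix.
            for i in range(size):
                c[ti + i][tj:tj + size] = acc[i]
    return c
```

Nothing in `tiled_matmul` mentions threads or hardware units. Only `tile_matmul_acc` would need to change to target different hardware, which is the kind of separation the tile model is described as providing.
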
Group 4: Developer Tools Enhancements
- The release adds performance analysis support for CUDA Tile kernels, including the ability to inspect Tile statistics [33].
- NVIDIA Compute Sanitizer now supports compile-time patching, which improves its memory error detection [34].
- NVIDIA Nsight Systems gains enhanced tracing for CUDA applications, enabling more detailed performance analysis [37].

Group 5: Core CUDA Libraries Updates
- CUDA 13.1 delivers cuBLAS performance updates for the Blackwell architecture, including support for block-scaled FP4 and FP8 matrix multiplication [40].
- The cuSOLVER library has been optimized for batched eigenvalue problems, with significant performance improvements [42].
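
As a rough illustration of what "block-scaled" means in the cuBLAS item above, the sketch below quantizes a list of floats so that each small block shares one scale factor. It is a conceptual pure-Python model, not the cuBLAS API. The representable magnitudes are assumed to be the E2M1 value set commonly associated with FP4, and the block size, the scale rule (block maximum mapped to the largest magnitude), and all function names are hypothetical.

```python
# Conceptual sketch of block-scaled low-precision quantization.
# Not cuBLAS; the value set, block size, and scale rule are assumptions.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # assumed E2M1 set


def quantize_block(block):
    """Return (scale, codes): one shared scale plus one FP4-like code per value."""
    max_abs = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest code.
    scale = max_abs / FP4_MAGNITUDES[-1] if max_abs else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        mag = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from a shared scale and its codes."""
    return [scale * c for c in codes]


def quantize(values, block_size=4):
    """Quantize a flat list block by block, one scale per block."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]
```

The design point is that a per-block scale preserves small values that a single global scale would lose. In this sketch, quantizing `[600.0, 300.0, 0.06, 0.03]` as one block rounds the last two values to zero, while `quantize(..., block_size=2)` gives the second block its own scale and keeps them.
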