Sparse CUDA Code Generation
ICLR 2026 Oral | Chinese Academy of Sciences team proposes new framework "SparseRL": deep reinforcement learning automatically generates high-performance CUDA code
Jiqizhixin (机器之心) · 2026-03-25 07:01
Core Insights

- The article introduces SparseRL, a new framework from a Chinese Academy of Sciences team that integrates deep reinforcement learning into the task of generating sparse CUDA code, aiming to optimize code performance based on the structure of the sparse matrix [2][5].

Group 1: Framework and Methodology

- SparseRL improves the compilation success rate by 20% and execution speed by 30% on the classic SpMV task [3][16].
- The framework uses a pre-trained language model as the policy network: each generated token is an action, and compilation results and execution times serve as the reward signal [12][18].
- Training proceeds in three stages: pre-training on CUDA code, supervised fine-tuning on sparse-matrix/code pairs, and reinforcement learning optimization targeting both correctness and efficiency [18][20].

Group 2: Innovations and Challenges

- A key innovation is the use of sinusoidal position embeddings, akin to positional encoding in Transformers, to help the model understand the spatial relationships of non-zero elements in sparse matrices [13][14].
- A hierarchical reward function balances correctness and efficiency, ensuring the generated code is both functional and performant [14][17].
- Remaining challenges include the high computational cost of reinforcement learning training, the need to retrain for new hardware architectures, and the generated code's potential lack of human-like style and interpretability [20].

Group 3: Significance and Future Directions

- SparseRL marks a paradigm shift from generating merely runnable code to producing high-performance code, suggesting new potential for AI in performance-optimization tasks [22].
- Future plans include extending the method to multi-GPU distributed sparse computing, exploring integration with traditional auto-tuning techniques, and supporting a wider range of sparse operators [22].
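For readers unfamiliar with the SpMV benchmark the article cites, the computation itself is small: multiply a sparse matrix by a dense vector. The sketch below uses the common CSR (compressed sparse row) storage layout as an assumption; the article does not say which sparse format SparseRL's generated kernels target.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A stored in CSR format.

    values  : non-zero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i's entries
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
values  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
col_idx = np.array([0, 2, 2, 0, 1])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 1.0, 1.0])
print(spmv_csr(values, col_idx, row_ptr, x))  # [3. 3. 9.]
```

A CUDA kernel for the same computation typically assigns the outer loop's rows to threads; how work is partitioned across the irregular row lengths is exactly the tuning space SparseRL explores.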
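The "each token is an action, compile/run results are the reward" setup described in Group 1 is a standard policy-gradient formulation. The toy below replaces the pre-trained language model with a single categorical distribution over a four-token "vocabulary" and trains it with a REINFORCE update; the model, vocabulary, and reward are stand-ins, not SparseRL's actual training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a code LM: one categorical policy over 4 "tokens".
# Pretend token 2 is the only program that compiles and runs fast,
# so the environment rewards it with 1.0 and everything else with 0.
logits = np.zeros(4)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for step in range(200):
    p = softmax(logits)
    a = rng.choice(4, p=p)       # sample an action (a token)
    r = 1.0 if a == 2 else 0.0   # reward from "compiler/runtime"
    grad = -p                    # d log p(a) / d logits
    grad[a] += 1.0
    logits += lr * r * grad      # REINFORCE update

print(int(np.argmax(softmax(logits))))  # 2: policy converges to the rewarded token
```

In the real setting the action space is the LM's full vocabulary, an episode is an entire generated kernel, and the reward is only observed after compiling and timing it, which is what makes the training computationally expensive, as the article's Group 2 notes.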
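The article says sinusoidal position embeddings encode where non-zeros sit in the matrix, but gives no formula. One plausible reading, sketched below, applies the standard Transformer sinusoidal encoding to each non-zero's (row, col) coordinates and concatenates the two; the paper's exact scheme may differ.

```python
import numpy as np

def sinusoidal_embed(pos, dim):
    """Standard Transformer sinusoidal encoding of a scalar position."""
    i = np.arange(dim // 2)
    freq = 1.0 / (10000.0 ** (2 * i / dim))
    emb = np.empty(dim)
    emb[0::2] = np.sin(pos * freq)
    emb[1::2] = np.cos(pos * freq)
    return emb

def nnz_embedding(row, col, dim=16):
    """Embed one non-zero's (row, col) pair by concatenating a
    sinusoidal code per axis (hypothetical design, for illustration)."""
    return np.concatenate([sinusoidal_embed(row, dim // 2),
                           sinusoidal_embed(col, dim // 2)])

e = nnz_embedding(3, 7)
print(e.shape)  # (16,)
```

The appeal of sinusoidal codes here is the same as in Transformers: nearby coordinates get similar embeddings, so the model can pick up on clustering and banding patterns in the non-zero structure.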
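The hierarchical reward function from Group 2 can be given one plausible shape: compilation failure earns a large penalty, incorrect output a smaller one, and correct code a reward that grows with speedup over a baseline. All weights and thresholds below are illustrative assumptions; the article does not publish SparseRL's actual values.

```python
def hierarchical_reward(compiled, correct, runtime, baseline_runtime):
    """Hierarchical reward balancing correctness and efficiency
    (illustrative weights; the paper's actual values are not given).

    compiled         : did the CUDA compiler succeed?
    correct          : does the kernel's output match a reference?
    runtime          : measured time of the generated kernel
    baseline_runtime : time of a reference implementation
    """
    if not compiled:
        return -1.0          # hard failure: code does not build
    if not correct:
        return -0.5          # builds but wrong: still penalized
    speedup = baseline_runtime / runtime
    return 1.0 + speedup     # correct code: reward grows with speedup

print(hierarchical_reward(True, True, runtime=0.5, baseline_runtime=1.0))  # 3.0
```

The ordering matters: because any correct kernel outscores any incorrect one, the policy cannot trade correctness for speed, which is how this reward shape enforces the "functional and performant" balance the article describes.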