The Classic ReLU Returns! The Major Flaw, the "Dying ReLU" Problem, Has Been Solved
机器之心· 2025-06-03 06:26
Core Viewpoint
- The article discusses the introduction of the SUGAR method, which enhances the performance of the ReLU activation function in deep learning without changing the model architecture or increasing the parameter count [2][3].

Group 1: SUGAR Model Introduction
- SUGAR (Surrogate Gradient for ReLU) addresses the limitations of the ReLU activation function, particularly the "dying ReLU" problem, by keeping the standard ReLU for forward propagation while using a non-zero surrogate gradient for backpropagation [3][4].
- The method allows dead neurons to be revived while preserving the advantages of ReLU, such as sparsity and simplicity [4].

Group 2: New Surrogate Gradient Functions
- Two new surrogate gradient functions, B-SiLU (Bounded SiLU) and NeLU (Negative slope Linear Unit), have been designed to integrate seamlessly into various models [5][6].
- B-SiLU combines self-gating properties with a tunable lower-bound parameter, while NeLU serves as a smooth derivative alternative to ReLU [13][14].

Group 3: Experimental Results
- Implementing SUGAR with B-SiLU yielded significant accuracy gains: VGG-16 improved by 10 percentage points on CIFAR-10 and 16 percentage points on CIFAR-100, while ResNet-18 improved by 9 and 7 percentage points, respectively [6][18].
- In experiments, B-SiLU outperformed other activation functions, with ResNet-18's accuracy rising from 76.76% to 86.42% on CIFAR-10 and from 48.99% to 56.51% on CIFAR-100 [16][18].

Group 4: Broader Implications
- SUGAR has also been evaluated on modern architectures such as Swin Transformer and Conv2NeXt, demonstrating its adaptability and effectiveness across different models [9][22].
- The findings suggest that SUGAR can enhance the generalization capabilities of various neural network architectures, making it a valuable addition to deep learning methodologies [9].
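The core mechanism described above (standard ReLU forward, non-zero surrogate gradient backward) can be sketched in a few lines. The sketch below is illustrative only: it uses the derivative of plain SiLU (x · sigmoid(x)) as a stand-in surrogate, since the article does not give the exact B-SiLU or NeLU formulas; the function names `sugar_forward` and `sugar_backward` are hypothetical, not from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sugar_forward(x):
    # Forward pass: the standard ReLU, so activations keep
    # ReLU's sparsity and simplicity.
    return max(x, 0.0)

def sugar_backward(x, grad_out):
    # Backward pass: instead of ReLU's hard 0/1 step (which gives
    # zero gradient for all x < 0 and can "kill" neurons), route the
    # gradient through a smooth surrogate derivative. Here we use
    # SiLU's derivative, d/dx [x * sigmoid(x)], as an illustrative
    # stand-in for the paper's B-SiLU / NeLU surrogates.
    s = sigmoid(x)
    surrogate = s * (1.0 + x * (1.0 - s))
    return grad_out * surrogate

# A neuron with negative pre-activation outputs 0 in the forward pass,
# but still receives a non-zero gradient, so it can recover.
y = sugar_forward(-2.0)         # 0.0, same as ReLU
g = sugar_backward(-2.0, 1.0)   # non-zero, unlike ReLU's 0.0
```

Because the forward computation is untouched, this kind of surrogate can be dropped into an existing network without changing its architecture or parameter count, matching the article's description.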