Batch Normalization (批次归一化)

A paper whose theory was later shown to be flawed has won the ICML 2025 Test of Time Award
量子位 · 2025-07-15 08:31
Core Insights
- The Batch Normalization paper, published in 2015, has been awarded the Test of Time Award at ICML 2025, highlighting its lasting impact on deep learning [1]
- With over 60,000 citations, the work is considered a milestone in the development of deep learning, enabling the training and application of deep neural networks [2][4]
- Batch Normalization is a key technology that allowed deep learning to move from small-scale experiments to large-scale practical applications [3]

Group 1
- In 2015, training deep neural networks was a major challenge: training was often unstable and highly sensitive to parameter initialization [5][6][7]
- Researchers Sergey Ioffe and Christian Szegedy identified the problem of Internal Covariate Shift, in which the distribution of each layer's inputs changes during training, complicating optimization [8][11]
- Their solution was to normalize the data at every layer, analogous to normalizing the network's inputs, which significantly improved training speed and stability [12] (a minimal sketch of the transform appears at the end of this summary)

Group 2
- The original paper demonstrated that with Batch Normalization, a state-of-the-art image classification model reached the same accuracy in only 1/14 of the training steps [13]
- Batch Normalization not only accelerated training but also introduced a regularization effect, improving the model's generalization ability [14][15]
- Following its introduction, Batch Normalization became a foundation of many mainstream convolutional neural networks, such as ResNet and DenseNet [18]

Group 3
- In 2018, a paper from MIT challenged the core theory behind Batch Normalization, showing that even when distribution-shifting noise was deliberately injected, models with Batch Normalization still trained faster than those without it [21][23]
- That research argued that Batch Normalization instead smooths the optimization landscape, making gradient behavior more predictable and stable [24]
- It has also been suggested that Batch Normalization acts like an unsupervised learning technique, allowing networks to adapt to the data's inherent structure early in training [25]

Group 4
- Recent studies have provided deeper insights into Batch Normalization from a geometric perspective [29]
- Both authors have continued their careers in AI: Szegedy joined xAI, and Ioffe later followed [30][32]
- Szegedy has since moved to a new role at Morph Labs, focusing on the pursuit of "verifiable superintelligence" [34]
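
To make the layer-wise normalization idea concrete, here is a minimal sketch of the batch-normalization transform described in the 2015 paper: each feature is normalized by its mini-batch mean and variance, then rescaled and shifted by learned parameters γ and β. The NumPy helper batch_norm_forward, its argument shapes, and the toy data are illustrative assumptions, not the authors' reference implementation, and the inference-time running-statistics path is omitted for brevity.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization over one mini-batch (training-time forward pass).

    x: activations of shape (batch_size, num_features)
    gamma, beta: learned scale and shift, each of shape (num_features,)
    """
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta             # learned scale/shift restore representational capacity

# Hypothetical usage: activations with a shifted, stretched distribution
x = np.random.randn(32, 4) * 5.0 + 3.0
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))        # roughly 0 and 1 per feature
```

With γ = 1 and β = 0 the layer simply standardizes its inputs; during training γ and β are learned so the network can undo the normalization wherever that helps, which is what keeps the transform from reducing model capacity.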