Internal Covariate Shift

A paper whose theory was "proven wrong" wins the ICML 2025 Test of Time Award
猿大侠 · 2025-07-17 03:11
Core Viewpoint

- The Batch Normalization paper, published in 2015, has been awarded the Test of Time Award at ICML 2025, highlighting its lasting impact on deep learning and its widespread adoption in the field [1][2].

Group 1: Impact and Significance

- The Batch Normalization paper has been cited over 60,000 times, marking it as a milestone in the history of deep learning [2][4].
- It has been a key technology enabling deep learning to move from small-scale experiments to large-scale practical applications [3][4].
- Batch Normalization drastically accelerated the training of deep neural networks, allowing models to reach the same accuracy with significantly fewer training steps [13][14].

Group 2: Challenges Addressed

- In 2015, training deep neural networks was difficult: training became unstable as the number of layers increased [5][6].
- Researchers observed that the distribution of each layer's inputs shifts during training (the "internal covariate shift" of the title), making models hard to train [11][12].
- Batch Normalization addresses this by normalizing the activations of hidden layers, stabilizing the training process [12][14].

Group 3: Theoretical Developments

- The original explanation was challenged in 2018: later analysis found that Batch Normalization works not by reducing internal covariate shift, but by making the optimization landscape smoother, which improves gradient predictability and training stability [22][24].
- Newer research suggests that Batch Normalization also acts as a form of unsupervised learning, letting networks adapt to the inherent structure of the data from the start of training [25][26].

Group 4: Authors' Current Endeavors

- The paper's authors, Sergey Ioffe and Christian Szegedy, have continued their careers in AI: Szegedy joined xAI, and Ioffe followed suit [30][31].
- Szegedy has since moved to Morph Labs, where he focuses on achieving "verifiable superintelligence" [33].
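The normalization step the article describes is simple to sketch: for each feature, subtract the mini-batch mean and divide by the mini-batch standard deviation, then apply a learnable scale and shift (γ and β in the original paper). Below is a minimal NumPy sketch of the forward pass for training time; the function name and shapes are illustrative, and it omits the running statistics a real implementation keeps for inference.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize x of shape (batch, features).

    Each feature is normalized over the batch dimension, then
    rescaled by gamma and shifted by beta (both learnable in practice).
    eps guards against division by zero for low-variance features.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy input with a shifted, widely scaled distribution.
x = np.random.randn(32, 4) * 10 + 5
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
# With gamma=1, beta=0 the output has per-feature mean ~0 and std ~1.
```

With γ = 1 and β = 0 the layer outputs a standardized distribution regardless of how the inputs drift between updates, which is exactly the stabilizing effect the article attributes to the technique.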