Dropout
X @Avi Chawla
Avi Chawla· 2025-09-22 19:59
Dropout Mechanism
- During training, the average neuron input is significantly lower than at inference, creating a mismatch in activation scale between the two stages [1]
- Dropout addresses this by multiplying the surviving inputs during training by a factor of 1/(1-p), where p is the dropout rate [2]
- For example, with a dropout rate of 50%, an input of 50 is scaled to 100 (50 / (1 - 0.5) = 100) [2]
- This scaling keeps the activations coherent between the training and inference stages of the neural network (see the code sketch after this list) [2]

Training vs Inference
- Consider a layer with 100 neurons, each with an activation value of 1, and a weight of 1 from each neuron to neuron 'A' in the next layer [2]
- With a 50% dropout rate, only about 50 of those neurons are active during training [2]
- During inference, all 100 neurons are active, since Dropout is not applied [2]
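The rescaling step is easy to check numerically. Below is a minimal NumPy sketch of inverted dropout, the variant described in the bullets above that rescales at training time; the function name and shapes are illustrative, not taken from the post.

```python
import numpy as np

def inverted_dropout(x, p=0.5, training=True):
    """Inverted dropout: drop each unit with probability p during training,
    then rescale the survivors by 1/(1-p) so the expected input to the next
    layer matches inference, where nothing is dropped."""
    if not training or p == 0.0:
        return x                          # inference: all units, no scaling
    mask = np.random.rand(*x.shape) >= p  # keep each unit with prob. 1-p
    return x * mask / (1.0 - p)           # survivors scaled by 1/(1-p)

# The example from the post: 100 neurons, each with activation 1.
acts = np.ones(100)
print(inverted_dropout(acts, p=0.5, training=True).sum())   # ~100 on average
print(inverted_dropout(acts, p=0.5, training=False).sum())  # exactly 100.0
```

Run a few times: the training-time sum fluctuates around 100 because roughly 50 survivors are each scaled by 2, matching the inference-time sum.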
X @Avi Chawla
Avi Chawla· 2025-09-22 06:39
Here's a hidden detail about Dropout that many people don't know.

Assume that:
- There are 100 neurons in a layer, and all activation values are 1.
- The weight from each of the 100 neurons to a neuron 'A' in the next layer is 1.
- Dropout rate = 50%

Computing the input of neuron 'A':
- During training → approx. 50 (since ~50% of values will be dropped).
- During inference → 100 (since we don't use Dropout during inference).

So essentially, during training, the average neuron input is significantly lower than that during infer ...
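To make the mismatch in this post concrete, here is a small NumPy check under its stated assumptions (100 neurons, activations of 1, weights of 1, a 50% dropout rate); the naive_dropout helper is hypothetical and deliberately skips the 1/(1-p) rescaling so the ~50 vs 100 gap shows up.

```python
import numpy as np

def naive_dropout(x, p=0.5):
    """Dropout without rescaling, to expose the train/inference mismatch."""
    return x * (np.random.rand(*x.shape) >= p)

acts = np.ones(100)     # 100 neurons, each with activation 1
weights = np.ones(100)  # weight of 1 from each neuron to neuron 'A'

train_input = naive_dropout(acts, p=0.5) @ weights  # ~50 on average
infer_input = acts @ weights                        # exactly 100, no dropout
print(train_input, infer_input)
```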
A large model that "remembers a bit less" is actually smarter: Goldfish Loss randomly drops tokens so the AI stops rote-memorizing its training data
36Kr · 2025-09-03 23:54
Large language models, left unconstrained, can easily reproduce their training data verbatim. To address this, a research team from the University of Maryland, the University of Tübingen, and the Max Planck Institute has proposed a new method: Goldfish Loss (金鱼损失).

When training a large model, making it "remember a little less" can actually make it smarter.

As the name suggests, Goldfish Loss makes the model behave like a goldfish: instead of memorizing every detail, it randomly excludes a small fraction of tokens from the loss computation. As a result, the model no longer memorizes the training set word for word, yet it still learns the regularities of the language.

Experiments show that, after training with Goldfish Loss, LLaMA-2:

One netizen's pithy summary: dropout, but for the loss function!

Randomly masking some tokens in the gradient computation

The core idea of Goldfish Loss is very simple: during training, a fraction of the tokens in the training text is randomly excluded from the loss computation. When the model reaches those positions at inference time, it can only "guess", rather than reproduce the complete sequence from the training data verbatim.

$\mathcal{L}_{\text{goldfish}}(\theta)=-\frac{1}{|G|}\sum_{i=1}^{L}G_{i}(x_{i})\log P(x_{i}\mid x_{<i};\theta)$

In addition, to ensure that the same tokens are dropped consistently, the researchers designed a ...
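To connect the formula above to code, here is a minimal PyTorch-style sketch of a goldfish-style loss. The function name, the drop_prob value, and the plain random mask are assumptions for illustration; the paper's actual mask construction (the truncated sentence above) is designed so the same tokens are dropped consistently, which a fresh random mask does not guarantee.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, targets, drop_prob=0.25):
    """Sketch of a goldfish-style loss: randomly exclude a fraction of
    tokens from the next-token loss so the model cannot memorize full
    training sequences verbatim.

    logits:  (batch, seq_len, vocab_size) next-token predictions
    targets: (batch, seq_len) ground-truth token ids
    """
    # Per-token negative log-likelihood, -log P(x_i | x_<i; theta).
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)

    # Binary goldfish mask G_i: 1 = token counts toward the loss, 0 = dropped.
    keep = (torch.rand(targets.shape, device=targets.device) >= drop_prob).float()

    # Average only over the kept tokens (the |G| normalizer in the formula).
    return (nll * keep).sum() / keep.sum().clamp(min=1.0)
```

Note that the masked positions still appear in the model's input context; they are only removed from the loss term, which is what the "dropout, but for the loss function" quip refers to.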