Avi Chawla·2026-02-08 09:15
1. Use momentum

In gradient descent, every parameter update depends solely on the current gradient. This leads to unwanted oscillations during optimization. Momentum reduces this by adding a weighted average of previous gradient updates to the update rule.

Check this 👇 https://t.co/77X9rwRyOF
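The idea above can be sketched in a few lines of NumPy. This is a minimal sketch of the classical momentum update (velocity accumulates a decaying average of past gradients); the function name, learning rate, and `beta` value are illustrative choices, not from the original post, and some libraries use slightly different but equivalent formulations.

```python
import numpy as np

def sgd_momentum_step(params, grad, velocity, lr=0.05, beta=0.9):
    """One SGD-with-momentum update.

    v <- beta * v + grad   (decaying average of past gradients)
    p <- p - lr * v        (step along the smoothed direction)
    """
    velocity = beta * velocity + grad
    params = params - lr * velocity
    return params, velocity

# Toy usage: minimize f(x) = x^2 starting from x = 5.0
x = np.array([5.0])
v = np.zeros_like(x)
for _ in range(100):
    grad = 2 * x  # gradient of x^2
    x, v = sgd_momentum_step(x, grad, v)
print(x)
```

Because the velocity averages recent gradients, updates in a consistently downhill direction reinforce each other while oscillating components partially cancel, which is exactly the damping effect described above.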