LLM Fine-tuning Techniques
- Traditional fine-tuning is impractical for LLMs due to their large number of parameters (billions) and dataset sizes (hundreds of GBs), which led to the development of parameter-efficient fine-tuning (PEFT) [1]
- PEFT techniques find a low-rank adaptation of the LLM's weight matrices instead of updating the full weights [2]

Specific PEFT Techniques
- LoRA (Low-Rank Adaptation): Adds two low-rank trainable matrices (A and B) alongside each weight matrix and trains only these low-rank matrices instead of the original weights, significantly reducing memory usage [3]
- LoRA-FA (Frozen-A): Freezes matrix A in LoRA and updates only matrix B, further reducing activation memory requirements [4]
- VeRA: Freezes matrices A and B, shares them across all layers, and instead learns layer-specific scaling vectors [4]
- Delta-LoRA: Also tunes the original weight matrix W, adding to it the difference (delta) of the product AB between two consecutive training steps [4][5]
- LoRA+: Sets a higher learning rate for matrix B than for matrix A, resulting in better convergence [6]
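To make the core LoRA idea concrete, here is a minimal NumPy sketch (not any library's actual API; the dimensions, rank, and function names are illustrative assumptions). It shows the frozen weight W, the two low-rank trainable matrices A and B, and the parameter-count savings that make LoRA memory-efficient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer dimensions and LoRA rank (assumptions for illustration)
d, k, r = 64, 64, 4

W = rng.normal(size=(d, k))          # frozen pretrained weight (never updated)
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank matrix A (small random init)
B = np.zeros((d, r))                 # trainable low-rank matrix B (zero init)

def lora_forward(x):
    # The effective weight is W + B @ A; during training, gradients flow
    # only into A and B, never into W.
    return x @ (W + B @ A).T

x = rng.normal(size=(1, k))
y = lora_forward(x)

# Because B starts at zero, the adapted layer initially matches the base model.
assert np.allclose(y, x @ W.T)

# Trainable parameters shrink from d*k (full fine-tuning) to r*(d+k) (LoRA).
full_params = d * k        # 64 * 64 = 4096
lora_params = r * (d + k)  # 4 * (64 + 64) = 512
print(full_params, lora_params)
```

The variants in the list above are small modifications of this setup: LoRA-FA would keep A fixed and train only B; LoRA+ would give B a larger learning rate than A in the optimizer; VeRA would share one A and B across layers and train per-layer scaling vectors instead.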
Avi Chawla·2025-12-04 19:38