Apple proposes a new kind of backpropagation: a single iPhone 15 Pro Max can fine-tune an LLM

Core Viewpoint
- Apple has demonstrated the feasibility of fine-tuning large language models (LLMs) on iPhones using a new method called Memory-Efficient Backpropagation (MeBP), which offers a better trade-off between memory usage and computation time than existing methods [1][4].

Summary by Sections

Introduction
- The article discusses Apple's recent paper on MeBP, which enables model fine-tuning on resource-constrained mobile devices such as the iPhone 15 Pro Max [1][3].

Methodology
- MeBP fine-tunes LLMs with LoRA, aiming to keep memory usage below 1GB, as recommended by PocketLLM [4] (a minimal LoRA sketch follows this summary).
- Fine-tuning with MeBP consists of three main steps: compressing the base model weights, implementing gradient checkpointing, and building an efficient runtime for executing the training graph [5][10].

Model Weight Compression
- The team applied symmetric INT4 (4-bit) quantization to the non-LoRA parameters, including the embeddings, to reduce disk usage [7][10] (see the quantization sketch below).

Gradient Checkpointing
- The LLM is split into blocks so that memory consumption during backpropagation stays within device limits; automatic differentiation is used to generate a backward graph for each block [8][9] (see the checkpointing sketch below).

Runtime Implementation
- The MeBP runtime minimizes memory usage by memory-mapping the compressed model weights and decompressing them only on demand during training [15][16] (see the on-demand loading sketch below).

Experimental Performance
- The team compared MeBP with MeZO, the only previously known optimization method for mobile LLM fine-tuning, using server-side simulations and on-device performance measurements [18][24] (a MeZO-style step is sketched below).
- The experiments covered models from 0.5B to 4B parameters, using loss and next-token accuracy as evaluation metrics [20].

Utility Comparison
- Zeroth-order (ZO) optimization converged more slowly than first-order (FO) optimization, and MeBP significantly outperformed ZO in both convergence speed and computational efficiency [23].

Performance Comparison
- MeBP was implemented in Swift and evaluated on an iPhone 15 Pro Max with 8GB of RAM. Its computation time per gradient step was 43% to 94% longer than MeZO's, but it converged faster overall because it needs far fewer steps [24][28].
- MeBP's worst-case memory usage was slightly higher than MeZO's, but its overall training memory footprint was roughly 10 times smaller than that of previous mobile implementations [28].

Conclusion
- All tested LLMs could be fine-tuned efficiently within 1GB of memory, making them suitable for background training on mobile devices [28].
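Code Sketches

The sketches below are minimal Python/PyTorch illustrations of the techniques summarized above; Apple's actual implementation is in Swift, and all class, function, and parameter names here (e.g. LoRALinear, rank, alpha) are assumptions for illustration, not the paper's code. First, a LoRA adapter: the base weight stays frozen and only the small low-rank matrices are trained, which is what keeps the trainable state within a tight memory budget.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA wrapper: frozen base linear plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # base weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scale * B (A x); only A and B receive gradients.
        return self.base(x) + self.scale * ((x @ self.lora_a.T) @ self.lora_b.T)

layer = LoRALinear(nn.Linear(512, 512), rank=8)
out = layer(torch.randn(2, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # trainable params only
```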
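Next, a rough sketch of symmetric INT4 quantization for the frozen, non-LoRA weights. The group size, the [-7, 7] code range, and storing one code per int8 (rather than packing two codes per byte) are simplifying assumptions; the paper's exact scheme may differ.

```python
import numpy as np

def quantize_int4_symmetric(w: np.ndarray, group_size: int = 64):
    """Map float32 weights to signed 4-bit codes in [-7, 7] with one scale per group."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)            # avoid division by zero
    codes = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return codes, scale.astype(np.float32)

def dequantize_int4_symmetric(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (codes.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
codes, scale = quantize_int4_symmetric(w)
w_hat = dequantize_int4_symmetric(codes, scale)
print(float(np.abs(w - w_hat).max()))                   # small reconstruction error
```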
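A minimal sketch of per-block gradient checkpointing using PyTorch's built-in utility: each block's activations are recomputed during the backward pass instead of being stored, trading extra compute for lower peak memory. The block structure and sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Placeholder transformer-style block."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class CheckpointedModel(nn.Module):
    def __init__(self, n_blocks: int = 8, dim: int = 256):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(n_blocks))

    def forward(self, x):
        for block in self.blocks:
            # Each block is a checkpoint boundary: only its input is stored,
            # and the block is re-executed during the backward pass.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedModel()
loss = model(torch.randn(4, 16, 256, requires_grad=True)).mean()
loss.backward()                                          # recomputes block activations
```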
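A sketch of the runtime idea of memory-mapping the compressed weights and decompressing only the block that is currently needed, so peak memory stays near the size of a single block. The file layout, helper names, and the use of NumPy here are assumptions; the MeBP runtime itself is written in Swift.

```python
import numpy as np

def write_quantized_blocks(path: str, blocks: list) -> list:
    """Store int8-coded INT4 blocks contiguously; return (offset, length) per block."""
    index = []
    with open(path, "wb") as f:
        for codes in blocks:
            index.append((f.tell(), codes.size))
            f.write(codes.astype(np.int8).tobytes())
    return index

def load_block_on_demand(path: str, offset: int, length: int, scale: float) -> np.ndarray:
    # Memory-map only the requested block and dequantize it; the decompressed
    # floats can be dropped as soon as this block's forward/backward is done.
    mm = np.memmap(path, dtype=np.int8, mode="r", offset=offset, shape=(length,))
    return mm.astype(np.float32) * scale

blocks = [np.random.randint(-7, 8, size=1024, dtype=np.int8) for _ in range(3)]
index = write_quantized_blocks("weights.bin", blocks)
block0 = load_block_on_demand("weights.bin", *index[0], scale=0.02)
```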
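For context on the baseline, a minimal MeZO-style zeroth-order step, following the published MeZO recipe rather than any code from the Apple paper: the weights are perturbed along a seeded random direction, the loss is measured twice, and the update is taken along that direction, so no backward pass or activation storage is needed. This illustrates why ZO needs so little memory per step yet converges more slowly than a first-order method like MeBP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mezo_step(model: nn.Module, loss_fn, eps: float = 1e-3, lr: float = 1e-4, seed: int = 0):
    """One zeroth-order step: two forward passes, no backward pass, no stored activations."""
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(coeff: float):
        torch.manual_seed(seed)                  # regenerate the same direction z each time
        for p in params:
            p.data.add_(coeff * eps * torch.randn_like(p))

    with torch.no_grad():
        perturb(+1.0); loss_plus = loss_fn(model)    # loss at theta + eps*z
        perturb(-2.0); loss_minus = loss_fn(model)   # loss at theta - eps*z
        perturb(+1.0)                                # restore theta
        grad_scale = (loss_plus - loss_minus) / (2 * eps)
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(-lr * grad_scale * torch.randn_like(p))

model = nn.Linear(8, 1)
x, y = torch.randn(32, 8), torch.randn(32, 1)
mezo_step(model, lambda m: F.mse_loss(m(x), y))
```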