Large Model Fine-Tuning
Two RTX 4090s Can Fine-Tune the Trillion-Parameter Kimi K2 Locally: 趋境 (Qujing), Tsinghua, and Beihang Shatter the Compute Barrier
量子位· 2025-11-05 07:56
Core Insights
- The article discusses the significant reduction in the cost and complexity of fine-tuning large language models, enabling the use of consumer-grade GPUs for models like DeepSeek 671B and Kimi K2 1TB [1][5][12].

Group 1: Cost Reduction and Technological Advancements
- Fine-tuning large models previously required massive GPU resources, with models like Kimi K2 needing up to 2000GB of VRAM, whereas 2-4 consumer-grade GPUs (e.g., RTX 4090) now suffice [3][4].
- The cost reduction stems from two domestic projects, KTransformers and LLaMA-Factory, which have made significant advances in model training and fine-tuning [5][6][7].
- KTransformers fine-tunes large models with far lower VRAM requirements, needing only around 90GB for Kimi K2 and 70GB for DeepSeek 671B [7][12].

Group 2: Performance and Efficiency
- KTransformers has been shown to outperform other frameworks in throughput and memory usage for fine-tuning tasks, making it a viable option for personal workstations [12][13].
- The integration of KTransformers with LLaMA-Factory simplifies the fine-tuning process, allowing users to manage data processing and training without extensive coding knowledge [9][30].

Group 3: Practical Applications and Customization
- The article highlights the potential for personalized AI models, enabling users to fine-tune models for specific styles or industry needs and democratizing access to advanced AI technologies [24][26].
- Companies can leverage KTransformers to create specialized AI models tailored to their business needs, improving efficiency and return on investment [27][28].

Group 4: Technical Innovations
- KTransformers offloads memory-intensive computation to CPUs and integrates LoRA for efficient fine-tuning, significantly reducing the memory footprint of large models [36]; a minimal sketch of this offload-plus-LoRA pattern follows this summary.
- The collaboration between KTransformers and LLaMA-Factory represents a strong synergy that improves both performance and usability in the fine-tuning landscape [32][33].
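The general pattern of "keep hot weights on GPU, spill the rest to CPU RAM, and train only small LoRA adapters" can be illustrated with standard Hugging Face tooling. This is a minimal sketch, not the KTransformers implementation (which uses its own CPU/GPU kernels); the model name is a small placeholder and the device map is a simplified stand-in for expert offloading.

```python
# Minimal sketch: LoRA fine-tuning setup with automatic CPU/disk offload.
# Approximates the general idea only; NOT the KTransformers kernel path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder; a 1T-parameter MoE would not fit this way

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # lets accelerate spill layers to CPU when VRAM runs out
    offload_folder="offload",   # disk spill location for anything that cannot stay in RAM
)

# Train only low-rank adapters on the attention projections; base weights stay frozen,
# which keeps the trainable-parameter and optimizer-state footprint small.
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```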
Apple Proposes a New Backpropagation Method: A Single iPhone 15 Pro Max Can Fine-Tune an LLM
机器之心· 2025-10-30 01:41
Core Viewpoint
- Apple has demonstrated the feasibility of fine-tuning large language models (LLMs) on iPhones using a new method called Memory-Efficient Backpropagation (MeBP), which offers better trade-offs between memory usage and computation time than existing methods [1][4].

Summary by Sections

Introduction
- The article discusses Apple's recent paper on MeBP, which enables model fine-tuning on resource-constrained mobile devices such as the iPhone 15 Pro Max [1][3].

Methodology
- MeBP uses LoRA for fine-tuning LLMs, aiming to keep memory usage below 1GB, as recommended by PocketLLM [4].
- The MeBP fine-tuning pipeline consists of three main steps: compressing the base model weights, implementing gradient checkpointing, and building an efficient runtime for executing the training graph [5][10].

Model Weight Compression
- The team applied 4-bit symmetric INT4 quantization to non-LoRA parameters, including embeddings, to reduce disk usage [7][10].

Gradient Checkpointing
- The LLM is divided into blocks so that memory consumption during backpropagation stays within device limits; automatic differentiation generates a backward graph for each block [8][9]. (A minimal per-block checkpointing sketch follows this summary.)

Runtime Implementation
- The MeBP runtime minimizes memory usage by memory-mapping the compressed model weights and decompressing them only on demand during training [15][16].

Experimental Performance
- The team compared MeBP with MeZO, the only known optimization method for mobile LLM fine-tuning, using server-side simulations and on-device performance evaluations [18][24].
- Experiments covered models from 0.5B to 4B parameters, using loss and next-token accuracy as evaluation metrics [20].

Utility Comparison
- Results indicated that zero-order (ZO) optimization converges more slowly than first-order (FO) optimization, and that MeBP significantly outperformed ZO in convergence speed and computational efficiency [23].

Performance Comparison
- MeBP was implemented in Swift on an iPhone 15 Pro Max with 8GB of RAM; its computation time per gradient step was 43% to 94% longer than MeZO's, but it converged faster overall because it requires far fewer steps [24][28].
- MeBP's worst-case memory usage was slightly higher than MeZO's, but its overall training memory footprint was roughly 10 times smaller than previous mobile implementations [28].

Conclusion
- All tested LLMs could be fine-tuned efficiently within 1GB of memory, making them suitable for background training on mobile devices [28].
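The per-block gradient checkpointing idea can be sketched with PyTorch's generic `torch.utils.checkpoint`; this is an illustration of the underlying memory/compute trade-off, not Apple's on-device runtime, and the toy MLP stack below stands in for transformer blocks.

```python
# Illustrative per-block gradient checkpointing: activations inside each block are
# recomputed during the backward pass instead of stored, so peak memory is bounded
# by roughly one block's activations rather than the whole model's.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedStack(nn.Module):
    def __init__(self, hidden: int = 256, num_blocks: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
            for _ in range(num_blocks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # use_reentrant=False is the recommended modern checkpointing mode
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedStack()
x = torch.randn(4, 256, requires_grad=True)
loss = model(x).pow(2).mean()
loss.backward()  # each block's forward is recomputed here, trading compute for memory
```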
Murati, Lilian Weng, and Danqi Chen's Company Ships Its First Product, Slashing the Barrier to Fine-Tuning Large Models and Aiming to Reinvent OpenAI
量子位· 2025-10-02 03:26
Core Insights
- Thinking Machines Lab has launched its first product, Tinker, which simplifies model fine-tuning to the level of modifying Python code [1][12].
- The company has moved past its "zero product, zero revenue" valuation of $84 billion [2].

Product Overview
- Tinker is a flexible API for fine-tuning language models, allowing researchers to control algorithms and data without managing infrastructure [12][13].
- Initial support covers the Qwen3 and Llama 3 series, so switching between small and large models only requires changing a single string in the Python code [15].
- Tinker's API automates low-level training steps while handling scheduling, scaling, and error recovery [17].

Technical Features
- Tinker uses LoRA so that multiple training jobs can share the same GPU, reducing costs and enabling more parallel experiments [22].
- Tinker's gradient update strategy is: new parameters = original parameters + learning rate × advantage × gradient of the log probability [28]. (The update rule is written out as an equation after this summary.)

Industry Reception
- Tinker has attracted significant industry attention, with beta testers noting its excellent balance between abstraction and tunability compared with other fine-tuning tools [30].
- Research teams from prestigious institutions have already achieved notable results with Tinker [30].

Strategic Vision
- Thinking Machines Lab aims to reinvent a version of OpenAI that emphasizes open research sharing and greater freedom for researchers [10][11].
- The company's mission is to make cutting-edge models easier to customize for individual needs [14].
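Written out, the update rule described above matches a standard advantage-weighted policy-gradient step; the notation below is generic and is not taken from Tinker's documentation.

```latex
\theta_{t+1} \;=\; \theta_t \;+\; \eta \,\hat{A}\, \nabla_{\theta} \log \pi_{\theta}(a \mid s)
```

Here $\theta$ are the model parameters, $\eta$ the learning rate, $\hat{A}$ the advantage estimate for the sampled output, and $\pi_{\theta}(a \mid s)$ the model's probability of producing output $a$ given prompt $s$.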
Does Fine-Tuning Large Models Really Involve Technical Depth, and If So, How Much?
自动驾驶之心· 2025-08-10 23:32
Core Viewpoint
- The article emphasizes the importance of individual approaches and methodologies in working with large language models (LLMs), particularly around fine-tuning and data quality, arguing that the technical depth of this work depends heavily on personal engagement and practice [5][16].

Data Work
- Method 1: inherit training data from colleagues without checking data quality, which may lead to suboptimal results [7].
- Method 2: download open-source data to build a "system + query + answer" dataset [8]. (A minimal sketch of this record format, with a simple quality filter, follows this summary.)
- Method 3: generate data with GPT-4, emphasizing prompt diversity and data-quality checks [8].
- Method 4: use user interaction logs to drive data construction, analyzing user feedback to improve answer quality [9].
- Method 5: break down complex tasks at the data level to improve model performance [9].

Training Code
- Method 1: inherit the training code and make minimal modifications [11].
- Method 2: develop a thorough understanding of the training code's parameters and their implications [11].
- Method 3: question and improve the training code, for example by optimizing speed or the choice of framework [12].

Experimental Analysis
- Method 1: run prepared evaluation sets and address data-quality issues based on the results [14].
- Method 2: analyze the model's bad cases to identify underlying issues and design experiments to validate the findings [14].
- Method 3: relate model results to data quality and training method, with a comprehensive analysis of training logs and evaluation results [15].

Community and Collaboration
- The article highlights a large community focused on various aspects of autonomous driving technology, including large models and multi-sensor fusion, with nearly 4,000 members and more than 300 participating companies and research institutions [18].
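A minimal sketch of assembling a "system + query + answer" supervised fine-tuning dataset with a basic quality filter; the field names, thresholds, and file name are illustrative assumptions, not a schema prescribed by the article.

```python
# Build a JSONL SFT dataset of {system, query, answer} records with a simple quality gate.
import json

raw_records = [
    {"system": "You are a helpful assistant.",
     "query": "Explain what LoRA fine-tuning is.",
     "answer": "LoRA freezes the base model and trains small low-rank adapter matrices added to selected layers."},
    {"system": "You are a helpful assistant.",
     "query": "??",
     "answer": ""},  # low-quality record that the filter below should drop
]

def is_clean(rec: dict) -> bool:
    """Very simple quality gate: all fields non-empty and a minimally substantive answer."""
    return all(rec.get(k, "").strip() for k in ("system", "query", "answer")) \
        and len(rec["answer"]) >= 20

with open("sft_data.jsonl", "w", encoding="utf-8") as f:
    for rec in raw_records:
        if is_clean(rec):
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

In practice this filtering step is where most of the leverage described in the "Data Work" methods lives: stricter checks (deduplication, length and language heuristics, model-based scoring) replace the toy `is_clean` gate above.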