Large Model Fine-Tuning
Two 4090s Can Locally Fine-Tune the Trillion-Parameter Kimi K2! 趋境, Together with Tsinghua and Beihang, Smashes Through the Compute Barrier
量子位· 2025-11-05 07:56
Core Insights
- The article covers a sharp reduction in the cost and complexity of fine-tuning large language models, making consumer-grade GPUs sufficient for models such as DeepSeek 671B and the 1T-parameter Kimi K2 [1][5][12].

Group 1: Cost Reduction and Technological Advancements
- Fine-tuning such models previously demanded massive GPU resources, with Kimi K2 needing up to 2000 GB of VRAM, whereas 2-4 consumer-grade GPUs (e.g., 4090) are now sufficient [3][4].
- The cost reduction comes from two domestic projects, KTransformers and LLaMA-Factory, which have made significant advances in model training and fine-tuning [5][6][7].
- KTransformers fine-tunes large models with far lower VRAM requirements: roughly 90 GB for Kimi K2 and 70 GB for DeepSeek 671B [7][12].

Group 2: Performance and Efficiency
- KTransformers has been shown to outperform other frameworks in throughput and memory usage on fine-tuning tasks, making it viable on personal workstations [12][13].
- Integrating KTransformers with LLaMA-Factory simplifies the fine-tuning workflow, letting users manage data processing and training without extensive coding knowledge [9][30].

Group 3: Practical Applications and Customization
- The article highlights the potential for personalized AI models: users can fine-tune models for specific styles or industry needs, democratizing access to advanced AI technologies [24][26].
- Companies can leverage KTransformers to build specialized AI models tailored to their business needs, improving efficiency and return on investment [27][28].

Group 4: Technical Innovations
- KTransformers offloads memory-intensive computation to the CPU and integrates LoRA for efficient fine-tuning, sharply reducing the memory footprint of large models [36]; a minimal LoRA-plus-offloading sketch follows this summary.
- The collaboration between KTransformers and LLaMA-Factory represents a strong synergy that enhances both performance and usability in the fine-tuning landscape [32][33].
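To make the LoRA-plus-offloading idea concrete, here is a minimal, generic sketch using the standard Hugging Face transformers and peft libraries. It is not the KTransformers/LLaMA-Factory implementation; the model name, target module names, and hyperparameters below are placeholder assumptions for illustration only.

```python
# Generic LoRA fine-tuning setup with layer offloading, assuming Hugging Face
# transformers + peft (+ accelerate). Not KTransformers itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "your-org/your-moe-model"  # placeholder; the article targets DeepSeek 671B / Kimi K2

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # places what fits on the GPU and spills the rest to CPU RAM;
                        # KTransformers goes further by running offloaded expert layers on the CPU
)

# LoRA: train small low-rank adapters instead of the full weight matrices,
# which keeps the trainable state small enough for consumer GPUs.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # placeholder module names; depends on the architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

The design choice this illustrates is the one the article credits for the VRAM drop: only the adapter weights need optimizer state and gradients on the GPU, while the frozen base weights can live wherever memory is cheapest.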
Apple Proposes a New Kind of Backpropagation: A Single iPhone 15 Pro Max Can Fine-Tune an LLM
机器之心· 2025-10-30 01:41
Edited by Panda. A 机器之心 report.

Running large models locally on an iPhone is no longer news, but can a model be fine-tuned on an iPhone? Recently, Apple itself stepped in and demonstrated the feasibility in a paper proposing Memory-Efficient Backpropagation (MeBP). The method offers a better trade-off between memory usage and compute time than zeroth-order optimization (ZO), while also converging faster and performing better than the ZO baselines. The team validated MeBP's effectiveness on an iPhone 15 Pro Max.

The Apple team (宋丛峥 and Xinyu Tang) also states in the paper that it will release a MeBP implementation, but the published link currently contains no code.

Paper title: Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices

Memory-Efficient Backpropagation (MeBP)

In this paper, the Apple team focuses on fine-tuning LLMs with LoRA, so the main memory bottlenecks are the model parameters and intermediate activations. The team's goal is to keep fine-tuning memory usage within a range acceptable on modern mobile devices, for example PocketLLM ...
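For context on the baseline MeBP is compared against: zeroth-order optimization estimates gradients from forward passes alone, so no activations need to be stored, at the cost of noisier updates and slower convergence. The sketch below is a generic two-point ZO estimator (MeZO-style) written in PyTorch purely to make that trade-off concrete; it is not Apple's MeBP code, and `loss_fn`, `lr`, and `eps` are illustrative placeholders.

```python
# Generic two-point zeroth-order update: perturb parameters with a shared random
# direction, compare two forward losses, and step along that direction.
import torch

def zo_step(params, loss_fn, lr=1e-4, eps=1e-3):
    """One ZO update on a list of trainable tensors (e.g. LoRA weights)."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        g = torch.Generator().manual_seed(seed)  # same seed => same random direction z
        for p in params:
            z = torch.randn(p.shape, generator=g, dtype=p.dtype).to(p.device)
            p.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1.0)                 # theta + eps * z
        loss_plus = loss_fn()
        perturb(-2.0)                 # theta - eps * z
        loss_minus = loss_fn()
        perturb(+1.0)                 # restore original theta
        proj_grad = (loss_plus - loss_minus).item() / (2 * eps)

        g = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=g, dtype=p.dtype).to(p.device)
            p.add_(-lr * proj_grad * z)   # SGD step along the sampled direction
```

MeBP's claim, per the summary above, is that exact backpropagation can be made memory-cheap enough on-device that this kind of gradient-free estimate is no longer the better trade-off.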
The Company of Murati, 翁荔 (Lilian Weng), and 陈丹琦 (Danqi Chen) Releases Its First Product, Drastically Lowering the Barrier to LLM Fine-Tuning, with the Aim of Reinventing OpenAI
量子位· 2025-10-02 03:26
Core Insights
- Thinking Machines Lab has launched its first product, Tinker, which reduces model fine-tuning to the level of editing Python code [1][12].
- With this launch, the company moves past the "zero product, zero revenue" stage that accompanied its $84 billion valuation [2].

Product Overview
- Tinker is a flexible API for fine-tuning language models, allowing researchers to control algorithms and data without managing infrastructure [12][13].
- Initial support covers the Qwen3 and Llama3 model series; switching between small and large models requires changing only a string in the Python code [15].
- Tinker's API automates low-level training steps while handling scheduling, scaling, and error recovery [17].

Technical Features
- Tinker utilizes LoRA so that multiple training jobs can share the same GPU, reducing costs and enabling more parallel experiments [22].
- Tinker's gradient update strategy is: new parameters = original parameters + learning rate × advantage value × gradient of the log probability [28] (a minimal sketch of this update follows the summary).

Industry Reception
- Tinker has garnered significant industry attention, with beta testers noting its excellent balance between abstraction and tunability compared to other fine-tuning tools [30].
- Research teams from prestigious institutions have already achieved notable results using Tinker [30].

Strategic Vision
- Thinking Machines Lab aims to reinvent a version of OpenAI that emphasizes open research sharing and greater freedom for researchers [10][11].
- The company's mission aligns with making cutting-edge models more accessible for customization based on individual needs [14].
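The update rule quoted above is a policy-gradient (REINFORCE-style) step weighted by an advantage. The sketch below implements that rule on a toy policy in plain PyTorch, only to show the arithmetic; it does not use or imitate Tinker's actual API, and the toy policy, tensor shapes, and advantage value are placeholders.

```python
# REINFORCE-style step with an advantage weight:
#   new parameters = original parameters + lr * advantage * grad(log prob)
import torch

torch.manual_seed(0)
vocab_size, hidden = 8, 16
policy = torch.nn.Linear(hidden, vocab_size)   # stand-in for an LLM's output head
lr = 1e-3

state = torch.randn(1, hidden)                 # stand-in for a context representation
dist = torch.distributions.Categorical(logits=policy(state))
action = dist.sample()                         # sampled token / action
advantage = 0.7                                # how much better than baseline this sample was (placeholder)

log_prob = dist.log_prob(action)
loss = -(advantage * log_prob).sum()           # grad of this loss is -advantage * grad(log prob)
loss.backward()

with torch.no_grad():
    for p in policy.parameters():
        p -= lr * p.grad                       # equivalent to the update above, via the negated loss
        p.grad = None
```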
Does Large-Model Fine-Tuning Involve Real Technical Depth, and If So, How Much?
自动驾驶之心· 2025-08-10 23:32
Core Viewpoint
- The article emphasizes the importance of individual approaches and methodologies in the field of large language models (LLMs), particularly in fine-tuning and data quality, suggesting that the technical depth of work in this area depends heavily on personal engagement and practices [5][16].

Data Work
- Method 1: inherit training data from colleagues without checking data quality, which may lead to suboptimal results [7].
- Method 2: download open-source data and assemble a "system + query + answer" dataset [8] (a minimal example of such a record appears after this summary).
- Method 3: generate data with GPT-4, emphasizing prompt diversity and data-quality checks [8].
- Method 4: drive data construction from user interaction logs, analyzing user feedback to improve answer quality [9].
- Method 5: break complex tasks down at the data level to enhance model performance [9].

Training Code
- Method 1: inherit the training code and make minimal modifications [11].
- Method 2: thoroughly understand the training code's parameters and their implications [11].
- Method 3: question and improve the training code, for example by optimizing speed or the choice of framework [12].

Experimental Analysis
- Method 1: run prepared evaluation sets and address data-quality issues based on the results [14].
- Method 2: analyze the model's bad cases to identify underlying issues and design experiments to validate the findings [14].
- Method 3: relate model results to data quality and training methods, with a comprehensive analysis of training logs and evaluation results [15].

Community and Collaboration
- The article highlights a large community focused on various aspects of autonomous-driving technology, including large models and multi-sensor fusion, with nearly 4,000 members and over 300 companies and research institutions involved [18].
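As a concrete companion to Data Work, Method 2, here is a minimal sketch of assembling a "system + query + answer" dataset as JSONL. The field names follow the article's phrasing; the filtering rule, example records, and output path are placeholder assumptions rather than any project's prescribed format.

```python
# Build a tiny "system + query + answer" SFT dataset and write it as JSONL.
import json

def build_records(raw_pairs, system_prompt):
    """Wrap raw (query, answer) pairs with a shared system prompt, skipping empty answers."""
    for query, answer in raw_pairs:
        if not answer or not answer.strip():   # a trivial quality check; real pipelines go further
            continue
        yield {"system": system_prompt, "query": query.strip(), "answer": answer.strip()}

raw_pairs = [
    ("What is LoRA?", "LoRA fine-tunes low-rank adapter matrices instead of the full weights."),
    ("Empty example", ""),                     # filtered out by the quality check
]

with open("sft_data.jsonl", "w", encoding="utf-8") as f:
    for record in build_records(raw_pairs, system_prompt="You are a helpful assistant."):
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```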