LoRA
X @Avi Chawla
Avi Chawla· 2025-12-04 19:38
LLM Fine-tuning Techniques
- Traditional fine-tuning is impractical for LLMs due to the large number of parameters (billions) and data sizes (hundreds of GBs), which led to the development of parameter-efficient fine-tuning (PEFT) [1]
- PEFT techniques find a lower-rank adaptation of the LLM weight matrices instead of updating those matrices directly [2]

Specific PEFT Techniques
- **LoRA (Low-Rank Adaptation):** Adds two low-rank trainable matrices (A and B) alongside each weight matrix and trains only these low-rank factors instead of the original weights, significantly reducing memory usage [3] (a minimal sketch follows this list)
- **LoRA-FA (Frozen-A):** Freezes matrix A and updates only matrix B, further reducing activation memory requirements [4]
- **VeRA:** Freezes matrices A and B, shares them across all layers, and learns layer-specific scaling vectors instead [4]
- **Delta-LoRA:** Also tunes the original weight matrix W, adding to it the difference (delta) between the product of A and B at two consecutive training steps [4][5]
- **LoRA+:** Uses a higher learning rate for matrix B than for matrix A, resulting in better convergence [6]
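To make the LoRA bullet concrete, here is a minimal sketch of a LoRA-style linear layer in PyTorch. The class name, `rank=8`, `alpha=16`, and the alpha/rank scaling convention are illustrative assumptions, not details taken from the post.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen weight W plus a trainable low-rank update B @ A (sketch)."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Original weight W is frozen: it receives no gradient updates.
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)
        # A starts small and random, B starts at zero, so training
        # begins from exactly W (the low-rank update is initially zero).
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W^T + scaling * x A^T B^T; only A and B get gradients.
        return x @ self.weight.T + self.scaling * (x @ self.A.T @ self.B.T)
```

Only A and B ever reach the optimizer, so the trainable parameter count per layer drops from `in_features * out_features` to `rank * (in_features + out_features)`, which is where the memory savings come from.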
X @Avi Chawla
Avi Chawla· 2025-12-04 06:30
I have been fine-tuning LLMs for over 2 years now! Here are the top 5 LLM fine-tuning techniques, explained with visuals. First of all, what's so different about LLM fine-tuning? Traditional fine-tuning is impractical for LLMs (billions of params; 100s of GBs of data). Since this kind of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) came into existence. Before we go into the details of each technique, here's some background that will help you better understand these techniques: LLM weights are matric ...
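The LoRA+ entry from the summary above is easy to make concrete: since A and B play asymmetric roles, LoRA+ simply assigns B a larger learning rate than A. Below is a hedged sketch assuming the attribute naming of the `LoRALinear` sketch earlier; the base rate and the 16x ratio are illustrative assumptions, not values from the thread.

```python
import torch

def lora_plus_optimizer(model, base_lr=1e-4, ratio=16):
    """Build an AdamW optimizer that trains B faster than A (LoRA+ idea)."""
    a_params, b_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # skip the frozen original weights
        if name.split(".")[-1] == "B":
            b_params.append(param)  # low-rank matrix B
        else:
            a_params.append(param)  # low-rank matrix A (and other trainables)
    return torch.optim.AdamW([
        {"params": a_params, "lr": base_lr},
        {"params": b_params, "lr": base_lr * ratio},  # higher rate for B
    ])
```

Here `model` would be a network whose adapted layers are modules like the `LoRALinear` sketched above; the two parameter groups are otherwise a standard PyTorch optimizer pattern.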
Developers rejoice: Thinking Machines releases its first product, Tinker, which handles all the post-training hassle
机器之心· 2025-10-02 03:12
Core Insights
- Tinker, the first product launched by Thinking Machines, is an API designed to simplify the fine-tuning of language models for developers and researchers, allowing them to focus on training data and algorithms while Tinker manages infrastructure-related tasks [2][4][16]

Product Features
- Tinker supports various advanced models, including Qwen-235B-A22B, and allows users to switch from small to large models with ease, akin to changing a string in Python code [6][8]
- The API provides low-level primitives such as forward_backward and sample, which are sufficient for most common post-training methods; an open-source library, Tinker Cookbook, offers modern implementations of post-training methods built on them [9][11] (a hypothetical usage sketch follows this list)

Use Cases and Adoption
- Teams from prestigious institutions like Princeton, Stanford, and UC Berkeley are already utilizing Tinker, demonstrating its versatility in supporting both supervised fine-tuning and experimental reinforcement learning pipelines [13]
- The Goedel team at Princeton achieved performance comparable to full-parameter fine-tuning using only 20% of the data, while Stanford's chemistry group improved accuracy from 15% to 50% on a specific task using Tinker [14]

Market Position and Future Outlook
- Tinker aims to democratize access to fine-tuning capabilities, potentially leading to more diverse product innovations in the AI space [16]
- The initial phase of Tinker will be free, with a usage-based pricing model to be introduced in the coming weeks [15]
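The two primitives suggest a simple mental model of a post-training loop. The sketch below is hypothetical: forward_backward and sample are the names reported in the article, but the client object and every signature here are illustrative assumptions, not the actual Tinker API (the Tinker Cookbook documents real usage).

```python
# Hypothetical sketch only. The method names come from the article; the
# client object, its arguments, and return values are assumptions.

def supervised_finetune(client, dataset, epochs=1):
    """A minimal post-training loop built from the two primitives."""
    for _ in range(epochs):
        for batch in dataset:
            # forward_backward: run the forward pass and accumulate
            # gradients on Tinker-managed infrastructure.
            client.forward_backward(batch)
    # sample: spot-check a generation from the current fine-tuned weights.
    return client.sample("Say hello in French.", max_tokens=32)

# Per the article, swapping model scale would amount to changing a string,
# e.g. a hypothetical connect(model="Qwen-235B-A22B").
```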
Using your WeChat chat history to build an AI digital you, now open source
36Kr· 2025-05-16 07:19
Core Insights
- The WeClone project has gained significant attention as a solution for creating digital avatars based on WeChat chat records, utilizing large language models and fine-tuning techniques [1][2][3]
- The project leverages RAG knowledge-base principles to import WeChat chats and fine-tune models, enabling users to generate personalized digital personas [2][3]
- The project is open source and has garnered 8.7k stars on GitHub, indicating strong community interest and engagement [1]

Project Overview
- WeClone allows users to create digital avatars from their WeChat chat records, which serve as personal and detailed knowledge bases [3][7]
- The project employs a default model, Qwen2.5-7B-Instruct, and utilizes LoRA for fine-tuning, requiring approximately 16 GB of GPU memory [2]
- The project includes features for automatic speech recognition (ASR) and text-to-speech (TTS), enabling the digital avatar to mimic the user's voice [2]

Applications and Use Cases
- The project can generate digital personas for various roles, including customer service representatives, marketing agents, and financial advisors, by utilizing chat records as knowledge bases [7]
- Digital avatars can help reduce costs in customer service by automating responses based on accumulated chat data, eliminating the need for separate knowledge-base management [7]
- The ability to create tailored digital personas for different industries and roles enhances the effectiveness of communication and service delivery [7]

Technical Implementation
- Users can extract WeChat chat records using PyWxDump, with specific instructions for data migration and export in CSV format [6] (a data-preparation sketch follows this list)
- The project supports customization of dialogue names and system prompts, allowing users to personalize their digital avatars further [5]

Community Engagement
- The project encourages community participation by inviting users to join development groups for sharing product design cases and contributing to the development of digital personas [8]
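As a data-preparation illustration for the pipeline above, here is a hedged sketch that turns a PyWxDump-style CSV export into (instruction, output) pairs suitable for LoRA fine-tuning. The column names "sender" and "content" and the pairing heuristic are assumptions; WeClone's actual preprocessing may differ.

```python
import csv
import json

def chat_csv_to_pairs(csv_path, me="me", out_path="train.jsonl"):
    """Pair each friend message with the user's following reply (sketch)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(out_path, "w", encoding="utf-8") as out:
        for prev, cur in zip(rows, rows[1:]):
            # A friend's message followed by your reply becomes one
            # (instruction, output) pair, so the model learns your style.
            if prev["sender"] != me and cur["sender"] == me:
                record = {"instruction": prev["content"],
                          "output": cur["content"]}
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting JSONL is the common instruction-tuning layout that LoRA fine-tuning toolchains consume, which is why the chat history itself can serve as the training knowledge base.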