Multi-token prediction

Breaking the single-token prediction limitation! Nanyang Technological University introduces multi-token prediction into fine-tuning for the first time, boosting programming task accuracy by 11.67%
量子位 · 2025-07-24 07:28
Core Viewpoint
- The article discusses Concept-Aware Fine-Tuning (CAFT), a new technique developed at Nanyang Technological University that introduces multi-token prediction into the fine-tuning phase of large language models (LLMs), allowing them to understand and learn complete concepts the way humans do, rather than fragmented tokens [1][4].

Group 1: Next-Token Prediction Limitations
- Traditional LLMs rely on next-token prediction, which breaks complete concepts into fragments and hinders the model's ability to form a holistic understanding [10][12].
- The next-token prediction pipeline involves tokenization, sequence modeling, and probability prediction, but it predicts only one token at a time, making the learning of complex multi-token concepts inefficient [6][9].

Group 2: Introduction of CAFT
- CAFT adds auxiliary heads during the fine-tuning phase that help the model learn subsequent tokens while still optimizing for the primary task, enhancing multi-token concept learning without increasing costs [2][14].
- The CAFT architecture combines auxiliary heads with a specially designed loss function that prioritizes the main task while allowing for multi-token learning [14][20].

Group 3: Performance Improvements
- CAFT has shown significant performance improvements across programming, mathematics, and biomedical applications, indicating a potential paradigm shift in AI training methodology [4][22].
- In programming tasks, CAFT improved accuracy from 40.9% to 45.1% under LoRA fine-tuning and from 40.5% to 49.3% under full fine-tuning [26].
- In mathematical reasoning, CAFT achieved a 1.7% performance gain on the MATH-500 dataset, demonstrating its effectiveness on complex reasoning tasks [29].

Group 4: Validation Across Domains
- In clinical text analysis, CAFT outperformed traditional methods at capturing long-text concepts, with ROUGE-1 scores improving from 44.57 to 45.93 [30].
- In chemical structure understanding, CAFT significantly improved the exact-match rate from 0.14% to 0.54%, showcasing its ability to learn multi-token concepts effectively [32].
- The technique also demonstrated generalization by generating protein sequences, with sequence identity improving from 20.32% to 22.14% [35].

Group 5: Conclusion and Future Implications
- The research validates the feasibility of implementing multi-token prediction in the fine-tuning phase; CAFT's ease of use and low cost may position it as a viable alternative to traditional next-token-prediction-only fine-tuning [37].
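The one-token-at-a-time pipeline described in Group 1 (tokenize, model the sequence, predict a single next token, repeat) can be illustrated with a toy sketch. The vocabulary and the bigram "model" below are invented for illustration and are not from the paper; they only show how a multi-token concept such as "multi token prediction" emerges fragment by fragment rather than as a unit.

```python
# Toy next-token prediction loop: the "model" is a hand-made bigram
# score table, standing in for an LLM's sequence model.
VOCAB = ["<end>", "multi", "token", "prediction"]

# Maps the last token id to scores over the next token (illustrative numbers).
BIGRAM_SCORES = {
    1: [0.0, 0.0, 3.0, 0.1],   # after "multi", "token" scores highest
    2: [0.2, 0.0, 0.0, 3.0],   # after "token", "prediction" scores highest
    3: [3.0, 0.0, 0.1, 0.0],   # after "prediction", stop
}

def generate(prompt_ids, max_steps=10):
    """Greedy decoding: predict exactly one token per step and append it."""
    ids = list(prompt_ids)
    for _ in range(max_steps):
        scores = BIGRAM_SCORES[ids[-1]]                # sequence modeling
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == 0:                               # <end> token
            break
        ids.append(next_id)                            # one token at a time
    return [VOCAB[i] for i in ids]

print(generate([1]))  # the concept appears only fragment by fragment
```

Even in this toy, the model never scores the three-token concept as a whole; each step only sees the next fragment, which is the limitation CAFT targets.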
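The loss design sketched in Group 2 — a main next-token head plus down-weighted auxiliary heads for subsequent tokens — can be written as a minimal sketch. This is not the paper's implementation: the number of heads, the `aux_weight`, and the `decay` schedule are assumptions made for illustration; only the principle (auxiliary losses added to, but dominated by, the main next-token loss) comes from the source.

```python
import math

def softmax_ce(logits, target):
    """Cross-entropy at one position: -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def caft_loss(head_logits, targets, aux_weight=0.1, decay=0.5):
    """CAFT-style loss sketch (weights are illustrative assumptions).

    head_logits[0] / targets[0]: the standard next-token head (main task).
    head_logits[k] / targets[k] for k >= 1: auxiliary heads predicting
    tokens further ahead, down-weighted so the main task dominates.
    """
    main = softmax_ce(head_logits[0], targets[0])
    aux, w = 0.0, aux_weight
    for logits, tgt in zip(head_logits[1:], targets[1:]):
        aux += w * softmax_ce(logits, tgt)
        w *= decay                      # deeper heads count for less
    return main + aux

# Usage: one main head and one auxiliary head over a 2-token vocabulary.
loss = caft_loss([[2.0, 0.5], [0.1, 1.2]], targets=[0, 1])
```

With `aux_weight=0` this reduces exactly to plain next-token cross-entropy, which matches the claim that CAFT changes only fine-tuning, not the underlying objective's main term.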