No Parameter Tuning, No Hassle: Shanghai Jiao Tong University & Shanghai AI Lab Introduce the "Memory Decoder" for Seamless Domain Adaptation of Any LLM
36Kr · 2025-08-26 09:17

Core Insights
- The article discusses the challenges faced by large language models (LLMs) in specialized fields such as healthcare, finance, and law due to their lack of deep domain knowledge, and highlights the need for effective domain adaptation solutions to enhance LLM performance in these areas [2][4].

Group 1: Memory Decoder Innovation
- The Memory Decoder is introduced as a "plug-and-play" pre-trained memory module that adapts various LLMs without modifying their parameters, enabling efficient domain adaptation across different model sizes [2][4].
- Experimental results show that the Memory Decoder effectively adapts Qwen and Llama models to the biomedical, financial, and legal domains, achieving an average perplexity reduction of 6.17% [4][12].

Group 2: Architecture and Functionality
- During the pre-training phase, the Memory Decoder learns to align its output distribution with the distribution produced by a non-parametric retriever, using a distribution alignment loss function (a sketch of such an objective follows this summary) [5][10].
- In the inference phase, it processes the input in parallel with the base language model and combines the two distributions to generate domain-enhanced predictions without additional retrieval costs (see the interpolation sketch at the end) [5][10].

Group 3: Performance Evaluation
- The Memory Decoder delivers significant performance improvements across GPT-2 model sizes on the WikiText-103 dataset: a single 124-million-parameter Memory Decoder enhances the entire GPT-2 series [11][12].
- In downstream evaluation, the Memory Decoder maintains or improves performance across nine different NLP tasks, showing that it strengthens domain adaptation while preserving general language capabilities [13][14].

Group 4: Cross-Model and Cross-Tokenizer Adaptation
- The Memory Decoder exhibits strong cross-model adaptability, enhancing performance across Qwen and Qwen2.5 models regardless of their size [15][21].
- It also supports cross-tokenizer adaptation, allowing knowledge to transfer between different model architectures with minimal additional training [17][18].

Group 5: Knowledge-Intensive Reasoning Tasks
- On knowledge-intensive reasoning tasks, the Memory Decoder outperforms traditional retrieval-augmented generation (RAG) methods, improving the model's access to factual knowledge while maintaining its reasoning capabilities [19][20].

Group 6: Limitations and Future Directions
- Despite its advantages, the Memory Decoder has limitations, such as the computational overhead of building the key-value (KV) datastore during the pre-training phase and the need for some parameter adjustment for cross-tokenizer adaptation [21][23].
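The summary describes the pre-training objective only at a high level. Below is a minimal sketch of what a distribution-alignment loss of this kind could look like, assuming a kNN-LM-style retriever distribution as the alignment target; the function name, arguments, and the mixing weight `alpha` are illustrative assumptions, not identifiers or values from the paper.

```python
# Minimal sketch (not the authors' code) of a distribution-alignment objective:
# a small "memory decoder" is trained so that its next-token distribution
# matches the distribution produced by a non-parametric kNN retriever over a
# domain datastore (kNN-LM style), blended with standard language modeling.
import torch
import torch.nn.functional as F

def alignment_loss(mem_logits, knn_probs, target_ids, alpha=0.5):
    """KL(knn_probs || p_mem) blended with cross-entropy on the gold token.

    mem_logits: (batch, vocab) logits from the memory decoder
    knn_probs:  (batch, vocab) next-token distribution from the kNN retriever
    target_ids: (batch,) ground-truth next tokens
    alpha:      assumed mixing weight between the two terms
    """
    log_p_mem = F.log_softmax(mem_logits, dim=-1)
    # Distribution alignment: pull the decoder's distribution toward the
    # retriever's distribution.
    kl_term = F.kl_div(log_p_mem, knn_probs, reduction="batchmean")
    # Standard language-modeling term on the gold next token.
    ce_term = F.nll_loss(log_p_mem, target_ids)
    return alpha * kl_term + (1.0 - alpha) * ce_term
```

Once trained this way, the memory decoder internalizes the retriever's behavior, which is why no datastore lookup is needed at inference time.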

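For the parallel-inference step in Group 2, the sketch below shows one way the combination could work: the frozen base LLM and the small memory decoder score the same context in parallel, and their next-token distributions are interpolated. The off-the-shelf GPT-2 checkpoints stand in for the base model and the 124M Memory Decoder, and the mixing weight `lam` is an assumption for illustration, not a value from the paper.

```python
# Minimal sketch (assumptions, not the released implementation) of
# inference-time interpolation between a frozen base LM and a domain
# memory module; no retrieval happens at test time.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("gpt2-large")  # frozen base LM
memory = AutoModelForCausalLM.from_pretrained("gpt2")      # stand-in for the small memory decoder
tok = AutoTokenizer.from_pretrained("gpt2")                # shared tokenizer assumed

@torch.no_grad()
def next_token_probs(text, lam=0.3):
    ids = tok(text, return_tensors="pt").input_ids
    # Both models process the same context in parallel (conceptually);
    # here they are simply called back to back for clarity.
    p_base = F.softmax(base(ids).logits[:, -1, :], dim=-1)   # base LM distribution
    p_mem = F.softmax(memory(ids).logits[:, -1, :], dim=-1)  # domain-memory distribution
    return lam * p_mem + (1.0 - lam) * p_base                # interpolated prediction

probs = next_token_probs("The patient was prescribed")
print(tok.decode(probs.argmax(dim=-1)))
```

Because the memory module is much smaller than the base model and shares its vocabulary, the extra forward pass adds little latency compared with a retrieval call over a large datastore.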