Core Insights
- Meta Superintelligence Labs has introduced REFRAG, a new decoding framework for Retrieval-Augmented Generation (RAG) that accelerates Time-to-First-Token (TTFT) by up to 30× [1][24].

Group 1: RAG Overview
- RAG improves the accuracy and timeliness of large language model (LLM) responses by retrieving relevant information from external knowledge bases [6].
- Current RAG systems struggle to balance inference efficiency against the volume of retrieved information. Longer retrieved contexts raise computational cost and delay the first generated token [7][8].

Group 2: REFRAG Framework
- REFRAG changes how LLMs process external knowledge through three steps: Compress, Sense, and Expand [14].
- Compress: a lightweight encoder converts long reference texts into compact vector representations, greatly reducing input sequence length and computational load [17].
- Sense: a policy network trained with reinforcement learning identifies the key information in the compressed representations that should be retained [20][21].
- Expand: the compressed representations are combined with the essential original text chunks, giving the LLM an optimized input from which to generate its response [23].

Group 3: Performance Improvements
- REFRAG achieves up to a 30.85× TTFT speedup, a 3.75× improvement over the previous state-of-the-art methods [24].
- It shows no loss in perplexity or in accuracy on downstream tasks such as question answering and summarization [27].
- Because compression lets the model process more information within the same computational budget, it effectively extends the context window by 16×, which can improve performance on some tasks [28].
- REFRAG applies not only to RAG but also to multi-turn dialogue and long-document summarization, addressing core efficiency problems in processing long-context information [29].
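The Compress, Sense, and Expand steps can be sketched as follows. This is a minimal, hypothetical illustration, not REFRAG's implementation: the chunk size `CHUNK_SIZE = 16`, the toy seeded token embeddings, the mean-pooling "encoder", and the query-similarity top-k selection (swapped in for REFRAG's RL-trained policy network) are all assumptions made for illustration. It shows how mixing one vector per compressed chunk with raw tokens for a few selected chunks shortens the sequence the decoder has to prefill.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

DIM = 8          # hypothetical embedding width
CHUNK_SIZE = 16  # hypothetical compression rate: k tokens -> 1 vector


def split_into_chunks(tokens: Sequence[int], k: int = CHUNK_SIZE) -> List[List[int]]:
    """Split retrieved-passage tokens into consecutive chunks of k tokens."""
    return [list(tokens[i:i + k]) for i in range(0, len(tokens), k)]


def embed_token(token: int) -> Vector:
    """Toy deterministic token embedding (stand-in for a learned embedding table)."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def compress(chunk: List[int]) -> Vector:
    """Compress: mean-pool token embeddings into one chunk vector.
    Hypothetical stand-in for REFRAG's lightweight encoder."""
    vecs = [embed_token(t) for t in chunk]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sense(chunk_vecs: List[Vector], query_vec: Vector, budget: int) -> List[int]:
    """Sense: choose which chunks to keep as raw tokens.
    REFRAG uses an RL-trained policy; a query-similarity top-k heuristic
    is swapped in here purely for illustration."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(chunk_vecs[i], query_vec),
                    reverse=True)
    return sorted(ranked[:budget])  # restore document order


def expand(chunks: List[List[int]], chunk_vecs: List[Vector],
           keep: List[int]) -> List[Tuple[str, object]]:
    """Expand: build the decoder input in document order, using raw tokens
    for the selected chunks and a single vector for every other chunk."""
    keep_set = set(keep)
    decoder_input: List[Tuple[str, object]] = []
    for i, (chunk, vec) in enumerate(zip(chunks, chunk_vecs)):
        if i in keep_set:
            decoder_input.extend(("token", t) for t in chunk)
        else:
            decoder_input.append(("chunk_vec", vec))
    return decoder_input


if __name__ == "__main__":
    context = list(range(1024))          # 1,024 retrieved tokens (toy ids)
    query_vec = compress([3, 5, 7])      # toy query representation
    chunks = split_into_chunks(context)  # 64 chunks of 16 tokens
    vecs = [compress(c) for c in chunks]
    keep = sense(vecs, query_vec, budget=4)
    decoder_input = expand(chunks, vecs, keep)
    print(len(context), len(decoder_input))  # → 1024 124
```

With 1,024 retrieved tokens, 16-token chunks, and 4 chunks expanded, the decoder sees 60 chunk vectors plus 64 raw tokens, i.e. 124 positions instead of 1,024. Since the attention part of prefill grows roughly quadratically with sequence length, this shortening is the intuition behind the TTFT savings. Conversely, if every chunk is compressed, the same position budget covers about 16× more retrieved text, in line with the 16× context-window extension reported above.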
Meta Superintelligence Labs' First Paper: Redefining RAG
QbitAI (量子位) · 2025-09-08 07:00