New MIT paper: in 2026, reasoning models are obsolete and the "matryoshka model" should take their place
36Kr · 2026-01-04 10:09
Core Insights
- The article discusses the emergence of a new paradigm called the "Nested Model" or Recursive Language Model (RLM), which is predicted to become mainstream this year [2][3].

Group 1: Model Overview
- The RLM redefines how long texts are processed by storing text in a code environment and allowing the model to write programs that recursively call itself for processing [3][8].
- This model significantly reduces the "context decay" phenomenon when handling long texts and operates at a lower cost than traditional models [1][22].

Group 2: Technical Mechanism
- RLM uses an external Python REPL environment to manage long texts as static string variables, decoupling the input data length from the model's context window size [8][10].
- The model employs a code-based cognitive loop: it observes the environment, writes Python code to probe the text, and processes the results iteratively [10][15].

Group 3: Performance Metrics
- RLM has demonstrated the ability to handle up to 10 million tokens, surpassing the context window of models like GPT-5 by two orders of magnitude [16].
- In various benchmark tests, RLM outperformed traditional models on tasks requiring high-density information processing, achieving F1 scores of 58.00% and 23.11% on complex tasks where traditional models scored below 0.1% [18][19].

Group 4: Cost Efficiency
- The RLM's approach allows for selective reading of relevant text segments, leading to a significant reduction in operational costs compared to full-context models [20][22].
- For instance, in the BrowseComp-Plus benchmark, the average cost for RLM was only $0.99, compared to $1.50 to $2.75 for GPT-5-mini processing similar token inputs [20][22].
New MIT paper: in 2026, reasoning models are obsolete and the "matryoshka model" should take their place
量子位 · 2026-01-04 09:06
Core Viewpoint
- The article discusses the emergence of a new paradigm in language models called the "Recursive Language Model" (RLM), which significantly improves the handling of long texts and reduces costs compared to traditional models like GPT-5 [3][5][23].

Group 1: RLM Overview
- The RLM introduces a novel approach by storing text in a code environment and allowing the model to write programs that recursively call itself to process the text [5][9].
- This method decouples the length of the input data from the model's context window size, enabling the processing of text limited only by physical memory rather than by the constraints of the Transformer architecture [10][12].

Group 2: Performance Metrics
- RLM has demonstrated the ability to effectively handle up to 10 million tokens, surpassing the context window of leading models like GPT-5 by two orders of magnitude [23].
- In various benchmark tests, RLM outperformed traditional models on complex tasks, achieving F1 scores of 58.00% and 23.11% on the OOLONG and OOLONG-Pairs tests, respectively, while traditional models scored below 0.1% [27].

Group 3: Cost Efficiency
- RLM's approach allows for selective reading of relevant text segments, leading to a significant reduction in operational costs. For instance, the average cost for RLM in the BrowseComp-Plus benchmark was only $0.99, compared to $1.50 to $2.75 for GPT-5 [29][31].
- This cost efficiency indicates that RLM can maintain performance while controlling inference costs, making it a viable option for large-scale applications involving long texts [32].
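The recursive self-call in Group 1, where text too large for any single context window is split and the sub-results merged, can be sketched under similar assumptions. Everything here is illustrative: `summarize` is a toy stand-in for a per-chunk LM call (it merely keeps tokens containing digits), and halving the text is one possible decomposition, not the paper's.

```python
# Sketch of recursion over a text far larger than one "context window":
# each call sees at most `window` characters; longer inputs are split,
# processed recursively, and the sub-answers merged.

def summarize(chunk: str) -> str:
    # Toy stand-in for a sub-LM call: keep only tokens containing digits.
    return " ".join(w for w in chunk.split() if any(c.isdigit() for c in w))

def recursive_map(text: str, window: int = 1000) -> str:
    if len(text) <= window:
        return summarize(text)          # base case: fits in one "context"
    mid = len(text) // 2
    # Recurse on halves; total input size is bounded only by memory,
    # not by the per-call window -- the decoupling the article describes.
    left = recursive_map(text[:mid], window)
    right = recursive_map(text[mid:], window)
    merged = (left + " " + right).strip()
    # The merge itself may exceed the window; recurse again if so.
    return recursive_map(merged, window) if len(merged) > window else merged

doc = ("filler " * 50 + "revenue2026 ") * 40   # ~14 KB of mostly-noise text
result = recursive_map(doc)
print(len(doc), "->", len(result))
```

Because every call, including the merge step, is itself window-bounded, the same model can in principle be applied at every level of the recursion, which is the "matryoshka" structure the headline refers to.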