Core Insights

The article discusses the emergence of a new paradigm, the "Nested Model" or Recursive Language Model (RLM), which is predicted to become mainstream this year [2][3].

Group 1: Model Overview
- The RLM redefines how long texts are processed: the text is stored in a code environment, and the model writes programs that recursively call the model itself to process it [3][8].
- This design significantly reduces the "context decay" phenomenon seen when handling long texts, and operates at a lower cost than traditional models [1][22].

Group 2: Technical Mechanism
- The RLM uses an external Python REPL environment to hold the long text as a static string variable, decoupling the length of the input data from the model's context window size [8][10].
- The model runs a code-based cognitive loop: it observes the environment, writes Python code to probe the text, and iteratively processes the results [10][15].

Group 3: Performance Metrics
- The RLM has handled inputs of up to 10 million tokens, exceeding the context window of models such as GPT-5 by two orders of magnitude [16].
- Across benchmarks, the RLM outperformed traditional models on tasks requiring high-density information processing, achieving F1 scores of 58.00% and 23.11% on complex tasks where traditional models scored below 0.1% [18][19].

Group 4: Cost Efficiency
- Because the RLM selectively reads only the relevant text segments, its operating cost is significantly lower than that of full-context models [20][22].
- For instance, on the BrowseComp-Plus benchmark, the RLM's average cost was only $0.99, versus $1.50 to $2.75 for GPT-5-mini processing similar token inputs [20][22].
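The mechanism in Groups 1-2 can be illustrated with a minimal sketch: the long document lives as an ordinary string variable in a REPL-like environment, and a recursive function splits any text that exceeds the context window, calls the model on each piece, and merges the partial results. All names below (`stub_llm`, `recursive_llm_call`, the split-and-merge strategy) are illustrative assumptions, not the paper's actual API; a stub stands in for a real LLM call.

```python
# Hypothetical sketch of the RLM pattern, not the paper's implementation.
CONTEXT_WINDOW = 1_000  # pretend a single model call can only see 1,000 chars

def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; it just reports the prompt length."""
    return f"summary({len(prompt)} chars)"

def recursive_llm_call(text: str) -> str:
    """Recursively split text that exceeds the context window, then merge."""
    if len(text) <= CONTEXT_WINDOW:
        return stub_llm(text)                 # base case: fits in one call
    mid = len(text) // 2
    left = recursive_llm_call(text[:mid])     # recurse on each half
    right = recursive_llm_call(text[mid:])
    return stub_llm(left + " | " + right)     # merge the partial answers

# The "REPL" side: the 10,000-char document is just a string variable,
# decoupled from the window size of any single model call.
long_document = "x" * 10_000
answer = recursive_llm_call(long_document)
print(answer)
```

This captures the cost argument in Group 4 as well: no single call ever receives the full document, so per-call input stays bounded even as the document grows; the real system additionally lets the model write arbitrary probing code (slicing, searching) rather than a fixed split.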
New MIT paper: reasoning models will be obsolete in 2026; the "nested model" (matryoshka model) should take their place
36Kr · 2026-01-04 10:09