Recursive Language Models Arrive! New Work from Chinese Researchers at MIT Goes Viral, Making Context Extension Cheap and Simple
机器之心 (Synced) · 2025-10-16 07:34
Core Insights
- The article discusses two limitations of current mainstream large language models (LLMs): restricted context length, and the performance degradation on long inputs known as "context rot" [2][26].
- Researchers from MIT propose a new approach, Recursive Language Models (RLMs), which addresses these issues by breaking long contexts into manageable parts and processing them recursively [4][6].

Group 1: RLM Concept and Implementation
- RLMs treat the input context as a variable, allowing the main model to decompose the context and interact with it recursively [8][14].
- In the practical implementation, RLMs use a Python REPL environment that stores the user prompt in a variable and processes it iteratively, which leads to significant performance improvements [5][17].
- The RLM framework lets the root language model manage context more flexibly, avoiding the pitfall of traditional models that read the entire context at once [23][16].

Group 2: Performance Results
- In tests on the OOLONG benchmark, an RLM using GPT-5-mini achieved an improvement of over 114% in correct answers compared with GPT-5, at a lower average cost per query [28][30].
- The RLM showed no performance degradation even when processing contexts exceeding 10 million tokens, outperforming traditional methods such as ReAct + retrieval [34][35].
- The RLM framework handles large contexts more efficiently and maintains performance without additional fine-tuning or structural changes to the model [35][39].

Group 3: Future Implications
- The researchers believe RLMs could become a powerful paradigm for reasoning and context management in LLMs, potentially changing how models handle extensive data [6][7].
- As LLM capabilities improve, RLMs are expected to scale with them, potentially managing even larger contexts in the future [37][40].
- The approach emphasizes that language models should decide autonomously how to decompose and process a task, in contrast with traditional agent-based methods [40][41].
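
To make the mechanism under Group 1 more concrete, the following is a minimal Python sketch of the idea, not the researchers' implementation. The long context is stored as a variable in an execution environment, and a "root program" splits it into pieces and issues one sub-call per piece. Everything named here is an assumption made for illustration: `toy_sub_llm` is a hypothetical stand-in for a real model call (it only counts matching lines), `WINDOW` is an invented character budget, and `ROOT_PROGRAM` is scripted by hand, whereas in an RLM the root model would write such code itself.

```python
from typing import Callable

WINDOW = 200  # assumed character budget of the toy sub-model


def toy_sub_llm(question: str, snippet: str) -> int:
    """Hypothetical sub-model: counts lines of `snippet` that contain `question`."""
    if len(snippet) > WINDOW:
        raise ValueError("snippet exceeds the toy model's window")
    return sum(1 for line in snippet.splitlines() if question in line)


# Code a root model might emit into the REPL. It is scripted here so the
# example is deterministic; a real root LM would generate it on the fly.
ROOT_PROGRAM = r'''
chunks, current, size = [], [], 0
for line in context.splitlines():
    # Start a new chunk when adding this line would exceed the window.
    if current and size + len(line) + 1 > window:
        chunks.append(current)
        current, size = [], 0
    current.append(line)
    size += len(line) + 1
if current:
    chunks.append(current)
# One recursive sub-call per chunk, then combine the partial answers.
answer = sum(llm_query(question, "\n".join(chunk)) for chunk in chunks)
'''


def run_rlm(question: str, context: str,
            sub_llm: Callable[[str, str], int] = toy_sub_llm) -> int:
    """Run the scripted root program against a context held as a variable."""
    env = {
        "context": context,    # the long input lives here, not in a prompt
        "question": question,
        "window": WINDOW,
        "llm_query": sub_llm,  # the only way the program reaches a model
    }
    exec(ROOT_PROGRAM, env)
    return env["answer"]


if __name__ == "__main__":
    log = "\n".join(
        f"row {i}: {'ERROR' if i % 7 == 0 else 'ok'}" for i in range(100)
    )
    print(len(log) > WINDOW)      # the full log does not fit in one call
    print(run_rlm("ERROR", log))  # answered through per-chunk sub-calls
```

The sketch fixes one decomposition strategy (line-based chunking followed by a sum), which only suits queries whose partial answers can be added together. It also assumes no single line is longer than `WINDOW`. The point of the approach summarized above is that the root model chooses the strategy per task rather than having it hard-coded.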