Hierarchical Reasoning Model (HRM)
7M Parameters Beat DeepSeek R1 and Others: A Solo Samsung Researcher's Viral Work Uses Recursion to Upend Large-Model Reasoning
机器之心· 2025-10-09 04:43
Reported by 机器之心. Readers interested in HRM can refer to our earlier coverage. Editor: 冷猫

Training small, thinking big. Reasoning architectures for large models are being overturned remarkably fast. In June of this year, researchers at Sapient Intelligence proposed the Hierarchical Reasoning Model (HRM), whose recurrent architecture breaks the structural limits of traditional chain-of-thought (CoT) and has had a major impact on how large models reason. HRM contains only 27 million parameters (roughly 22 times smaller than the smallest Qwen3 0.6B model) and, trained on just 1,000 samples, achieves outstanding performance on complex reasoning tasks.

Only four months later, HRM's architecture has already been outclassed. Alexia Jolicoeur-Martineau, a senior AI researcher at the Samsung Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursive Model (TRM). How striking is TRM? A network with only 7 million parameters (4 times smaller than HRM) can surpass frontier language models such as o3-mini and Gemini 2.5 Pro on some of the hardest reasoning benchmarks, even though those models have roughly 10,000 times as many parameters. ...
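To make the recursion idea concrete, here is a minimal sketch of TRM-style recursive refinement: one tiny shared network repeatedly updates a latent scratchpad from the question and the current answer, then refines the answer from that scratchpad. All names, dimensions, and update rules here (e.g. `TinyRecursiveModel`, the residual updates, the loop counts) are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of TRM-style recursive reasoning (illustrative only;
# names, dimensions, and update rules are assumptions, not the paper's code).
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # One small network refines the latent scratchpad z from (x, y, z).
        self.net = nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # A second small head refines the answer y from (y, z).
        self.out = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, y, z, n_latent: int = 6, n_cycles: int = 3):
        for _ in range(n_cycles):
            # Several fast latent-refinement steps per cycle.
            for _ in range(n_latent):
                z = z + self.net(torch.cat([x, y, z], dim=-1))
            # Then one answer-refinement step from the updated scratchpad.
            y = y + self.out(torch.cat([y, z], dim=-1))
        return y, z

model = TinyRecursiveModel()
x = torch.randn(4, 128)   # embedded puzzle input
y = torch.zeros(4, 128)   # initial answer embedding
z = torch.zeros(4, 128)   # initial latent scratchpad
y, z = model(x, y, z)
print(y.shape)  # torch.Size([4, 128])
```

The point of the sketch is the parameter economy: depth comes from iterating a small network many times, not from stacking more layers.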
With Only 27 Million Parameters, This Reasoning Model Surpasses DeepSeek and Claude
机器之心· 2025-06-30 10:23
Core Insights
- The article argues that the architecture of large language models (LLMs) needs rethinking, focusing on the limitations of current chain-of-thought (CoT) techniques: brittleness on complex tasks, heavy data requirements, and high latency [2][4].

Group 1: Hierarchical Reasoning Model (HRM)
- The Hierarchical Reasoning Model (HRM) is a novel recurrent architecture inspired by the human brain's hierarchical, multi-timescale processing, achieving high computational depth while maintaining training stability and efficiency [3][6].
- HRM couples two interdependent recurrent modules, a high-level module for slow, abstract planning and a low-level module for fast, detailed computation (a minimal sketch of this two-timescale loop appears after this summary), and reaches remarkable performance on complex reasoning tasks with only 27 million parameters and 1,000 training samples [4][5].
- HRM requires neither pre-training nor CoT data, yet performs near-perfectly on hard tasks such as complex Sudoku puzzles and optimal pathfinding in large mazes, outperforming much larger models with longer context windows [5][6].

Group 2: Design and Mechanisms
- HRM's core design rests on hierarchical processing and timescale separation: high-level brain regions integrate information over longer timescales while low-level regions handle immediate sensory input [12][13].
- HRM incorporates feedback loops resembling the brain's dense recurrent connectivity, improving representational accuracy and contextual adaptability while avoiding the problems of backpropagation through time (BPTT) [14][19].
- The model introduces approximate gradients and deep supervision, keeping memory usage low and improving training dynamics, in contrast to traditional methods that demand extensive memory and time (see the gradient sketch below) [20][23].

Group 3: Performance and Adaptability
- HRM exhibits hierarchical convergence: the high-level module settles slowly while the low-level module converges repeatedly within each cycle, yielding rapid convergence with minimal residuals compared with conventional deep networks [17][36].
- Adaptive computation time (ACT) lets the model allocate computational resources according to task difficulty, optimizing performance without significant resource expenditure (see the ACT sketch below) [25][27].
- HRM can seamlessly extend inference-time computation by adjusting its recurrence limits, with no retraining or architectural change, underscoring its flexibility on complex reasoning tasks [28][36].

Group 4: Experimental Results
- Experiments show HRM excels at complex reasoning tasks, raising the question of which underlying reasoning algorithms it actually learns, a question central to model interpretability [31][39].
- Visualizations of HRM's intermediate reasoning on maze and Sudoku tasks reveal strategies that combine exploration with optimization and resemble depth-first search [31][38].
- The hierarchical structure emerges as a natural characteristic of learning complex reasoning tasks rather than being an inherent property of the model architecture [34].
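The two-timescale loop from Group 1 can be sketched as follows: a low-level module takes several fast recurrent steps per single high-level update, so the slow state acts as a sub-goal for each fast burst. Module names, sizes, and the update schedule (`TwoTimescaleReasoner`, GRU cells, `t_low`, `n_high`) are illustrative assumptions, not HRM's released implementation.

```python
# Minimal sketch of an HRM-style two-timescale recurrence
# (illustrative assumptions throughout; not the authors' code).
import torch
import torch.nn as nn

class TwoTimescaleReasoner(nn.Module):
    def __init__(self, dim: int = 128, t_low: int = 4, n_high: int = 3):
        super().__init__()
        self.t_low, self.n_high = t_low, n_high
        # Fast, detailed computation conditioned on the input and the slow plan.
        self.low = nn.GRUCell(2 * dim, dim)
        # Slow, abstract planning, updated once per low-level burst.
        self.high = nn.GRUCell(dim, dim)

    def forward(self, x):
        z_low = torch.zeros_like(x)
        z_high = torch.zeros_like(x)
        for _ in range(self.n_high):          # slow timescale
            for _ in range(self.t_low):       # fast timescale
                z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
            # The high-level state absorbs the low-level result, giving the
            # next burst a fresh context ("hierarchical convergence").
            z_high = self.high(z_low, z_high)
        return z_high

x = torch.randn(2, 128)
print(TwoTimescaleReasoner()(x).shape)  # torch.Size([2, 128])
```

Because the low-level state is re-driven by a fresh high-level state each cycle, it can converge repeatedly without the whole system collapsing to a fixed point, which is the behavior Group 3 describes.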
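Group 2's memory argument can also be illustrated with a hedged sketch: rather than backpropagating through every recurrent step (BPTT), the recurrent state is detached between supervision segments, so each loss only sees the steps inside its own segment. The network, loss, and segment count below are assumptions for illustration, not the paper's training recipe.

```python
# Sketch of deep supervision with an approximate, per-segment gradient
# (the spirit of avoiding BPTT); every detail is an illustrative assumption.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(256, 128), nn.Tanh(), nn.Linear(128, 128))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def step(x, z):
    # One recurrent refinement of the state z given input x.
    return z + net(torch.cat([x, z], dim=-1))

x = torch.randn(8, 128)
target = torch.randn(8, 128)
z = torch.zeros(8, 128)

for segment in range(4):                 # deep supervision: one loss per segment
    z = step(x, step(x, z))              # a short burst of recurrent steps
    loss = nn.functional.mse_loss(z, target)
    opt.zero_grad()
    loss.backward()                      # gradients cover only this segment
    opt.step()
    # Detach so the next segment treats z as a constant: no backprop through
    # time across segments, and memory stays constant in effective depth.
    z = z.detach()
    print(f"segment {segment}: loss = {loss.item():.4f}")
```

The detach is the whole trick: effective depth grows with the number of segments, while the activation memory needed for backward stays that of a single short burst.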
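Finally, Group 3's adaptive computation time can be sketched as a learned halting head: after each segment the model scores whether to stop, so easy inputs use fewer segments, and raising the segment cap at inference "thinks longer" without retraining. The sigmoid-threshold halting rule below is a deliberate simplification (HRM's actual mechanism is Q-learning-based), and all names are hypothetical.

```python
# Sketch of adaptive computation time (ACT): a halting head decides, per
# example, whether to keep reasoning. Simplified, illustrative version.
import torch
import torch.nn as nn

class HaltingReasoner(nn.Module):
    def __init__(self, dim: int = 128, max_segments: int = 8):
        super().__init__()
        self.max_segments = max_segments
        self.step = nn.GRUCell(dim, dim)
        self.halt = nn.Linear(dim, 1)   # scores "stop now" for each example

    def forward(self, x):
        z = torch.zeros_like(x)
        done = torch.zeros(x.size(0), dtype=torch.bool)
        for _ in range(self.max_segments):
            # Halted examples keep their state; active ones take another step.
            z = torch.where(done.unsqueeze(-1), z, self.step(x, z))
            done = done | (torch.sigmoid(self.halt(z)).squeeze(-1) > 0.5)
            if bool(done.all()):        # every example has halted early
                break
        return z

x = torch.randn(4, 128)
print(HaltingReasoner()(x).shape)  # torch.Size([4, 128])
```

Since `max_segments` is just a loop bound, extending inference-time computation amounts to raising it, which matches the "no retraining or architectural change" claim in Group 3.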