Hierarchical Reasoning Model (HRM)

The Hierarchical Reasoning Model that drew 4 million viewers: does the "hierarchical architecture" actually do nothing? Is something else behind the performance gains?
机器之心 · 2025-08-17 04:28
Core Insights
- The article discusses the Hierarchical Reasoning Model (HRM), which has attracted significant attention since its release in June, scoring 41% on the ARC-AGI-1 benchmark with a relatively small model of only 27 million parameters [3][4][5].

Group 1: HRM Performance and Analysis
- HRM's performance on the ARC-AGI benchmark is impressive for its size, and its 32% score on the semi-private dataset indicates minimal overfitting [29].
- The analysis found that the hierarchical architecture itself contributes little to performance; most of the gain comes from the less-emphasized "outer loop" refinement process applied during training [5][41].
- Cross-task transfer learning yields limited benefit; most of the performance comes from memorizing the solutions to the specific tasks used at evaluation [6][52].

Group 2: Key Components of HRM
- Pre-training task augmentation is crucial, but only about 300 augmentations are needed to reach near-maximum performance, far fewer than the 1,000 reported in the original paper [7][56]; a sketch of this kind of augmentation appears below.
- The HRM architecture couples slow planning (the H-level) with fast execution (the L-level), but the performance gains cannot be attributed to this structure alone [35][40]; a minimal sketch of the nested H/L loop with outer-loop refinement follows this list.
- The outer-loop refinement process is the main driver of performance, with accuracy rising markedly as the number of refinement iterations during training increases [41][46].

Group 3: Future Directions and Community Engagement
- The article encourages further exploration of HRM, including the effect of puzzle_id embeddings on model performance and the potential for generalization beyond the training data [62][63].
- The analysis underscores the value of community-driven evaluation of research, suggesting that this kind of detailed scrutiny leads to more efficient knowledge acquisition [65][66].
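To make the two mechanisms discussed above concrete, here is a minimal, self-contained sketch of an HRM-style nested loop with outer-loop refinement. This is not the authors' code: the module choices (GRU cells standing in for the paper's transformer blocks), the dimensions, and the training loop are illustrative assumptions. Only the control structure, a slow H-level loop wrapping a fast L-level loop plus repeated refinement passes with states detached between them, reflects the mechanism described in the article.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Hedged sketch of an HRM-style nested loop (not the authors' code)."""

    def __init__(self, dim=128, h_steps=2, l_steps=4):
        super().__init__()
        self.h_cell = nn.GRUCell(dim, dim)  # slow "planner" (H-level)
        self.l_cell = nn.GRUCell(dim, dim)  # fast "executor" (L-level)
        self.h_steps, self.l_steps = h_steps, l_steps
        self.readout = nn.Linear(dim, dim)

    def forward(self, x, z_h=None, z_l=None):
        # Carry states across outer-loop passes so each pass refines the
        # previous answer instead of starting from scratch.
        if z_h is None:
            z_h = torch.zeros_like(x)
        if z_l is None:
            z_l = torch.zeros_like(x)
        for _ in range(self.h_steps):          # slow planning loop
            for _ in range(self.l_steps):      # fast execution loop
                z_l = self.l_cell(x + z_h, z_l)
            z_h = self.h_cell(z_l, z_h)
        return self.readout(z_h), z_h, z_l

# Outer-loop refinement during training: run the model several times on the
# same input, supervise every pass, and detach states between passes. The
# analysis attributes most of HRM's gain to this loop, not to the H/L split.
model = HRMSketch()
x = torch.randn(8, 128)
target = torch.randn(8, 128)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
z_h = z_l = None
for refinement_pass in range(4):
    y, z_h, z_l = model(x, z_h, z_l)
    loss = nn.functional.mse_loss(y, target)
    loss.backward()
    opt.step()
    opt.zero_grad()
    z_h, z_l = z_h.detach(), z_l.detach()  # truncate gradients between passes
```

Note that removing the inner H/L nesting while keeping the refinement loop leaves this training scheme intact, which is exactly the kind of ablation the analysis used to separate the two effects.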
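The augmentation finding can likewise be illustrated with a short sketch. ARC tasks are small grids of up to ten colors, so a natural augmentation family is dihedral transforms combined with color permutations; the exact augmentation set HRM uses may differ, and `augment_grid` is a hypothetical helper, not the paper's implementation.

```python
import numpy as np

def augment_grid(grid, rng):
    """One random ARC-style augmentation (hypothetical helper):
    a dihedral transform plus a permutation of the 10 ARC colors."""
    g = np.rot90(grid, k=int(rng.integers(4)))  # random 0/90/180/270 rotation
    if rng.integers(2):
        g = np.fliplr(g)                        # optional mirror flip
    perm = rng.permutation(10)                  # remap the color palette
    return perm[g]

rng = np.random.default_rng(0)
task = np.array([[0, 1], [2, 3]])
# Per the analysis, ~300 augmented copies per task suffice for near-maximum
# performance, versus the 1,000 reported in the original paper.
augmented = [augment_grid(task, rng) for _ in range(300)]
```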