Core Insights
- The study shows that small networks can outperform large language models on reasoning tasks: the Tiny Recursive Model (TRM) achieves superior results with only 7 million parameters and a two-layer neural network [4][11]
- TRM's architecture drops self-attention layers entirely, demonstrating that for small-scale, fixed-input tasks a multi-layer perceptron (MLP) can reduce overfitting [4][11]
- The model's recursive updating mechanism allows multiple rounds of self-correction, yielding higher accuracy than large models that rely on chain-of-thought reasoning [6][12] (see the sketch after this summary)

Model Performance
- TRM reached 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, outperforming most large models [6]
- On Sudoku-Extreme, TRM set a record with 87.4% accuracy; on Maze-Hard it reached 85.3% [6][14]
- TRM clearly outperforms the Hierarchical Reasoning Model (HRM); on Maze-Hard its accuracy is 10 percentage points higher [8][12]

Design Philosophy
- TRM's design is a rethink of the earlier HRM, simplifying the recursive update process into a single network trained with deep supervision [8][12]
- The model uses roughly 74% fewer parameters than HRM and halves the number of forward passes while improving accuracy [8][12]
- TRM replaces the traditional adaptive computation time mechanism with a simple binary decision to stop reasoning, speeding up training without sacrificing accuracy [9][10]

Implications for AI Development
- The findings challenge the notion that larger models are inherently better, suggesting that the depth of recursive thinking can substitute for increased model size [11][14]
- The research points to a new direction for lightweight AI reasoning, particularly useful for edge AI and low-resource applications [14]
- The study argues that intelligent depth may come from repeated thinking rather than sheer scale, offering guidance for future model development [14]
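To make the recursive-update idea concrete, below is a minimal PyTorch sketch of a TRM-style loop: a single tiny MLP repeatedly refines a latent reasoning state and a candidate answer, and a binary halt head decides when to stop. This is an illustrative assumption based on the description above, not the authors' published implementation; names such as `TinyRecursiveNet`, `inner_steps`, and `max_rounds`, and the exact update order, are hypothetical.

```python
# Hedged sketch of a TRM-style recursive reasoning loop (not the paper's code).
# Assumption: one small two-layer MLP is reused to update a latent state z and
# an answer y over several self-correction rounds, with a binary halt signal.
import torch
import torch.nn as nn


class TinyRecursiveNet(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # A single small MLP shared across all updates (no self-attention).
        self.update = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.to_answer = nn.Linear(2 * dim, dim)  # refines the answer from (y, z)
        self.halt_head = nn.Linear(dim, 1)        # binary "stop reasoning" decision

    def forward(self, x, y, z, inner_steps: int = 6):
        # Inner recursion: repeatedly refine the latent reasoning state z.
        for _ in range(inner_steps):
            z = z + self.update(torch.cat([x, y, z], dim=-1))
        # Then refine the current answer from the updated latent state.
        y = y + self.to_answer(torch.cat([y, z], dim=-1))
        halt_logit = self.halt_head(z).squeeze(-1)
        return y, z, halt_logit


def recursive_inference(model, x, max_rounds: int = 8):
    """Run several self-correction rounds, stopping once the halt signal fires."""
    y = torch.zeros_like(x)
    z = torch.zeros_like(x)
    for _ in range(max_rounds):
        y, z, halt_logit = model(x, y, z)
        if torch.sigmoid(halt_logit).mean() > 0.5:  # simple binary stop rule
            break
    return y


if __name__ == "__main__":
    model = TinyRecursiveNet(dim=128)
    x = torch.randn(4, 128)          # a batch of fixed-size puzzle encodings
    answer = recursive_inference(model, x)
    print(answer.shape)              # torch.Size([4, 128])
```

During training, deep supervision as described above would attach a loss to the answer produced at each round rather than only the final one; that part is omitted here for brevity.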
Samsung's TRM paper: less is more, replacing depth with recursion and challenging the Transformer paradigm
36Kr·2025-11-03 12:51