Gradient Descent

Demystifying how LLMs "think": reasoning as "gradient descent"; a meta-learning framework deconstructs the training process and offers new ideas for optimization
量子位· 2025-06-10 04:05
Core Insights
- The article introduces the Reasoning as Meta-Learning (RaML) framework, which aims to reveal how large language models (LLMs) "think" by drawing parallels between reasoning and gradient descent optimization [1][2]
- RaML posits that the reasoning trajectory generated by LLMs during problem-solving acts as a form of implicit parameter updates, leading to improved model performance [2][4]

Group 1: RaML Framework and Mechanism
- RaML's core insight is that the reasoning trajectory in LLMs resembles a "pseudo-gradient descent" process, where each reasoning step adjusts the model's internal state towards a better solution [2]
- The framework decomposes the training process of LLMs into two levels: "inner-loop optimization" for specific tasks and "outer-loop optimization" for learning strategies across multiple tasks (see the sketch after this summary) [8][9]
- The study emphasizes that longer reasoning trajectories typically lead to better optimization outcomes, akin to more iterations in traditional optimization algorithms [14]

Group 2: Empirical Validation and Performance
- The QwQ-32B model's reasoning on the AIME24 dataset demonstrated that confidence in correct answers increases as the reasoning trajectory is decoded, supporting the idea of parameter updates through reasoning [3][4]
- A comparison between supervised fine-tuning (SFT) and reinforcement learning (RL) models showed that SFT models outperform RL models on mathematical benchmarks, highlighting the benefits of guided learning [10][12]

Group 3: Reflection Tokens and Optimization
- The article discusses the role of "reflection" tokens in reasoning trajectories, which help the model reassess its outputs and improve performance by escaping local optima [15][17]
- It contrasts "thinking" and "non-thinking" modes, indicating that forced early termination of reasoning can lead to suboptimal solutions, similar to premature stopping in gradient descent [18][20]

Group 4: Generalization and Meta-Learning
- The research indicates that LLMs trained on specific reasoning tasks can generalize to unseen tasks, leveraging universal features learned from a variety of problems [21][23]
- The RaML framework provides practical strategies for enhancing training performance by increasing the number of reasoning trajectories per problem, akin to expanding the support set in meta-learning [25]

Group 5: Future Directions and Efficiency
- The article suggests exploring methods to extract shorter, equivalent optimization trajectories from longer reasoning paths to reduce decoding overhead while maintaining performance [27][30]
- Initial experiments show that summarizing long reasoning trajectories can yield comparable results at significantly reduced computational cost, indicating a potential area for future research [30][31]

Conclusion
- The RaML framework offers a novel perspective on understanding LLM reasoning and training, revealing the intricate connections between reasoning, meta-learning, and gradient descent [32]
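The two-level decomposition RaML describes maps naturally onto classic meta-learning. The sketch below is a minimal, first-order (Reptile-style) illustration of that structure on a toy least-squares task, not the paper's implementation; every name and hyperparameter (inner_loop, meta_lr, the synthetic tasks) is an assumption made for illustration.

```python
# Minimal sketch (not the paper's code) of the two-level decomposition RaML describes,
# written as a first-order (Reptile-style) meta-learning loop on a toy least-squares task.
# All names and hyperparameters here are illustrative assumptions, not values from the article.
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Squared-error loss and its gradient for a linear model y ~ X @ w."""
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

def inner_loop(w, X, y, steps=8, lr=0.1):
    """Task-specific adaptation: each step plays the role RaML assigns to one
    segment of a reasoning trajectory, i.e. one pseudo-gradient update."""
    for _ in range(steps):
        _, g = loss_and_grad(w, X, y)
        w = w - lr * g
    return w

# Outer loop: learn an initialization that adapts quickly across many tasks,
# analogous to training the LLM over a distribution of reasoning problems.
dim, meta_lr = 5, 0.05
w_meta = np.zeros(dim)
for _ in range(200):
    w_true = rng.normal(size=dim)            # sample one task (one "problem")
    X = rng.normal(size=(32, dim))
    y = X @ w_true
    w_adapted = inner_loop(w_meta.copy(), X, y)
    # First-order meta-update: move the shared initialization toward the adapted weights.
    w_meta += meta_lr * (w_adapted - w_meta)

print("distance from meta-init to last task's solution:", np.linalg.norm(w_meta - w_true))
```

Under this reading, a longer reasoning trajectory corresponds to more inner-loop steps, which is why the summary above links trajectory length to better optimization outcomes, and forced early termination corresponds to stopping the inner loop before it converges.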
【GF Financial Engineering (广发金工)】AlphaForge: Factor Mining Based on Gradient Descent
广发金融工程研究· 2025-04-30 02:21
GF Securities Chief Financial Engineering Analyst 安宁宁, SAC: S0260512020003, anningning@gf.com.cn; GF Securities Senior Financial Engineering Analyst 陈原文, SAC: S0260517080003, chenyuanwen@gf.com.cn; GF Securities Senior Financial Engineering Analyst 王小康, SAC: S0260525020002, wangxiaokang@gf.com.cn. GF Financial Engineering, 安宁宁 and 陈原文 team.

Abstract

Formulaic factor mining and the AlphaForge framework: Various neural network models, used as encoding schemes, can predict cross-sectional differences in stock returns over a future horizon reasonably well. Constructing more formulaic features as model inputs is also an important step: in theory it enriches the model's inputs and takes over part of the encoder's role. Traditional frameworks such as genetic programming and OpenFE cannot perform directional, iterative optimization, and although AlphaGen keeps improving its generation actions by using factor performance as a reward, it remains sensitive to hyperparameters and prone to overfitting. AlphaForge addresses these problems to some extent through an innovative framework design.

Factor mining based on AlphaForge: The framework first obtains a library of candidate factors by designing a set of operators, lookback windows, and base features. Each factor is then trained in sequence through a generator and a predictor, where the predictor's main ...
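As a rough illustration of the first step described above, the sketch below enumerates a small candidate factor library from base features, time-series operators, and lookback windows on synthetic data, and ranks the candidates by a simple daily information coefficient (IC). The toy data, the operator set, and the IC scoring are assumptions made for illustration; this is not AlphaForge's generator/predictor pipeline.

```python
# Illustrative sketch only (not AlphaForge): enumerate a small candidate factor library
# from base features x time-series operators x lookback windows, then rank candidates
# by average cross-sectional IC against next-day returns. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
T, N = 250, 100                                        # trading days, stocks
close = rng.lognormal(0.0, 0.02, (T, N)).cumprod(axis=0)
volume = rng.lognormal(10.0, 0.5, (T, N))
next_ret = np.vstack([close[1:] / close[:-1] - 1.0, np.zeros((1, N))])

def ts_mean(x, w):
    """Rolling mean over a lookback window of w days (NaN before the window fills)."""
    out = np.full_like(x, np.nan)
    for t in range(w - 1, len(x)):
        out[t] = x[t - w + 1:t + 1].mean(axis=0)
    return out

def ts_std(x, w):
    """Rolling standard deviation over a lookback window of w days."""
    out = np.full_like(x, np.nan)
    for t in range(w - 1, len(x)):
        out[t] = x[t - w + 1:t + 1].std(axis=0)
    return out

operators = {"ts_mean": ts_mean, "ts_std": ts_std}     # toy operator set
base_features = {"close": close, "volume": volume}     # toy base features
windows = [5, 10, 20]                                  # toy lookback windows

def daily_ic(factor, target):
    """Mean cross-sectional Pearson correlation between factor values and next-day returns."""
    ics = []
    for t in range(len(factor) - 1):
        f = factor[t]
        if np.isnan(f).any():
            continue
        ics.append(np.corrcoef(f, target[t])[0, 1])
    return float(np.mean(ics))

# Enumerate the candidate library and score every factor.
library = {
    f"{op_name}({feat_name}, {w})": daily_ic(op(feat, w), next_ret)
    for op_name, op in operators.items()
    for feat_name, feat in base_features.items()
    for w in windows
}
for name, ic in sorted(library.items(), key=lambda kv: -abs(kv[1]))[:5]:
    print(f"{name}: IC = {ic:+.3f}")
```

Per the abstract, AlphaForge then passes such candidates through a generator and a predictor trained in sequence, which is what gives the search the directional optimization signal that genetic programming and OpenFE lack; the enumeration-and-ranking above covers only the library-construction step.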