Test-Time Training
TTCS, the First Test-Time Co-Evolutionary Synthesis Framework: Breaking the Reasoning Bottleneck Through "Self-Play"
机器之心 (Machine Heart) · 2026-02-10 08:52
Core Insights
- The article discusses the Test-Time Curriculum Synthesis (TTCS) framework, which addresses challenges in Test-Time Training (TTT) by generating curriculum data aligned with the model's capability frontier, enhancing performance on difficult test problems [2][10][30]

Group 1: Motivation and Background
- A core motivation is the shift in focus from merely expanding the parameter counts of large language models (LLMs) to leveraging Test-Time Scaling for effective training [5]
- Existing TTT methods struggle on high-difficulty test questions because noisy pseudo-labels lead to ineffective learning [2][7]

Group 2: Methodology
- TTCS is a co-evolutionary framework with two agents: a Synthesizer, which generates questions at the model's capability frontier, and a Solver, which attempts to solve them [11][14]
- A capability-adaptive reward mechanism keeps generated questions neither too easy nor too difficult, sustaining a dynamic learning environment [16]

Group 3: Experimental Results
- TTCS delivered significant gains in mathematical reasoning: Qwen2.5-Math-1.5B rose from an average score of 17.30 to 41.49, an increase of +24.19 [3][20]
- On challenging AIME competition problems, TTCS outperformed strong baselines such as TTRL, demonstrating its effectiveness on high-difficulty questions [22][23]

Group 4: Broader Implications
- Beyond mathematics, the framework generalizes across diverse reasoning tasks, indicating that the model learns universal reasoning logic rather than overfitting [22]
- Adaptive teaching (a dynamic Synthesizer) proved more effective than a static high-capability model, underscoring the importance of tailored learning experiences [25][26]

Group 5: Conclusion and Future Outlook
- TTCS reconstructs the Test-Time Computing paradigm, positioning models as active curriculum designers rather than passive problem solvers [30]
- The framework addresses the critical issues of data scarcity and difficulty gaps in test-time training, paving the way for future self-evolving agents capable of continuous evolution in unknown environments [30]
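The Synthesizer/Solver loop and the capability-adaptive reward described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's exact formulation: the reward band thresholds, the sampling count, and all function names here (`capability_adaptive_reward`, `solver_pass_rate`, `select_frontier_questions`) are hypothetical.

```python
# Toy sketch of TTCS's capability-adaptive reward: the Synthesizer is rewarded
# when the Solver's pass rate on a generated question lands in a middle band
# (the "capability frontier"), and not when the question is trivially easy or
# hopelessly hard. Thresholds and names are illustrative assumptions.

def capability_adaptive_reward(pass_rate: float,
                               low: float = 0.2,
                               high: float = 0.8) -> float:
    """Reward 1.0 inside the target difficulty band, 0.0 outside it."""
    return 1.0 if low <= pass_rate <= high else 0.0

def solver_pass_rate(solver, question, n_samples: int = 8) -> float:
    """Fraction of sampled Solver attempts judged correct (solver returns 0/1)."""
    return sum(solver(question) for _ in range(n_samples)) / n_samples

def select_frontier_questions(solver, candidates, n_samples: int = 8):
    """One co-evolution step: keep only questions at the Solver's frontier."""
    kept = []
    for q in candidates:
        if capability_adaptive_reward(solver_pass_rate(solver, q, n_samples)) > 0:
            kept.append(q)
    return kept
```

In the real framework the kept questions would then be used as TTT curriculum data for the Solver, whose improved ability in turn shifts the frontier the Synthesizer targets.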
No Extra Cache Needed: NVIDIA Open-Sources a Memory-Compression Scheme for Large Models, 2.7× Faster at 128K Context
36Kr · 2026-01-14 08:22
On improving large-model memory, NVIDIA, America's open-source large-model champion, has now made its move, releasing the TTT-E2E method jointly with the Astera Institute, Stanford, UC Berkeley, UC San Diego, and other institutions. On 128K ultra-long text it runs 2.7× faster than a full-attention model, and at 2M context the speedup reaches 35×, with no loss in performance. The technique differs from DeepSeek's recently popular conditional memory module: DeepSeek's Engram module relies on a static "look-up-on-demand" learning path, whereas NVIDIA takes a dynamic-learning route whose key is context compression. Through real-time learning, key content is compressed into the model's own weights, so the model stays in a learning state at test time. This avoids the burden of extra caches while precisely capturing the core logic of long text. Fitting the model with a memory-compression pack: TTT-E2E does not rely on a complex special-purpose architecture; it is built on a standard Transformer with sliding-window attention and is easy to deploy. The core idea is to recast long-text modeling from an architecture-design problem into a "continual learning" task: at test time, the model performs next-token prediction on the context it is currently reading. To balance efficiency and stability, TTT-E2E adds three key optimizations. The first is a "mini-batch + sliding window" strategy: test-time training data is split into mini-batches and paired with an 8K sliding-window attention, which solves both the ...
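The "mini-batch + sliding window" idea above can be made concrete with a minimal sketch: the long context is split into consecutive mini-batches, attention only sees a fixed window, and a per-mini-batch test-time update compresses what falls outside the window into model state. The `ToyTTTModel` and its statistics-based "update" are stand-ins of my own invention, not NVIDIA's implementation; the 8K window from the article is shrunk to a small number for illustration.

```python
# Toy sketch of TTT-E2E's mini-batch + sliding-window strategy (names and the
# update rule are illustrative assumptions, not the actual method).

def make_minibatches(tokens, batch_size):
    """Split the test-time context into consecutive mini-batches."""
    return [tokens[i:i + batch_size] for i in range(0, len(tokens), batch_size)]

def sliding_window(tokens, pos, window):
    """Tokens the attention layer can see at position `pos` (8K in the article)."""
    return tokens[max(0, pos - window):pos]

class ToyTTTModel:
    """Stand-in model: 'weights' are a running summary updated at test time."""
    def __init__(self):
        self.memory = {}  # stands in for weights updated by TTT gradient steps
    def ttt_step(self, minibatch):
        # "Compress" the mini-batch into state; a real model would instead take
        # a gradient step on next-token prediction over the mini-batch.
        for t in minibatch:
            self.memory[t] = self.memory.get(t, 0) + 1
```

The division of labor is: the sliding window handles local detail exactly, while the test-time updates carry everything older than the window, so no KV cache for the full context is needed.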
No Extra Cache Needed! NVIDIA Open-Sources a Memory-Compression Scheme for Large Models, 2.7× Faster at 128K Context
量子位 (QbitAI) · 2026-01-14 04:42
Reported by 闻乐 for 量子位 | QbitAI. On improving large-model memory, NVIDIA, America's open-source large-model champion, has now made its move, releasing the TTT-E2E method jointly with the Astera Institute, Stanford, UC Berkeley, UC San Diego, and other institutions. On 128K ultra-long text it runs 2.7× faster than a full-attention model, and at 2M context the speedup reaches 35×, with no loss in performance. The technique differs from DeepSeek's recently popular conditional memory module: DeepSeek's Engram module relies on a static "look-up-on-demand" learning path, whereas NVIDIA takes a dynamic-learning route whose key is context compression. Through real-time learning, key content is compressed into the model's own weights, so the model stays in a learning state at test time; this avoids the burden of extra caches while precisely capturing the core logic of long text. Every training sequence is treated as if it were a test sequence: an inner loop first performs test-time training on it, and an outer loop then optimizes the model's initial parameters, so that the initial state can quickly adapt to test-time learning needs, achieving end-to-end alignment of training and testing. To balance efficiency and stability, TTT-E2E adds three key optimizations. The first is a "mini-batch + sliding window" strategy: test-time training data is split into mini-batches and paired with an 8K sliding-window attention, which solves both the per-token ...
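The inner/outer-loop structure described above is in the spirit of meta-learning: adapt to one sequence in an inner loop, then move the initial parameters so adaptation starts from a good point. Below is a deliberately tiny sketch with a scalar "model" and a first-order meta-gradient; the loss, learning rates, and function names are my own illustrative assumptions, not the TTT-E2E objective.

```python
# Toy end-to-end inner/outer loop: each "sequence" is treated as a test
# sequence, adapted by inner-loop TTT steps, and the outer loop nudges the
# initial parameter theta0 toward states that adapt well (first-order
# approximation, as in FOMAML-style meta-learning). All details are
# illustrative assumptions.

def inner_ttt(theta0, seq, lr=0.1, steps=3):
    """Inner loop: test-time training on one sequence, starting from theta0.
    Toy loss: (theta - mean(seq))**2, so the gradient is 2*(theta - mean)."""
    theta = theta0
    target = sum(seq) / len(seq)
    for _ in range(steps):
        theta -= lr * 2 * (theta - target)
    return theta

def outer_step(theta0, sequences, meta_lr=0.05):
    """Outer loop: update the initial parameters using the post-adaptation
    loss on each simulated test sequence (first-order meta-gradient)."""
    meta_grad = 0.0
    for seq in sequences:
        adapted = inner_ttt(theta0, seq)
        target = sum(seq) / len(seq)
        meta_grad += 2 * (adapted - target)
    return theta0 - meta_lr * meta_grad / len(sequences)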