Core Viewpoint
- Nvidia, in collaboration with several research institutions, has introduced TTT-E2E, a method that gives large models memory capabilities and significantly improves the speed and efficiency of long-text processing [1][2].

Group 1: TTT-E2E Method Overview
- TTT-E2E processes 128K-token contexts 2.7 times faster than full-attention models and achieves a 35-fold speedup on 2M contexts, without compromising performance [3].
- Unlike the recently popular DeepSeek memory module, which follows a static learning path, TTT-E2E learns dynamically by compressing context [5][6].
- The method learns in real time, compressing key content into the model's weights so that the model remains in a learning state during testing [7][8].

Group 2: Technical Implementation
- TTT-E2E is built on a standard Transformer with sliding-window attention, making it easy to deploy without relying on complex custom architectures [11].
- The core idea recasts long-text modeling from an architecture-design problem into a "continual learning" task [12].
- At test time, the model predicts the next token from the current context and updates its parameters via gradient descent, dynamically compressing information into its weights [13].

Group 3: Training and Optimization
- The training phase uses meta-learning to prepare the model for "test-time learning," treating each training sequence as a simulated test sequence [14].
- TTT-E2E incorporates three key optimizations: mini-batch processing combined with sliding windows, precise update strategies that target specific layers, and a dual-MLP design that balances absorbing new context against preserving pre-trained knowledge [16][17].

Group 4: Performance and Limitations
- Experiments show TTT-E2E matches or outperforms full-attention Transformers on test loss, while its inference latency stays constant regardless of context length [19][23].
- On tasks requiring precise recall of details, TTT-E2E underperforms full-attention models, because its memory compression filters out seemingly irrelevant details [25][26].
- The meta-learning process used in the training phase is currently slower than standard pre-training [27].

Group 5: Research and Development
- The project is led by Yu Sun, a postdoctoral researcher at Stanford, whose goal is to enable AI systems to learn continuously, as humans do [29][30].
- The code and papers for TTT-E2E have been fully open-sourced [28].
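The sliding-window attention mentioned in Group 2 restricts each token to attend only to itself and the previous few tokens, which is why per-token cost stays constant as the context grows. A minimal illustrative sketch of such a mask (not Nvidia's code; the function name and shapes are assumptions):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to j iff j <= i (causal)
    and j > i - window (within the sliding window)."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(6, 3)
print(mask.astype(int))  # each row has at most 3 ones, ending at the diagonal
```

Positions where the mask is False would receive -inf before the softmax in a real attention layer.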
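The test-time update described in Group 2 can be sketched in miniature. This is a hedged stand-in, not the paper's implementation: a tiny linear "memory" W is updated by gradient descent on next-token-style pairs from the context, so information is compressed into weights rather than stored in a KV cache (dimensions, learning rate, and the linear form are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = np.zeros((d, d))    # memory weights, updated during "testing"
lr = 0.05

context = rng.standard_normal((32, d))   # stand-in for token embeddings

for t in range(len(context) - 1):
    x, target = context[t], context[t + 1]
    err = W @ x - target                 # gradient of 0.5*||W x - y||^2 w.r.t. W x
    W -= lr * np.outer(err, x)           # gradient step: compress (x, y) into W
```

After the pass, the context's associations live in W itself, which is why inference latency does not grow with context length.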
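The meta-learning idea in Group 3 — each training sequence treated as a simulated test sequence — can be sketched with a first-order, Reptile-style stand-in (explicitly not the paper's exact algorithm): an inner loop performs the same gradient-descent compression that will run at test time, and the outer loop moves the initial weights toward the adapted weights so the model starts out well-prepared for test-time learning. All names and hyperparameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, inner_lr, outer_lr = 8, 0.05, 0.5
W0 = np.zeros((d, d))                    # meta-learned initialization

def inner_adapt(W, seq):
    """Simulated test-time learning: compress seq into the weights."""
    W = W.copy()
    for t in range(len(seq) - 1):
        x, y = seq[t], seq[t + 1]
        W -= inner_lr * np.outer(W @ x - y, x)
    return W

for step in range(20):                   # outer (meta) loop over sequences
    seq = rng.standard_normal((16, d))   # one training sequence
    W_adapted = inner_adapt(W0, seq)
    W0 += outer_lr * (W_adapted - W0)    # first-order meta update
```

Because the outer loop backpropagates through (or, here, approximates) many inner gradient steps, this phase is costlier than standard pre-training, matching the limitation noted in Group 4.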
Source article title: No extra cache needed! Nvidia open-sources a memory compression scheme for large models, with a 2.7x speedup on 128K contexts