Long Context Modeling
Breaking free of the KV Cache: compress long context into weights. Is continual learning for large models finally within reach?
机器之心· 2026-01-02 01:55
Core Viewpoint
- The article discusses the path toward AGI (Artificial General Intelligence) and emphasizes the importance of continual learning, in which an AI acquires new knowledge and skills through interaction with its environment [1]

Group 1: TTT-E2E Development
- A collaborative team from Astera, NVIDIA, Stanford University, UC Berkeley, and UC San Diego has proposed TTT-E2E (End-to-End Test-Time Training), a significant step toward AGI that turns long-context modeling from an architectural design problem into a learning problem [2]
- TTT-E2E aims to overcome the limitation of traditional models that remain static during inference, allowing the model to keep learning during the test phase [9][10]

Group 2: Challenges in Long Context Modeling
- The article highlights the dilemma in long-context modeling: the full attention mechanism of Transformers performs well on long texts but incurs inference costs that grow rapidly with sequence length [5]
- Alternatives such as RNNs and state space models (SSMs) have constant per-token computation costs but often suffer performance declines on very long texts [5][6]

Group 3: TTT-E2E Mechanism
- TTT-E2E defines the model's behavior at test time as an online optimization process: before predicting the next token, the model performs self-supervised learning on the tokens it has already read [11]
- The approach incorporates meta-learning to optimize the model's initialization parameters, teaching the model how to learn effectively [13]
- A hybrid architecture combines a sliding-window attention mechanism for short-term memory with a dynamically updated MLP layer for long-term memory, mimicking biological memory systems [13][14]

Group 4: Experimental Results
- Experiments show that TTT-E2E matches the performance scaling of full-attention Transformers, with loss staying on a comparable curve as context length grows from 8K to 128K [21]
- In inference efficiency, TTT-E2E shows a clear advantage: at a 128K context, it processes tokens 2.7 times faster than a full-attention Transformer [22]

Group 5: Future Implications
- TTT-E2E marks a shift from static models to dynamic individuals, in which processing a long document becomes a kind of micro self-evolution [27]
- This "compute-for-storage" approach envisions models that continuously adjust themselves while processing vast amounts of information, potentially encapsulating the history of human civilization within their parameters without hitting hardware limits [29]
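The core idea of the mechanism described above — compressing what has been read into weights via online self-supervised updates, rather than caching every token — can be illustrated with a minimal sketch. Everything here (the reconstruction loss, the learning rate, the linear fast-weight memory) is an illustrative assumption, not the paper's actual formulation or architecture.

```python
import numpy as np

# Toy sketch of test-time training: a small "fast-weight" memory W is updated
# online with a self-supervised reconstruction objective as tokens stream in,
# so long-range context is compressed into weights instead of a growing KV cache.

rng = np.random.default_rng(0)
d = 16                      # embedding dimension (illustrative)
W = np.zeros((d, d))        # fast weights: the model's long-term memory
lr = 0.01                   # inner-loop (test-time) learning rate

def ttt_step(W, x):
    """One online update: train W to reconstruct the current token embedding."""
    err = W @ x - x                     # self-supervised reconstruction error
    return W - lr * np.outer(err, x)    # SGD step on 0.5 * ||W @ x - x||^2

tokens = rng.standard_normal((256, d))  # a stream of token embeddings
for x in tokens:
    W = ttt_step(W, x)                  # memory cost is O(d^2), not O(sequence)

# After the stream, W acts as a fixed-size summary of everything it has read:
x = tokens[-1]
recon_err = np.linalg.norm(W @ x - x) / np.linalg.norm(x)
print(f"relative reconstruction error on last token: {recon_err:.3f}")
```

Note how memory stays constant regardless of how many tokens stream by; this is the "compute-for-storage" trade the article describes, paid for with the extra gradient step per token.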
Compressing text with vision: Tsinghua and Zhipu launch the Glyph framework, extending the context window through visual-text compression
36Kr· 2025-10-21 23:10
Core Insights
- Long-context modeling has emerged as a cutting-edge research trend in the large language model (LLM) industry and is crucial for improving LLM productivity [1]
- The Glyph framework, developed by a research team from Tsinghua University and Z.ai, takes a novel approach: it renders long texts as images and processes them efficiently with visual language models (VLMs) [1][3]

Long Context LLMs
- Long-context LLMs can achieve comprehensive semantic understanding and enhance multi-step reasoning and long-term memory, akin to human reading [1]
- Traditional methods are limited in practice by the computational and memory costs of extending context windows to millions of tokens [1]

Glyph Framework
- Glyph achieves 3-4x token compression while maintaining accuracy comparable to leading models, significantly improving memory efficiency and training/inference speed [3][11]
- For example, the classic novel "Jane Eyre" (approximately 240k text tokens) can be rendered into compact images (about 80k visual tokens), enabling a 128k-context VLM to answer complex questions about the full book [3]

Research Methodology
- The Glyph framework consists of three main phases: continual pre-training, LLM-driven rendering search, and post-training optimization [8][9][10]
- Continual pre-training renders large-scale long-text data into diverse visual styles to simulate real-world long-text scenarios and strengthen cross-modal semantic alignment [8]
- The LLM-driven rendering search uses a genetic search algorithm to optimize rendering configurations, balancing compression against comprehension [9]
- Post-training includes supervised fine-tuning and reinforcement learning to further improve the model's text recognition and fine-grained understanding [10]

Performance Evaluation
- Glyph is competitive on multiple long-context benchmarks, achieving an average input compression rate of 3-4x while maintaining accuracy similar to mainstream models [11][16]
- Under extreme compression, Glyph can potentially handle million-token tasks with a 128k context length [17]

Future Directions
- The framework has limitations, such as sensitivity to rendering parameters and the need for improved OCR fidelity [21][22]
- Future research may focus on adaptive rendering models, stronger visual encoders, and broader evaluation coverage [23]
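The compression arithmetic behind the "Jane Eyre" example can be sketched with a back-of-envelope estimate: text tokens scale with character count, while visual tokens scale with rendered page area divided by the vision encoder's patch size. All constants below (characters per text token, page resolution, patch size, characters per rendered page) are illustrative assumptions, not Glyph's actual rendering configuration.

```python
# Back-of-envelope sketch of Glyph-style visual-text compression: how many
# visual tokens does a rendered page cost versus the text tokens it replaces?

CHARS_PER_TEXT_TOKEN = 4      # rough average for English BPE tokenizers (assumed)
PAGE_PX = (448, 448)          # assumed rendered page resolution
PATCH = 14                    # assumed ViT patch size
CHARS_PER_PAGE = 12_000       # assumed dense small-font layout per page

def compression_ratio(n_chars: int) -> float:
    """Ratio of text tokens to visual tokens for a document of n_chars."""
    text_tokens = n_chars / CHARS_PER_TEXT_TOKEN
    pages = max(1, -(-n_chars // CHARS_PER_PAGE))                # ceil division
    visual_tokens_per_page = (PAGE_PX[0] // PATCH) * (PAGE_PX[1] // PATCH)
    visual_tokens = pages * visual_tokens_per_page
    return text_tokens / visual_tokens

# "Jane Eyre"-scale input: ~240k text tokens -> ~960k characters.
ratio = compression_ratio(960_000)
print(f"estimated compression: {ratio:.1f}x")  # → estimated compression: 2.9x
```

With these assumed constants the estimate lands in the same 3-4x ballpark the article reports; the real ratio depends on font size, layout density, and the encoder's patch geometry, which is exactly what Glyph's rendering search optimizes.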
Is DeepSeek V4 "taking off" on the strength of an intern's award-winning paper? Liang Wenfeng targets long context: 10x processing speed and "perfect" accuracy
AI前线· 2025-07-31 05:02
Core Viewpoint
- The article highlights the achievements of Chinese authors in computational linguistics, focusing on DeepSeek's award-winning paper, which introduces a novel sparse attention mechanism for long-context modeling and demonstrates efficiency and performance gains over traditional methods [1][17]

Group 1: Award and Recognition
- The ACL announced that over 51% of the 2025 award-winning papers had Chinese authors, versus 14% for the USA [1]
- A DeepSeek paper led by author Liang Wenfeng won a Best Paper award, which has generated considerable discussion [1]

Group 2: Technical Innovations
- The paper introduces a Natively Trainable Sparse Attention (NSA) mechanism, which combines algorithmic innovation with hardware optimization for efficient long-context modeling [4][6]
- NSA employs a dynamic hierarchical sparse strategy that balances global context awareness with local precision through token compression and token selection [11]

Group 3: Performance Evaluation
- NSA outperformed traditional full-attention models on 7 of 9 benchmark metrics, particularly on long-context tasks [8][10]
- In a "needle in a haystack" test with a 64k context, NSA achieved perfect retrieval accuracy along with significant speedups in decoding and training [9][15]

Group 4: Future Implications
- The upcoming DeepSeek model is expected to incorporate NSA technology, generating anticipation for its release [17]
- There is speculation that DeepSeek R2's release has been delayed because the founder is dissatisfied with its current performance [17]
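The compress-then-select idea behind the sparse strategy described above can be illustrated with a toy single-head attention step: pool the key sequence into block summaries, use the query to pick the top-scoring blocks, and attend only over those blocks plus a local sliding window. The block size, top-k, and window values are illustrative assumptions; the real NSA combines three learned branches with hardware-aligned kernels, which this sketch does not attempt.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def nsa_like_attention(q, K, V, block=8, topk=2, window=8):
    """Toy NSA-style step: compression -> selection -> sparse attention."""
    n = K.shape[0]
    # 1) Compression: mean-pool key blocks into coarse summaries.
    n_blocks = n // block
    K_blocks = K[: n_blocks * block].reshape(n_blocks, block, -1).mean(axis=1)
    # 2) Selection: score blocks with the query, keep the top-k blocks.
    block_scores = K_blocks @ q
    chosen = np.argsort(block_scores)[-topk:]
    idx = set()
    for b in chosen:
        idx.update(range(b * block, (b + 1) * block))
    # 3) Sliding window: always include the most recent tokens for local precision.
    idx.update(range(max(0, n - window), n))
    idx = sorted(idx)
    # Attend only over the selected subset instead of all n tokens.
    w = softmax(K[idx] @ q / np.sqrt(q.shape[0]))
    return w @ V[idx], len(idx)

rng = np.random.default_rng(0)
n, d = 256, 32
K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
q = rng.standard_normal(d)
out, attended = nsa_like_attention(q, K, V)
print(f"attended to {attended} of {n} tokens")
```

Per-query cost drops from O(n) to roughly O(n/block + topk * block + window) score computations, which is the source of the decoding speedups the article reports; the "natively trainable" part of NSA means these selection decisions are learned end-to-end rather than hand-tuned as here.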