LLM Long-Context Processing
Fine-tuned on 32k, handling a million tokens: 21× inference speedup, 10× peak-memory savings, and constant memory consumption
量子位· 2026-02-13 13:19
Contributed by the CoMeT team to 量子位 | 公众号 QbitAI

What happens when a large language model tries to process a document containing one million tokens? The answer: memory explodes and computation collapses.

Whether analyzing an entire codebase, digesting a book-length research report, or sustaining an extremely long multi-turn conversation, long-context capability is key to an LLM's advance toward higher-level intelligence. Yet the Transformer architecture's inherent bottleneck, namely compute that scales quadratically with context length and a KV Cache that grows linearly, leaves it overwhelmed by very long sequences: a resource-devouring beast that can neither afford the computation nor fit the state in memory.

To keep such models viable, existing approaches either compress the context, which is inherently lossy and inevitably discards information, or adopt recurrent mechanisms, whose models tend to be "forgetful": they struggle both to retain key information that spans the whole text and to recall details that just occurred.

△ After training on 32k contexts, CoMeT can accurately find a needle in a 1M-token haystack, with inference speed and memory footprint far better than a full-attention model

Having it both ways: the "collaborative memory" architecture

CoMeT's ingenuity lies in not trying to solve every problem with a single mechanism. Instead, it designs a dual-track collaborative memory system that lets the model both "remember firmly" and "see clearly".

1. Global Memory: a memory vault with a gate

To address long-term forgetting ...
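The "gated" global memory described above can be illustrated with a minimal sketch: a fixed-size memory matrix is updated by a sigmoid gate that decides, element by element, how much of the old state to keep versus overwrite with a summary of the newest chunk. All names, shapes, and the gate parameterization here are illustrative assumptions, not CoMeT's actual design; the point is only that the memory footprint stays constant no matter how many chunks stream in.

```python
import numpy as np

def gated_memory_update(memory, candidate, w_gate, b_gate):
    """One gated write to a fixed-size global memory (illustrative sketch).

    The gate is a sigmoid over [memory; candidate]; values near 1 keep
    the old memory, values near 0 accept the candidate summary.
    """
    pre = np.concatenate([memory, candidate], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-pre))          # elementwise, in (0, 1)
    return gate * memory + (1.0 - gate) * candidate

rng = np.random.default_rng(0)
d = 8                                   # slot width (arbitrary for the demo)
memory = rng.normal(size=(4, d))        # 4 fixed memory slots
candidate = rng.normal(size=(4, d))     # summaries of the newest chunk
w_gate = rng.normal(size=(2 * d, d))    # hypothetical gate parameters
b_gate = np.zeros(d)

new_memory = gated_memory_update(memory, candidate, w_gate, b_gate)
# Memory size is unchanged regardless of how many chunks have streamed in.
assert new_memory.shape == memory.shape
```

Because the update is a convex combination, every memory cell stays bounded between its old value and the candidate's, which is one simple way a recurrent memory avoids unbounded growth.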
10,000 tokens is the new bar for long-context testing: beyond it, 18 large models collectively fall apart
量子位· 2025-07-17 02:43
Core Insights
- The article discusses the performance decline of large language models (LLMs) as the input context length increases, highlighting that the decline is not uniform but occurs at specific token lengths [10][21][44]
- A recent study by the Chroma team tested 18 mainstream LLMs, revealing that models like GPT-4.1 and Claude Sonnet 4 experience significant accuracy drops when processing longer inputs [8][9][19]

Group 1: Performance Decline
- As input length increases, model performance deteriorates, with a notable drop around 10,000 tokens, where accuracy can fall to approximately 50% [4][21]
- Different models exhibit varying thresholds for performance decline, with some models losing accuracy earlier than others [6][7][19]
- The study indicates that semantic similarity between the "needle" (target information) and the question significantly affects performance, with lower similarity leading to greater declines [19][21]

Group 2: Experimental Findings
- Four controlled experiments were conducted to assess the impact of input length on model performance, focusing on factors such as semantic similarity, interference information, and text structure [17][35][41]
- The first experiment showed that as input length increased, models struggled more with low semantic similarity, leading to a sharper performance drop [19][21]
- The second experiment demonstrated that the presence of interference items significantly reduced model accuracy, with multiple interference items causing a 30%-50% drop compared to baseline performance [26][28]

Group 3: Structural Impact
- The structure of the background text (haystack) also plays a crucial role in model performance, with coherent structures leading to more significant declines in accuracy than disordered structures [40][42]
- The experiments revealed that most models performed worse with coherent structures as input length increased, while performance decline was less severe with disordered structures [41][44]
- The findings suggest that LLMs face challenges in processing complex logical structures in long texts, indicating a need for improved handling of such inputs [41][44]

Group 4: Implications and Future Directions
- The results highlight the limitations of current LLMs in managing long-context tasks, prompting suggestions for clearer instructions and context-management strategies [44]
- Chroma, the team behind the research, aims to address these challenges by developing open-source tools to enhance LLM applications in processing long texts [45][48]
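The controlled setups above (one target "needle", near-miss distractors, and filler text of adjustable length) can be sketched as a small test-case builder. This is a generic needle-in-a-haystack harness under stated assumptions, not Chroma's exact protocol; the sentence texts and parameters are hypothetical.

```python
import random

def build_haystack(needle, distractors, filler_sentences, total_sentences, seed=0):
    """Assemble one long-context retrieval test case (sketch).

    One needle and any number of near-miss distractors are placed at
    distinct random positions inside filler text; `total_sentences`
    controls the overall context length, mirroring the length-scaling
    experiments described above.
    """
    rng = random.Random(seed)
    body = [rng.choice(filler_sentences) for _ in range(total_sentences)]
    positions = rng.sample(range(total_sentences), 1 + len(distractors))
    body[positions[0]] = needle                      # the true target fact
    for pos, distractor in zip(positions[1:], distractors):
        body[pos] = distractor                       # plausible but wrong
    return " ".join(body), positions[0]

# Hypothetical example contents:
needle = "The best thing to do in Helsinki is visit the design museum."
distractors = ["The best thing to do in Helsinki used to be the harbor walk."]
filler = [
    "The weather report mentioned light rain in the afternoon.",
    "A committee met on Tuesday to discuss the annual budget.",
]
context, needle_pos = build_haystack(needle, distractors, filler,
                                     total_sentences=200)
```

Scaling `total_sentences` varies input length while holding the task fixed, and adding more entries to `distractors` reproduces the interference condition from the second experiment.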