Attention mechanism
The attention mechanism in multimodal large models hides a "trap" that a single formula is needed to correct
36Kr · 2026-01-27 08:15
Is attention really reliable? In recent years, Vision-Language Models (VLMs) have made remarkable progress on multimodal understanding tasks, particularly in visual question answering, image understanding, and video understanding. These models typically use language-to-vision attention to measure the relevance between visual tokens and the text, and then perform visual token pruning on that basis to reduce inference cost and improve runtime efficiency.

However, a long-overlooked question remains: can attention itself really serve as a reliable indicator of "semantic importance"?

In a recent study, Zeng Dan's team at Shanghai University systematically analyzed the behavior of attention in mainstream VLMs and found a critical yet easily missed phenomenon: attention is not determined by semantics alone, but is strongly affected by structural biases. Directly using this biased attention for visual token pruning often ends up retaining unimportant visual regions while discarding the key information that actually helps with task understanding.

Beyond positional bias, the team also observed another, more subtle problem: attention on padding regions is abnormally high. In many VLMs, padding is unavoidable because input images come in different sizes, yet semantically these regions contain no useful informa ...
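The excerpt is cut off before describing the fix, but the padding issue it raises is easy to illustrate. The following is a minimal sketch, not the authors' code: it ranks visual tokens by the language-to-vision attention they receive and simply excludes padding positions before pruning. The function name, tensor shapes, and the existence of a precomputed `pad_mask` are all assumptions for the example.

```python
# Illustrative sketch (not the paper's method): attention-based visual token
# pruning that ignores padding regions, which the article notes can receive
# abnormally high attention despite carrying no semantics.
import torch

def prune_visual_tokens(attn: torch.Tensor,
                        pad_mask: torch.Tensor,
                        keep_ratio: float = 0.5) -> torch.Tensor:
    """attn:     [num_text_tokens, num_visual_tokens] language-to-vision
                 attention weights (assumed already averaged over heads).
       pad_mask: [num_visual_tokens] bool, True where the token comes from a
                 padded image region.
       Returns the indices of the visual tokens to keep, in original order."""
    # Importance of each visual token = mean attention it receives from text.
    importance = attn.mean(dim=0)                       # [num_visual_tokens]

    # Padding regions carry no semantics; remove them from the ranking.
    importance = importance.masked_fill(pad_mask, float("-inf"))

    num_keep = max(1, int(keep_ratio * (~pad_mask).sum().item()))
    keep_idx = importance.topk(num_keep).indices
    return torch.sort(keep_idx).values

# Toy usage with random numbers.
attn = torch.rand(12, 64)               # 12 text tokens, 64 visual tokens
pad_mask = torch.zeros(64, dtype=torch.bool)
pad_mask[56:] = True                    # assume the last 8 tokens are padding
print(prune_visual_tokens(attn, pad_mask, keep_ratio=0.5))
```

Masking alone only addresses the padding symptom; the positional bias the article describes would still skew which of the remaining tokens get kept.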
The attention mechanism in multimodal large models hides a "trap" that a single formula is needed to correct | Shanghai University × Nankai University
量子位 (QbitAI) · 2026-01-27 02:33
Contributed by the Intcomp team to 量子位 | WeChat official account QbitAI

Is attention really reliable? In recent years, Vision-Language Models (VLMs) have made remarkable progress on multimodal understanding tasks, particularly in visual question answering, image understanding, and video understanding. These models typically use language-to-vision attention to measure the relevance between visual tokens and the text, and then perform visual token pruning on that basis to reduce inference cost and improve runtime efficiency.

However, a long-overlooked question remains: can attention itself really serve as a reliable indicator of "semantic importance"?

In a recent study, Zeng Dan's team at Shanghai University systematically analyzed the behavior of attention in mainstream VLMs and found a critical yet easily missed phenomenon: attention is not determined by semantics alone, but is strongly affected by structural biases. Directly using this biased attention for visual token pruning often ends up retaining unimportant visual regions while discarding the key information that actually helps with task understanding.

Worse still, when attention is used for visual token pruning, this positional bias is further amplified, causing the pruning results to systemat ...
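The excerpt is truncated before the correction formula itself, so the following is only a hedged illustration of the general debiasing idea, not the paper's formula: estimate a content-independent positional baseline for each visual position (here, by averaging attention over a hypothetical calibration set) and subtract it before ranking tokens. All names and shapes below are assumptions.

```python
# Illustrative positional-debiasing sketch (assumed, not the paper's formula):
# importance = attention a position actually receives on this sample
#              minus what that position receives "for free" on average.
import torch

def positional_baseline(calib_attn: torch.Tensor) -> torch.Tensor:
    """calib_attn: [num_samples, num_text_tokens, num_visual_tokens]
       attention collected on a calibration set.
       Returns a per-position baseline of shape [num_visual_tokens]."""
    return calib_attn.mean(dim=(0, 1))

def debiased_importance(attn: torch.Tensor, baseline: torch.Tensor) -> torch.Tensor:
    """attn: [num_text_tokens, num_visual_tokens] for the current sample."""
    return attn.mean(dim=0) - baseline

# Toy usage.
calib = torch.rand(100, 12, 64)             # hypothetical calibration attention
base = positional_baseline(calib)
attn = torch.rand(12, 64)
scores = debiased_importance(attn, base)
keep_idx = scores.topk(32).indices          # keep the 32 highest-scoring tokens
```

Any concrete correction would depend on how the bias is modeled; the point of the sketch is only that ranking on raw attention and ranking on bias-corrected attention can select very different visual tokens.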
Microsoft Research's Yang Yuqing: the attention system of Agents | Attention
36Kr · 2025-09-05 03:42
Core Insights
- The article discusses TriangleMix, a structural optimization method for attention mechanisms in large models, which addresses the computational bottleneck during the prefill stage while maintaining performance and accuracy [2][5][10]
- TriangleMix allows for a hierarchical sparse attention architecture that significantly reduces latency and memory consumption, making it suitable for long-context tasks [8][10][36]

Technical Overview
- TriangleMix employs a layered attention strategy, using standard dense attention in the first 16 layers and switching to a triangle-shaped mask in the subsequent layers, which reduces computational complexity from O(N²) to O(N) [5][6]
- The method has been tested on models like Llama-3.1-8B-Instruct, showing a kernel latency reduction from 750ms to 49ms, a speedup of 15.3x, and a decrease in time to first token (TTFT) of 12%-32% [10][9]

Performance Metrics
- Experimental results indicate that TriangleMix retains 99.7% of the original performance while applying triangle attention in the majority of the deep layers [8][10]
- The method demonstrates significant reductions in latency and memory usage with almost no loss in accuracy across various benchmark tasks [10][9]

Broader Implications
- The research emphasizes the importance of viewing attention mechanisms within the larger context of agent systems, training mechanisms, and task structures, rather than as isolated components [12][26]
- The ongoing work at Microsoft Research focuses on optimizing agent-native systems, which aim to enhance the efficiency and effectiveness of AI applications, particularly for users with specific needs [15][67]
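To make the layered idea concrete, here is a minimal sketch under stated assumptions: dense causal attention in the first 16 layers, and a sparser mask in the deeper layers. The summary does not define the "triangle-shaped" mask, so the sparse pattern below (a few attention-sink tokens plus a local window) is an assumed stand-in, not TriangleMix's actual mask; the layer threshold, window size, and function names are likewise illustrative.

```python
# Sketch of a layer-wise dense/sparse attention split (assumptions noted above).
import torch

def causal_mask(n: int) -> torch.Tensor:
    # Standard lower-triangular causal mask used in the shallow (dense) layers.
    return torch.tril(torch.ones(n, n, dtype=torch.bool))

def sparse_mask(n: int, sink: int = 4, window: int = 128) -> torch.Tensor:
    """Assumed sparse pattern for deep layers: each query attends to the first
    `sink` tokens plus a local causal `window`, so cost grows O(N) not O(N^2)."""
    idx = torch.arange(n)
    diff = idx[:, None] - idx[None, :]
    local = (diff >= 0) & (diff < window)
    sinks = (idx[None, :] < sink) & (diff >= 0)
    return local | sinks

def attention(q, k, v, mask):
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

def layered_attention(q, k, v, layer_idx: int, dense_layers: int = 16):
    n = q.shape[-2]
    mask = causal_mask(n) if layer_idx < dense_layers else sparse_mask(n)
    return attention(q, k, v, mask)

# Toy usage: one head, 256 tokens, 64-dim.
q = k = v = torch.randn(256, 64)
out_shallow = layered_attention(q, k, v, layer_idx=3)    # dense attention
out_deep = layered_attention(q, k, v, layer_idx=20)      # sparse attention
```

Note that this naive version still materializes the full score matrix; the latency and memory savings reported above would require a sparse attention kernel that skips the masked-out blocks entirely.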