Large Language Models

ICML 2025 | Massive Values in the Attention Mechanism: The Key to Unlocking Contextual Understanding in Large Language Models
机器之心· 2025-05-06 04:11
Core Insights
- The article discusses a significant phenomenon in large language models (LLMs): the concentration of massive values in the self-attention mechanism, particularly in the query (Q) and key (K) representations, which is crucial for contextual knowledge understanding [1][3][4].

Research Highlights
- The study reveals that massive values are highly concentrated in Q and K, contrary to the expectation that each attention head operates independently. This consistency across multiple layers and heads is demonstrated visually [3][4].
- The massive-value phenomenon is observed specifically in models that use Rotary Position Embedding (RoPE), such as LLaMA, Qwen, and Gemma, while models without RoPE, such as GPT-2 and OPT, do not exhibit this pattern [4].
- The research establishes a direct link between the presence of massive values in Q and K and the ability to understand contextual knowledge [4].

Key Findings
1. **Concentration of Massive Values**: Massive values are highly concentrated in specific regions of each attention head, indicating a surprising level of consistency [3][4].
2. **Impact on Contextual Knowledge Understanding**: The presence of massive values is critical for contextual knowledge understanding, as demonstrated through destructive experiments that reset these values to their average [5][6].
3. **Quantization Techniques**: Quantization methods that explicitly handle massive values, such as AWQ and SmoothQuant, preserve contextual knowledge understanding better than methods that do not [7].
4. **Origin of the Concentration Phenomenon**: The concentration of massive values is attributed to RoPE, which affects the low-frequency regions of Q and K, causing the phenomenon to appear from the model's earliest layers [8].
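To make the RoPE connection above concrete, here is a minimal NumPy sketch of the split-half ("rotate-half") RoPE variant applied to a single head vector. The function name and layout are illustrative assumptions; real models apply this per head, and implementations differ in how dimension pairs are laid out. The point it demonstrates is that high-index dimension pairs rotate at very low frequency, the regions where the paper finds massive values concentrating.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Split-half rotary position embedding sketch (illustrative, not a
    specific model's implementation). Pair (i, i + d/2) is rotated by
    angle pos * base**(-2i/d); large i gives a tiny angle (low frequency),
    so those dimensions barely change with position."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-2.0 * np.arange(half) / d)  # high -> low frequency
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

At position 0 the rotation is the identity; at position 1 the highest-frequency pair rotates by a full radian while the lowest-frequency pair barely moves, which is why position information accumulates very slowly in the low-frequency region.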
Experimental Results
- The experiments reveal a stark contrast in how massive values affect different knowledge tasks:
  - **Resilience in Parametric Knowledge Retrieval**: Tasks relying on parametric knowledge decline only 15-20% in accuracy when massive values are disrupted, maintaining 76%-88% accuracy [10].
  - **Catastrophic Decline in Contextual Knowledge Tasks**: Tasks requiring contextual understanding suffer drastic performance drops; accuracy on key retrieval tasks plummets from 100% to near 0% when massive values are disrupted [11].
  - **Control Experiments**: When only non-massive values are disrupted, task performance remains stable, confirming the unique importance of massive values to contextual understanding [12].

Future Directions
- The research opens several avenues for further exploration: enhancing or adjusting the distribution of massive values to improve contextual understanding, examining the universality of the phenomenon across different architectures, and designing targeted quantization methods that protect the massive values tied to contextual understanding [16].
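The destructive experiment described above (resetting massive values to their average) can be sketched as follows. The threshold rule `tau * mean(|q|)` and the parameter name `tau` are illustrative assumptions for this sketch, not the paper's exact selection criterion.

```python
import numpy as np

def reset_massive_values(q, tau=5.0):
    """Replace entries whose magnitude exceeds tau times the mean absolute
    value with the mean of q -- an approximation of the paper's 'reset
    massive values to their average' ablation. The thresholding rule here
    is an illustrative assumption, not the paper's exact criterion."""
    thresh = tau * np.abs(q).mean()
    mask = np.abs(q) > thresh
    return np.where(mask, q.mean(), q), mask
```

Applying this to a Q (or K) projection before re-running a key-retrieval prompt is the shape of the ablation; disrupting only the unmasked (non-massive) entries instead gives the control condition, which the paper reports leaves performance stable.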
Over the Past Four Weeks, AI Inference Has Exploded, GPUs Are Burning, and NVIDIA Remains in Short Supply
硬AI· 2025-04-29 00:18
According to a report released on the 25th by Morgan Stanley's Joseph Moore team, the main driver of this strong demand is growth in token generation, which has risen more than fivefold since the start of the year. This has put enormous pressure on the ecosystem and driven a surge of investment in handling these workloads.

Morgan Stanley notes that, on the back of large language models' enormous demand for inference chips, NVIDIA faces a GPU shortage. However, amid ongoing supply constraints, gross-margin pressure, and other headwinds, the bank slightly lowered its NVIDIA price target to $160. Over the long term, the company's growth trajectory remains strong.

Author | 张雅琦 (Zhang Yaqi), Editor | 硬AI

Over the past four weeks, investor sentiment has deteriorated on macroeconomic and supply-chain risks, yet demand for NVIDIA GPUs has soared, driven by major large language models' (LLMs') huge appetite for inference chips, and that demand spans all regions.

Multiple AI companies report explosive user growth. For example, data from API companies such as OpenRouter show many firms scrambling for GPU resources to meet massive inference-software demand, to the point that only "the last GB200" remained available in 2025.

Morgan Stanley believes this inference demand is the key point: it is driven by the part of the business that uses models and generates revenue, proving that the scaling of reasoning models is real, in contrast to reliance solely on venture ...
Analysts: HBM4 to Be Used in Autonomous Driving by 2027
半导体芯闻· 2025-03-07 10:20
Core Insights
- The article emphasizes the critical role of memory solutions in driving the development of generative AI (GenAI), highlighting the need for innovation in semiconductor technology [2][4].
- It discusses the challenges facing DRAM solutions, including cost and time to market, and suggests that manufacturers must adopt cost-reduction strategies while customers commit to procurement [2][4].

Group 1: Memory Solutions and Innovations
- Counterpoint Research identifies near-term Processing-In-Memory (PIM) as the most innovative memory solution, primarily supporting Neural Processing Units (NPUs), though it is limited to a few applications [2].
- The article predicts that by 2026, Apple will transition from Package-on-Package (PoP) architecture to standalone DRAM configurations in the iPhone Pro Max and foldable models to increase bandwidth [2].
- Usage of high-performance application processors (APs) and LPDDR is expected to grow as autonomous driving technology advances, with HBM4 anticipated in autonomous driving systems after 2027 [2].

Group 2: Technological Developments and Challenges
- NVIDIA's DIGITS technology aims to raise memory bandwidth by integrating GPU and HBM, with plans to improve CPU bandwidth by mid-2025 using SOCAMM technology [3].
- PCB and connector costs remain a significant challenge, with no immediate plans to bring this technology to the general PC market [3].
- Samsung emphasizes the need to balance high bandwidth, speed, capacity, low latency, and power management in generative AI memory solutions [3].

Group 3: Future Trends and Industry Dynamics
- The article forecasts that by 2030, HBM5 will reach 20 stacked layers and integrate more logic devices into a single chiplet architecture, increasing the importance of TSMC's role in CoWoS technology [3].
- A shift toward horizontal collaboration in the supply chain is expected to replace the traditional vertical integration model [3][4].
- DeepSeek's development of large language models (LLMs) for mobile AI is expected to push companies like OpenAI toward standardization of AI technologies [3].
Google on a Hiring Spree to Develop NIC Chips
半导体行业观察· 2025-03-05 01:03
Globes has learned that Google is expanding its chip development operations in Israel and is currently hiring dozens of employees to develop a new type of communication chip: the network interface card (NIC), a fundamental component for communication between the core processors and graphics processors used in AI workloads. Such chips are currently produced by Broadcom, Intel, and Nvidia.

An expensive commodity

Until recently, developing a NIC was considered a simple task for Google, but with the surge in AI processing, the component has become critical for communication between graphics processors in large data centers. Its price has soared, and the NIC has become a valuable commodity that Nvidia sells at a particularly high price as part of a set of communication chips developed by the Israeli company Mellanox, which Nvidia acquired five years ago. Nvidia has tried to sell its NIC components bundled with other communication parts in seemingly discounted packages, but these prices still make it difficult for tech giants to deploy AI technology profitably.

Source: compiled from Globes.

Four years ago, Google established its own chip development division to build hardware components and reduce its dependence on external suppliers such as Intel, Nvidia, and Broadcom.

After leading companies such as OpenAI published specifications for developing chips capable of handling next-generation language-model workloads, tech giants have raced to de ...
Breaking | ByteDance to Invest $8.8 Billion in Thai Data Centers, After Its Malaysian Data Centers May Be Forced to End Services to China
Z Finance· 2025-02-28 08:06
According to Reuters, TikTok's global vice president of public policy, Helena Lersch, said Friday at an event in Bangkok that TikTok will invest $8.8 billion in data centers in Thailand over the next five years. It is not yet clear whether this investment includes the $3.8 billion agreement announced by Thailand's Board of Investment last month.

In recent years, ByteDance has relied increasingly on Southeast Asian data centers, particularly in Malaysia. The company plans to place large orders this year, via leasing agreements and other arrangements, to expand its overseas AI capacity.

U.S. private equity firms such as Blackstone, Bain Capital, Warburg Pincus, and General Atlantic have invested billions of dollars in companies operating data centers in Malaysia. However, these businesses are now facing blowback from the U.S. chip ban on China.

Since 2023, U.S. chip restrictions have escalated the technology blockade on China, barring Chinese companies from purchasing Nvidia's high-performance chips. Yet by renting space in overseas data centers, particularly in Malaysia, Chinese companies could still legally use these chips, since the chips inside those data centers belong to third-party companies.

New U.S. rules escalate the ban further: the overseas-computing-center channel used by Chinese companies is expected to be closed this May. The new rules not only prohibit Chinese companies from purchasing Nvidia's high-end chips but also bar them from accessing the technology.

Former U.S. Under Secretary of Commerce for Industry and Security Alan Est ...