Breaking: DeepSeek Liang Wenfeng's NSA Paper and Peking University Yang Yaodong's Team Win ACL 2025 Best Paper Awards
机器之心· 2025-07-30 16:25
Group 1
- The ACL conference is a premier event in computational linguistics and natural language processing; its 63rd edition is scheduled for July 27 to August 1, 2025, in Vienna, Austria [2]
- This year's submissions reached a record high of over 8,000, up from 4,407 last year, with acceptance rates of 20.3% for main-conference papers and 16.7% for Findings [3]
- Over half of first authors (51.3%) are from China, a sharp increase from last year's 30.6%; the second-largest group, at 14.0%, is from the United States [4]

Group 2
- Four best papers were awarded: two to teams led by DeepSeek's Liang Wenfeng and Peking University's Yang Yaodong, and the other two to teams from the CISPA Helmholtz Center for Information Security & TCS Research & Microsoft, and from Stanford University & Cornell Tech [6][10]
- The first best paper develops a theory of response sampling in large language models (LLMs), highlighting ethical concerns arising from biases in LLM-influenced decision-making [11][15]
- The second best paper focuses on algorithmic fairness, introducing a framework centered on context-specific group discrimination awareness and demonstrating that existing bias-mitigation strategies can be counterproductive [16][19]

Group 3
- The third best paper reveals a structural inertia mechanism by which large models resist alignment during fine-tuning, indicating that robust alignment is harder to achieve than previously thought [24][25]
- The fourth best paper presents a new hardware-aligned, natively trainable sparse attention mechanism that significantly improves the efficiency of long-context modeling in LLMs [31][40]

Group 4
- A total of 26 outstanding papers were recognized, covering topics such as multilingual summarization, hate-speech analysis, and the evaluation of large language models [42]
- The best demo paper went to OLMoTrace, a system capable of tracing language-model outputs back to trillions of training tokens [46][48]

Group 5
- The conference also presented two Test-of-Time awards, honoring foundational papers from 2000 and 2015 that have significantly influenced the field [65][73]
- Kathy McKeown received the Lifetime Achievement Award for her 43 years of contributions to natural language processing [86][90]
- Julia B. Hirschberg received the Distinguished Service Award for her long-standing service to the ACL and contributions to the field [96][98]
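The sparse-attention idea behind the fourth best paper can be illustrated with a toy attention mask in which each token attends only to a short local window plus a few coarse blocks. This is a hedged, generic sketch of block-sparse attention, not NSA's actual algorithm; in NSA the block selection is learned, whereas here the "selected" blocks are simply the earliest ones:

```python
import numpy as np

def sparse_attention_mask(seq_len, window=4, block_size=4, top_blocks=1):
    """Toy block-sparse causal mask: token i may attend to (a) a local
    window of the `window` most recent tokens and (b) the first
    `top_blocks` blocks of `block_size` tokens, standing in for blocks a
    learned scorer would pick. True = attention allowed. Illustrative
    only; NSA's real selection mechanism is trained end to end."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        # local causal window around token i
        mask[i, max(0, i - window + 1): i + 1] = True
        # coarse "selected" blocks, clipped so the mask stays causal
        mask[i, : min(top_blocks * block_size, i + 1)] = True
    return mask

# Fraction of score entries computed vs. full causal attention:
m = sparse_attention_mask(64)
full = np.tril(np.ones((64, 64), dtype=bool))
sparsity = m.sum() / full.sum()
```

Because each row has at most `window + top_blocks * block_size` allowed positions, the per-token cost stays roughly constant instead of growing with sequence length, which is the efficiency gain the summary refers to.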
ICML 2025 | 1000x Length Generalization! Ant Group's New Attention Mechanism GCA Achieves Precise Understanding of 16M-Token Contexts
机器之心· 2025-06-13 15:45
Core Viewpoint
- The article discusses the challenges of long-text modeling in large language models (LLMs) and introduces a new attention mechanism, Grouped Cross Attention (GCA), that processes long contexts efficiently, potentially paving the way for advances toward artificial general intelligence (AGI) [1][2]

Long Text Processing Challenges and Existing Solutions
- Long-text modeling remains challenging due to the quadratic complexity of the Transformer architecture and the limited extrapolation capabilities of full-attention mechanisms [1][6]
- Existing solutions, such as sliding-window attention, sacrifice long-range information retrieval for continuous generation, while other methods have limited generalization capabilities [7][8]

GCA Mechanism
- GCA is a novel attention mechanism that learns to retrieve and select relevant past segments of text, significantly reducing memory overhead during long-text processing [2][9]
- The mechanism operates in two stages: it first attends to each retrieved chunk separately, then fuses the information from these chunks to predict the next token [14][15]

Experimental Results
- Models incorporating GCA demonstrated superior performance on long-text datasets, achieving over 1000x length generalization and 100% accuracy on a 16M-token context retrieval task [5][17]
- GCA's training cost scales linearly with sequence length, and its inference memory overhead approaches a constant, maintaining efficient processing speeds [20][21]

Conclusion
- GCA represents a significant advance in long-context language modeling, with the potential to enable intelligent agents with permanent memory [23]
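The two-stage retrieve-then-fuse idea described above can be sketched in a few lines. This is a toy illustration under assumed shapes and a hand-rolled chunk scorer, not the authors' implementation (GCA's retriever and fusion weights are learned end to end):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_cross_attention(q, past_k, past_v, chunk_size=4, top_k=2):
    """Toy two-stage chunked retrieval attention.
    Stage 1: score each past chunk against the query and keep the top_k.
    Stage 2: attend within each selected chunk separately, then fuse the
    per-chunk outputs weighted by renormalized chunk scores.
    Shapes: q (d,), past_k/past_v (T, d) with T divisible by chunk_size."""
    d = q.shape[-1]
    n_chunks = past_k.shape[0] // chunk_size
    k_chunks = past_k.reshape(n_chunks, chunk_size, d)
    v_chunks = past_v.reshape(n_chunks, chunk_size, d)

    # Stage 1: retrieval -- score chunks by mean-key similarity, pick top-k.
    chunk_scores = k_chunks.mean(axis=1) @ q / np.sqrt(d)   # (n_chunks,)
    selected = np.argsort(chunk_scores)[-top_k:]

    # Stage 2a: attend inside each selected chunk independently...
    outputs = []
    for c in selected:
        attn = softmax(k_chunks[c] @ q / np.sqrt(d))        # (chunk_size,)
        outputs.append(attn @ v_chunks[c])                  # (d,)

    # Stage 2b: ...then fuse the chunk outputs with renormalized weights.
    fuse_w = softmax(chunk_scores[selected])
    return fuse_w @ np.stack(outputs)
```

Only the `top_k` selected chunks are ever touched per prediction step, which is why memory overhead can stay near-constant as the context grows.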
A "Brief History of Attention" in Large Models: A Conversation with Two AI Researchers, Starting from DeepSeek's and Kimi's Latest Improvements
晚点LatePost· 2025-03-02 06:10
Guests: Xiao Chaojun, Fu Tianyu. Edited by: Cheng Manqi.

Last week, DeepSeek and Kimi each released new large-model architecture improvements and optimizations: NSA and MoBA, respectively. Both focus on improving the "attention mechanism" at the heart of large models. The emergence of reasoning models such as o1 and R1 has also posed new challenges for long-text processing.

The attention mechanism is the core mechanism of today's large language models (LLMs). The June 2017 paper by the eight Transformer authors that launched the LLM revolution was titled "Attention Is All You Need."

Optimizing the computational efficiency and effectiveness of attention also helps address a problem of intense interest to both AI academia and industry: long context. Whether feeding an entire book into a model at once for summarization and comprehension, generating the long chains of thought that models like o1 and R1 require, or giving models ever-longer "memory" in the future, all of these depend on long-context capability.

For this episode we invited two AI researchers who have worked on attention-mechanism improvements. One is Xiao Chaojun, a PhD student in the Natural Language Processing Lab of Tsinghua University's Department of Computer Science and first author of the InfLLM attention improvement; his advisor is an associate professor in the department ...
Moonshot AI MoBA Core Author's Account: A "Newly Minted LLM Trainer" and His Three Trips to the Cliff of Reflection
晚点LatePost· 2025-02-20 14:21
"We've gone from open-sourcing papers and open-sourcing code to open-sourcing chains of thought!" By Andrew Lu; annotations by He Qianming and Cheng Manqi.

On February 18, Kimi and DeepSeek announced new work on the same day: MoBA and NSA, respectively, both improvements to the attention mechanism.

Today, Andrew Lu, one of MoBA's main developers, posted on Zhihu recounting three pitfalls he hit during development, which he calls his "three trips to the Cliff of Reflection." His Zhihu signature reads "newly minted LLM trainer." One comment under his post: "From open-source papers and open-source code, we've now evolved to open-source chains of thought."

The attention mechanism matters because it is the core mechanism of today's large language models (LLMs). The June 2017 paper by the eight Transformer authors that launched the LLM revolution was titled "Attention Is All You Need"; it has since been cited 153,000 times.

The attention mechanism lets an AI model, like a human, know what to "focus on" and what to "ignore" when processing information, capturing its most critical parts. Attention operates in both the training and the inference stage of a large model. Its rough working principle is ...
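The "focus on / ignore" behavior described above is, in standard Transformers, realized as scaled dot-product attention: queries score all keys, a softmax turns the scores into focus weights, and the output is the correspondingly weighted sum of values. A minimal sketch with illustrative shapes (not any particular model's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention. Each query scores every key; the softmax turns
    scores into "how much to focus on each position" weights; the output
    is the weighted sum of values. Q, K, V: (seq_len, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V                               # (seq_len, d)
```

The (seq_len x seq_len) score matrix is exactly the quadratic cost that NSA, MoBA, InfLLM, and GCA all try to tame in different ways.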