Gated DeltaNet
Revisiting Attention: DeltaNet and New Improvements in Linear Attention, as Used by Alibaba and Kimi | LatePost Podcast
晚点LatePost · 2025-12-02 09:13
Core Insights
- The article discusses advances in linear attention mechanisms, particularly DeltaNet, which aim to improve the efficiency and effectiveness of large language models (LLMs) by reducing the computational complexity associated with traditional attention mechanisms [5][10][12].

Group 1: Linear Attention Mechanisms
- Linear attention mechanisms such as DeltaNet were introduced to address the computational bottleneck of traditional attention, which exhibits quadratic complexity with respect to input length [5][12].
- DeltaNet's development has been a collaborative effort, with significant contributions from researchers since its inception in 2021, focused on improving the update rules and parallelization of linear attention [7][20][21].
- The recent open-source releases of the Qwen3-Next and Kimi Linear models by Alibaba and Kimi, respectively, incorporate linear attention mechanisms, indicating a shift toward these more efficient designs in flagship applications [5][24].

Group 2: DeltaNet and Its Evolution
- DeltaNet was initially overlooked owing to missing architectural refinements and suboptimal implementations, but recent advances have driven its adoption in industry [20][24].
- The Gated DeltaNet variant enhances memory control and retrieval performance, making it better suited to modern hardware [7][21][24] (see the recurrence sketch after this summary).
- The relationship between DeltaNet and models such as Kimi Linear highlights the trend of combining linear attention with traditional full attention to balance speed and capacity [24][25].

Group 3: Future Directions and Challenges
- The article emphasizes the need for further exploration of update rules in linear attention, suggesting that improvements in this area could yield better performance and scalability [48][49].
- Combining sparse attention with linear attention is discussed as a potential way to address long-text processing, which remains a significant hurdle for current models [46][49].
- The ongoing industry debate over linear versus full attention reflects the complexities and trade-offs involved in model design for various applications [27][30].
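To make the quadratic-versus-linear contrast and the gating mechanism above concrete, here is a minimal NumPy sketch of the DeltaNet recurrence and its gated variant, assuming the commonly published formulation S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T; the function name, scalar gates, and toy dimensions are illustrative assumptions, not the production kernels shipped in Qwen3-Next or Kimi Linear.

```python
# Minimal sketch of the (Gated) DeltaNet recurrence, for illustration only.
# State S is a (d_v, d_k) fast-weight matrix updated once per token, so the
# cost per token is O(d_v * d_k) regardless of sequence length, in contrast
# to softmax attention, whose pairwise scores grow quadratically with length.
import numpy as np

def gated_deltanet_step(S, q, k, v, beta, alpha=1.0):
    """One recurrent step: S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T.

    S     : (d_v, d_k) memory state
    q, k  : (d_k,) query/key for the current token (k assumed unit-norm)
    v     : (d_v,) value for the current token
    beta  : write strength in (0, 1], the delta-rule learning rate
    alpha : decay gate in (0, 1]; alpha = 1.0 recovers plain DeltaNet
    """
    # Erase part of the old association stored under key k, decay the rest,
    # then write the new key-value association.
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    o = S @ q  # read-out: query the fast-weight memory
    return S, o

# Toy usage: stream a short sequence through the recurrence.
rng = np.random.default_rng(0)
d_k, d_v = 4, 4
S = np.zeros((d_v, d_k))
for _ in range(8):
    k = rng.normal(size=d_k)
    k /= np.linalg.norm(k)
    q, v = rng.normal(size=d_k), rng.normal(size=d_v)
    S, o = gated_deltanet_step(S, q, k, v, beta=0.5, alpha=0.95)
```

The delta rule first retrieves what the memory currently stores for key k (the S @ k term) and overwrites only that component, which is what distinguishes DeltaNet's update rule from purely additive linear attention; the alpha gate then lets the gated variant forget stale content, the memory-control improvement the summary attributes to Gated DeltaNet.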
A Breakthrough from 创智学院: For the First Time, AI Autonomously Discovers 106 Neural Network Architectures That Surpass Human Designs
机器之心 · 2025-07-24 06:50
Is scientific discovery still a human monopoly? While the world is still marveling at AI reaching gold-medal level in math competitions, a far more profound breakthrough has quietly taken place. Unlike solving closed-ended problems such as IMO questions, genuine scientific discovery is an open-ended, long-horizon cognitive process: it requires posing original questions, designing experiments, observing patterns in phenomena, forming scientific hypotheses, and then converging on the truth through continual trial, error, and iteration. The complexity of this process far exceeds that of any standardized test; what it demands is not raw computation but genuinely creative scientific thinking.

The AI superintelligence system released today by a research team led by 创智学院 demonstrates for the first time that AI is capable of complete scientific discovery: operating fully autonomously, the system discovered 106 neural network architectures that surpass human designs (outperforming strong baselines such as Mamba2 and Gated DeltaNet across multiple benchmarks). More strikingly, it offers preliminary evidence that scientific breakthroughs can be industrially mass-produced in the same way models are trained, marking our formal entry into a new era of Long-Horizon Superintelligence: scientific discovery has entered the age of Scaling Laws!

From math gold medals to scientific discovery: a generational leap in cognitive complexity

One of the most striking recent achievements in AI is its breakthrough performance in math competitions. Google and other research ...