The HOPE Architecture
Is the Transformer Dead? DeepMind Is Betting on a Different Route to AGI
36Kr · 2026-01-09 02:42
Drawing on human associative memory, Nested Learning lets AI build abstract structures while it runs, going beyond the limits of the Transformer. The Google team stresses that the optimizer and the architecture act as context for each other, and only by co-evolving can they achieve genuine continual learning. The paper may well become a classic, opening the door for AI to move from passive training to active evolution.

"Catastrophic forgetting", a ghost that has haunted the AI field for decades, may this time have been laid to rest for good.

To say that AI advanced by leaps and bounds over the past year is no rhetorical exaggeration; Google DeepMind's achievements in a single year alone are dazzling. But if DeepMind had to pick its most important research or product of 2025, the recently viral Nested Learning would surely claim a place.

After reading the paper, one commenter posted that it is the "sequel" to "Attention Is All You Need". If the Transformer opened the era of scaling, Nested Learning may be opening the era of true AGI. DeepMind co-founder Shane Legg was even more direct: the road to AGI is wide open, and the latest step along it is Nested Learning. Some went so far as to say that if we had to leave one paper behind for future aliens, it would have to be this one on Nested Learning.

JT Investing (@JLTinvesting) · Nov 27, 2025: "What we are seeing today with ..."
Why This Google Paper Is Being Called "Attention Is All You Need" V2
量子位 (QbitAI) · 2025-12-21 05:45
The Amnesiac Giant. By 非羊, 量子位 | 公众号 QbitAI

Teachers have always said that "a good memory is no match for a worn pencil", so why not give large models, with their "memory defects", a little notebook in which to jot down summarized key points? (A toy sketch of this idea follows below.)

Following the famous "Attention Is All You Need", a new Google paper has once again set the field alight: we may have been overlooking the "other half" of AI's brain. The paper, titled "Nested Learning: The Illusion of Deep Learning Architectures", is being hailed in the community as "Attention Is All You Need" V2.

Have you ever felt a touch of exasperation with AI? You explain a concept in detail during a conversation, and three sentences later it may have forgotten it entirely, as if the exchange never happened. The ChatGPTs of the world seem to know everything under the sun, yet they cannot learn the one small thing you taught them today. This is not an occasional bug; it is a congenital condition shared by all current large language models (LLMs): digital amnesia.

To "treat" it, the entire industry has spent the past decade following essentially one golden rule: make the model deeper and bigger. We keep stacking Transformer layers and chasing trillions of parameters, believing that "scale is intelligence" and expecting memory-related abilities to "...
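The sketch below is only a toy illustration of the "little notebook" metaphor: an external scratchpad of distilled notes that is prepended to each prompt, so a stateless model call can still see what it was taught earlier in the session. It is not the mechanism proposed in the Nested Learning paper; the class and method names (Notebook, remember, build_prompt) are invented for illustration.

```python
class Notebook:
    """An external scratchpad of short, summarized facts the user has taught the assistant."""

    def __init__(self, max_notes: int = 20):
        self.max_notes = max_notes
        self.notes: list[str] = []

    def remember(self, note: str) -> None:
        # Keep only the most recent notes so the prompt stays small.
        self.notes.append(note.strip())
        self.notes = self.notes[-self.max_notes:]

    def build_prompt(self, user_message: str) -> str:
        # Prepend the notes as explicit context: the model itself stays frozen,
        # but every call gets to "re-read" the notebook.
        header = "\n".join(f"- {n}" for n in self.notes)
        return f"Facts taught earlier in this session:\n{header}\n\nUser: {user_message}"


if __name__ == "__main__":
    nb = Notebook()
    nb.remember("The user's project targets very long documents.")
    print(nb.build_prompt("Summarize the design goals again."))
```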
Communications Industry Weekly View: Google's Nested Learning Revamps the Architecture, Claude Opus 4.5 Offers Strong Value for Money - 20251202
Changjiang Securities · 2025-12-02 09:42
Investment Rating
- The report maintains a "Positive" investment rating for the communication industry [9].

Core Insights
- The communication sector rose 8.71% in the 48th week of 2025, ranking first among the major industries in the Changjiang classification; year-to-date it is up 64.42%, also ranking first [2][4].
- Google's Nested Learning theory and HOPE architecture significantly enhance long-term memory and reasoning efficiency, addressing traditional Transformers' memory bottlenecks on long sequences and greatly reducing training and inference costs [5][7].
- Anthropic's Claude Opus 4.5 achieves state-of-the-art performance in software engineering, with an aggressive pricing strategy that cuts input and output costs by 67% while integrating deeply with existing office workflows [6][7].

Summary by Sections

Market Performance
- Notable individual stock performances in the 48th week of 2025 included Guangku Technology (+39.2%), Tongyu Communication (+39.1%), and Taicheng Light (+22.3%) [4].

Technological Advancements
- Google's Nested Learning paradigm optimizes memory by decomposing a large model into nested sub-optimization problems, improving memory management and reducing inference costs [5].
- The HOPE architecture, built on this theory, separates high-frequency and low-frequency memory tasks to improve efficiency in long-sequence processing (see the sketch after this report summary) [5].

Product Launches
- Anthropic's Claude Opus 4.5 supports a context window of approximately 200k tokens and outperforms competitors on software engineering tasks, with a significant reduction in usage costs for enterprises [6].

Investment Recommendations
- Telecom operators: China Mobile, China Telecom, China Unicom
- Optical modules: Zhongji Xuchuang, Xinyi Sheng, Tianfu Communication
- AI applications: Boshi Jie, Heertai, Tuobang Co., Yiyuan Communication
- Satellite applications: Huace Navigation, Haige Communication, Canqin Technology [7]
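As a rough illustration of the frequency separation described above, the sketch below keeps two memory modules in one model and updates them on different schedules: a "fast" module every step and a "slow" module once every few steps. The module names, the toy objective, and the update schedule are assumptions made for illustration; this is not the HOPE implementation.

```python
import torch
import torch.nn as nn

class TwoSpeedMemory(nn.Module):
    """Toy model with a fast (high-frequency) and a slow (low-frequency) memory path."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.fast = nn.Linear(dim, dim)  # tracks recent context, updated every step
        self.slow = nn.Linear(dim, dim)  # consolidates knowledge, updated rarely

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fast(x) + self.slow(x)

model = TwoSpeedMemory()
opt_fast = torch.optim.SGD(model.fast.parameters(), lr=1e-2)
opt_slow = torch.optim.SGD(model.slow.parameters(), lr=1e-3)
slow_every = 8  # the slow memory consolidates once every 8 steps

for step in range(64):
    x, target = torch.randn(16, 64), torch.randn(16, 64)  # stand-in data
    loss = nn.functional.mse_loss(model(x), target)

    opt_fast.zero_grad()   # clear only the fast path's gradients
    loss.backward()        # gradients accumulate on the slow path across steps
    opt_fast.step()        # fast memory updates every step

    if (step + 1) % slow_every == 0:
        opt_slow.step()    # slow memory applies its accumulated update
        opt_slow.zero_grad()
```

The point of the split is that the stable, expensive-to-change parameters move rarely while the cheap, volatile ones absorb per-step context, which is the intuition behind separating high- and low-frequency memory.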
In the Context of LLMs, Is "Continual Learning" the Best Answer to the "Memory" Problem?
机器之心 · 2025-11-16 01:30
Group 1
- The article discusses "Nested Learning", proposed by Google, which aims to address memory management in LLMs (large language models) and the challenge of catastrophic forgetting [5][6][8].
- Nested Learning frames a model as a multi-level optimization problem: a series of interconnected sub-problems, which allows new skills to be learned without losing previously acquired knowledge [6][7].
- The research introduces the Continuum Memory System (CMS), which treats memory as a set of modules that update at different frequencies, improving the model's ability to manage memory [6][7].

Group 2
- The article highlights the importance of improving LLMs' memory so that continual learning becomes possible, allowing AI to retain contextual experience, semantic knowledge, and procedural skills [8].
- A proposed three-layer memory architecture comprises model weights for general knowledge, the KV cache for intermediate results, and the context for relevant background information, together enabling appropriate responses from the model (sketched below) [8].
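For intuition about the three-layer split in the last bullet, here is a minimal bookkeeping sketch that separates the three kinds of memory: slowly changing model weights, a per-session KV cache, and per-request context. It models only the bookkeeping, not an actual LLM; the class name, fields, and methods are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class LayeredMemory:
    # General knowledge: changes only when the model is retrained.
    weights_version: str = "base-model-v1"
    # Intermediate results: live for the duration of one session.
    kv_cache: dict = field(default_factory=dict)
    # Background information: attached per request rather than learned.
    context: list = field(default_factory=list)

    def cache(self, key: str, value: str) -> None:
        self.kv_cache[key] = value

    def attach_context(self, snippet: str) -> None:
        self.context.append(snippet)

    def describe(self) -> str:
        return (f"weights={self.weights_version}, "
                f"cached={len(self.kv_cache)} items, "
                f"context={len(self.context)} snippets")

mem = LayeredMemory()
mem.cache("last_summary", "Nested Learning treats a model as nested optimization problems.")
mem.attach_context("The user is reading a digest about the HOPE architecture.")
print(mem.describe())
```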