Nested Learning
In the context of LLMs, is "continual learning" the optimal solution to the "memory" problem?
机器之心· 2025-11-16 01:30
**Group 1**
- The article discusses the concept of "Nested Learning" proposed by Google, which aims to address memory management issues in LLMs (Large Language Models) and the challenge of catastrophic forgetting [5][6][8]
- Nested Learning is presented as a multi-layered optimization problem, where a model is viewed as a series of interconnected sub-problems, allowing new skills to be learned while avoiding the loss of previously acquired knowledge [6][7]
- The research introduces the "Continuous Memory System" (CMS), which treats memory as a system of multiple modules that update at different frequencies, enhancing the model's ability to manage memory effectively [6][7] (a toy sketch of this multi-frequency idea follows this summary)

**Group 2**
- The article highlights the importance of improving LLMs' memory capabilities to enable continual learning, allowing AI to retain contextual experiences, semantic knowledge, and procedural skills [8]
- A proposed three-layer memory architecture includes the model weights for general knowledge, the KV cache for intermediate results, and the context for relevant background information, together enabling appropriate responses from the model [8]
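To make the CMS idea above concrete, here is a minimal, hypothetical sketch of memory as several modules updated at different frequencies. It is not Google's implementation: the class name `MultiFrequencyMemory`, the periods `(1, 4, 16)`, and the Hebbian outer-product write rule are all illustrative assumptions.

```python
# Illustrative sketch only: memory as a spectrum of modules with different update rates.
import torch

class MultiFrequencyMemory(torch.nn.Module):
    """Several memory blocks; fast blocks update every step, slow blocks only rarely."""

    def __init__(self, dim: int, periods=(1, 4, 16)):
        super().__init__()
        self.periods = periods
        # One learnable memory matrix per frequency level (shapes are illustrative).
        self.blocks = torch.nn.ParameterList(
            [torch.nn.Parameter(torch.zeros(dim, dim)) for _ in periods]
        )
        self.step_count = 0

    def read(self, query: torch.Tensor) -> torch.Tensor:
        # Query every level and sum the retrieved content.
        return sum(query @ block for block in self.blocks)

    @torch.no_grad()
    def write(self, key: torch.Tensor, value: torch.Tensor, lr: float = 0.1):
        # Hebbian-style outer-product write, applied only to the levels whose
        # update period divides the current step (the "frequency spectrum").
        self.step_count += 1
        for period, block in zip(self.periods, self.blocks):
            if self.step_count % period == 0:
                block += lr * torch.outer(key, value)

mem = MultiFrequencyMemory(dim=8)
for _ in range(32):
    mem.write(torch.randn(8), torch.randn(8))
print(mem.read(torch.randn(8)).shape)  # torch.Size([8])
```

Under this toy framing, the fastest block plays a role similar to context or KV-cache-like working memory, while the slowest block behaves more like consolidated weights; the paper's actual CMS is considerably richer than this sketch.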
Jeff Dean praises new AI research by a Yao Class alumnus, who has since joined Meta
量子位· 2025-11-15 05:00
**Core Viewpoint**
- The article discusses a new AI paradigm called Nested Learning (NL), which addresses catastrophic forgetting in large language models and proposes a more efficient learning structure that mimics human cognitive processes [2][10][25]

**Summary by Sections**

**Nested Learning Concept**
- Nested Learning transforms a model from a flat computational network into a hierarchical, self-adjusting learning system, inspired by the human brain's memory processes [6][12][14]
- Traditional models such as the Transformer are seen as simplified special cases of NL, lacking the multi-level advantages that NL offers [6][14]

**Innovations of Nested Learning**
- The research team introduced three core innovations based on NL:
  1. **Deep Optimizer**: Unlike traditional optimizers, NL's deep optimizer uses a pre-processing mechanism to understand gradient properties and employs MLP networks as memory, allowing flexible parameter adjustment [17][18] (a generic learned-optimizer sketch appears after this summary)
  2. **Self-Modifying Model**: Models autonomously learn how to adjust their own parameters during training, adapting to new data without manual intervention [19]
  3. **Continuous Memory System**: Upgrades the traditional short-term/long-term memory split into a multi-scale memory chain, enabling efficient storage and processing of information [20]

**Performance of the Hope Model**
- The Hope model, based on NL, significantly outperforms mainstream baselines such as Transformer, RetNet, and DeltaNet on language modeling and common-sense reasoning tasks, showing lower perplexity and higher accuracy across various metrics [8][23][24]
- For instance, on language modeling tasks, Hope achieved a perplexity of 26.05 with 760M parameters, outperforming the other models [24]

**Implications of Nested Learning**
- The introduction of NL represents a paradigm shift in deep learning, moving away from the traditional approach of stacking layers and parameters and instead leveraging cognitive science to create a collaborative, hierarchical intelligence system [25]
- This new paradigm may enable AI to continuously learn and accumulate knowledge like humans, potentially solving key challenges in long-context reasoning and lifelong learning [25]
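As a rough illustration of the "optimizer as a learnable memory module" idea behind the Deep Optimizer bullet above, below is a minimal, generic learned-optimizer sketch: gradients are preprocessed into simple per-element features, and a small MLP proposes the update. The class name `MLPOptimizer`, the (log-magnitude, sign) preprocessing, and the layer sizes are assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch only: an MLP maps preprocessed gradient features to parameter updates.
import torch

class MLPOptimizer(torch.nn.Module):
    """Toy learned optimizer: a small MLP acts as the optimizer's update rule."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, 1)
        )

    @staticmethod
    def preprocess(grad: torch.Tensor) -> torch.Tensor:
        # A common gradient encoding in the learned-optimizer literature:
        # per-element (log magnitude, sign) features.
        eps = 1e-8
        return torch.stack([torch.log(grad.abs() + eps), torch.sign(grad)], dim=-1)

    def step(self, params, lr: float = 1e-3):
        # Apply the MLP element-wise to each parameter's gradient features.
        with torch.no_grad():
            for p in params:
                if p.grad is None:
                    continue
                feats = self.preprocess(p.grad.flatten())          # (n, 2)
                update = self.net(feats).squeeze(-1).view_as(p)    # (n,) -> p.shape
                p -= lr * update

# Usage: the update direction comes from the (untrained) MLP here, so this only
# demonstrates the mechanism, not any learning gains.
w = torch.nn.Parameter(torch.zeros(1))
opt = MLPOptimizer()
for _ in range(10):
    x = torch.randn(32, 1)
    loss = ((x * w - 3 * x) ** 2).mean()
    loss.backward()
    opt.step([w])
    w.grad = None
```

In the nested-learning framing described above, such an MLP update rule would itself be trained as a slower, outer level of optimization; its weights are left untrained here to keep the sketch short.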