Nested Learning
Why this Google paper is being called "Attention is All You Need" V2
量子位· 2025-12-21 05:45
Core Insights
- The article discusses a groundbreaking research paper by Google titled "Nested Learning: The Illusion of Deep Learning Architectures," which is being referred to as "Attention is All You Need" V2, emphasizing a new perspective on AI's learning capabilities [1][5].

Group 1: AI Limitations
- Current large language models (LLMs) suffer from a condition termed "digital amnesia," forgetting recently learned information shortly after it is taught [2][3].
- The industry has focused on making models deeper and larger, believing that increasing scale would lead to emergent memory capabilities, but this approach has significant limitations [3][4].

Group 2: Nested Learning Paradigm
- The research introduces the concept of "nested learning," which posits that effective intelligent learning requires two orthogonal dimensions: depth (model layers and capacity) and frequency (the rhythm and speed of internal component updates) [9][10].
- The paper argues that mainstream optimizers, traditionally viewed as mere training engines, actually function as associative memory systems that continuously record gradient changes [6].

Group 3: HOPE Architecture
- The newly proposed architecture, named HOPE, features a continuous memory system with multiple MLP modules arranged like a spectrum, each updating at a different frequency (see the sketch after this summary) [14].
- This architecture mimics the human brain's memory processes, allowing new knowledge to be integrated without causing systemic collapse or forgetting [17][16].

Group 4: Future Implications
- The value of "nested learning" lies not in immediately replacing existing models like Transformers but in providing a new design logic and framework for AI development [18].
- The exploration of memory and learning processes is still in its early stages, suggesting that future AI advancements may require systems capable of learning and evolving rather than being static repositories of knowledge [18].
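To make Group 3 concrete, below is a minimal PyTorch sketch of the multi-frequency memory idea: several MLP memory banks share one read path but are refreshed at different periods, so fast banks track the recent stream while slow banks change rarely and behave like consolidated long-term memory. The class name, the mean read-out, plain SGD, and the period schedule are all illustrative assumptions, not HOPE's actual formulation.

```python
import torch
import torch.nn as nn

class FrequencyBankedMemory(nn.Module):
    """Toy continuous-memory spectrum: MLP banks refreshed at different
    periods. Names and the mixing rule are illustrative assumptions."""

    def __init__(self, dim: int, periods=(1, 4, 16, 64)):
        super().__init__()
        self.periods = periods
        self.banks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in periods
        )
        self.step = 0

    def forward(self, x):
        # Read path: every bank is consulted; averaging is a placeholder
        # for whatever mixing rule the real architecture would use.
        return torch.stack([bank(x) for bank in self.banks]).mean(dim=0)

    @torch.no_grad()
    def apply_updates(self, lr: float = 1e-3):
        # Write path: a bank only moves when its period divides the step
        # count. Skipped banks keep accumulating gradients (PyTorch adds
        # into .grad by default), so slow banks average over their window.
        self.step += 1
        for period, bank in zip(self.periods, self.banks):
            if self.step % period != 0:
                continue
            for p in bank.parameters():
                if p.grad is not None:
                    p -= lr * p.grad / period
                    p.grad = None

mem = FrequencyBankedMemory(dim=32)
for _ in range(64):  # dummy input stream
    loss = mem(torch.randn(8, 32)).pow(2).mean()
    loss.backward()
    mem.apply_updates()
```

The slowest bank here moves only every 64 steps, on gradients pooled over its whole window; that separation of timescales is the property the summary credits with letting new knowledge be integrated without overwriting old knowledge.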
Communications Industry Weekly View: Google's Nested Learning Architecture Innovation, Claude Opus 4.5 Offers Strong Cost-Performance - 20251202
Changjiang Securities· 2025-12-02 09:42
Investment Rating
- The report maintains a "Positive" investment rating for the communication industry [9].

Core Insights
- The communication sector rose 8.71% in week 48 of 2025, ranking first among Changjiang Securities' primary industry classifications; year-to-date it is up 64.42%, also ranking first [2][4].
- Google's introduction of the Nested Learning theory and HOPE architecture significantly enhances long-term memory and reasoning efficiency, addressing the memory bottlenecks of traditional Transformers on long sequences and promising large reductions in training and inference costs [5][7].
- Anthropic's Claude Opus 4.5 has achieved state-of-the-art performance in software engineering, with an aggressive pricing strategy that lowers input and output costs by 67%, while also integrating deeply with existing office workflows [6][7].

Summary by Sections

Market Performance
- Notable individual stock performances in week 48 of 2025 included Guangku Technology (+39.2%), Tongyu Communication (+39.1%), and Taicheng Light (+22.3%) [4].

Technological Advancements
- Google's Nested Learning paradigm optimizes memory processes by breaking large models into nested sub-optimization problems, enhancing memory management and reducing inference costs (a toy sketch of this nesting follows the summary) [5].
- The HOPE architecture, built on this theory, separates high-frequency and low-frequency memory tasks, improving efficiency in long-sequence processing [5].

Product Launches
- Anthropic's Claude Opus 4.5 supports a context window of approximately 200k tokens and has outperformed competitors in software engineering tasks, significantly reducing usage costs for enterprises [6].

Investment Recommendations
- The report recommends companies across several segments:
  - Telecom Operators: China Mobile, China Telecom, China Unicom
  - Optical Modules: Zhongji Xuchuang, Xinyi Sheng, Tianfu Communication
  - AI Applications: Boshi Jie, Heertai, Tuobang Co., Yiyuan Communication
  - Satellite Applications: Huace Navigation, Haige Communication, Canqin Technology [7]
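As referenced in the Technological Advancements section, here is a hedged toy sketch of the nested sub-optimization idea: a fast associative memory adapts on every chunk of a long sequence (the inner problem) while a slow backbone updates only once per window of chunks (the outer problem). All names, sizes, frequencies, and the MSE objective are placeholders, not the report's or the paper's actual setup.

```python
import torch
import torch.nn as nn

backbone = nn.Linear(16, 16)     # slow level: low update frequency
fast_memory = nn.Linear(16, 16)  # fast level: adapts to the recent stream
slow_opt = torch.optim.SGD(backbone.parameters(), lr=1e-3)
fast_opt = torch.optim.SGD(fast_memory.parameters(), lr=1e-1)

stream = torch.randn(32, 8, 16)  # 32 chunks standing in for a long sequence
WINDOW = 8                       # outer level steps once every 8 chunks

for t, chunk in enumerate(stream):
    target = chunk.roll(1, dims=0)             # dummy prediction target
    loss = (fast_memory(backbone(chunk)) - target).pow(2).mean()
    loss.backward()                            # grads reach both levels
    fast_opt.step(); fast_opt.zero_grad()      # inner problem: update now
    if (t + 1) % WINDOW == 0:                  # outer problem: update rarely,
        slow_opt.step(); slow_opt.zero_grad()  # on grads pooled over the window
```

Because the backbone's gradients accumulate across the window before its optimizer steps, the slow level effectively averages over many chunks while the fast level reacts immediately; this is the high-frequency/low-frequency separation the report describes, in miniature.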
In the LLM context, is "continual learning" the optimal solution to the "memory" problem?
机器之心· 2025-11-16 01:30
Group 1
- The article discusses the "Nested Learning" concept proposed by Google, which aims to address memory-management issues in LLMs (large language models) and the challenge of catastrophic forgetting [5][6][8].
- Nested Learning is presented as a multi-layered optimization problem in which a model is seen as a series of interconnected sub-problems, allowing new skills to be learned while avoiding the loss of previously acquired knowledge [6][7].
- The research introduces the Continuous Memory System (CMS), which treats memory as a system of multiple modules that update at different frequencies, enhancing the model's ability to manage memory effectively [6][7].

Group 2
- The article highlights the importance of improving LLMs' memory capabilities to enable continual learning, allowing AI to retain contextual experiences, semantic knowledge, and procedural skills [8].
- A proposed three-layer memory architecture includes Model Weights for general knowledge, a KV Cache for intermediate results, and Context for relevant background information, letting the model draw on the appropriate tier when responding (a sketch of this split follows the summary) [8].
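A plain-Python sketch of the three-layer split described above, with hypothetical names: weights hold general knowledge fixed at training time, a KV cache holds a session's intermediate state, and a context store holds retrievable background text. The keyword match stands in for a real retriever.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryTiers:
    # Hypothetical names for the three tiers in the proposed architecture.
    weight_knowledge: dict = field(default_factory=dict)  # slow: general facts
    kv_cache: list = field(default_factory=list)          # medium: this session
    context_store: list = field(default_factory=list)     # fast: swappable text

    def assemble_prompt(self, query: str, k: int = 2) -> str:
        # Naive keyword retrieval over the fast tier; a real system would
        # use an actual retriever and also consult the other tiers.
        words = query.lower().strip("?").split()
        hits = [c for c in self.context_store
                if any(w in c.lower() for w in words)][:k]
        return "\n".join(hits + [query])  # background first, question last

mem = MemoryTiers()
mem.context_store += [
    "Nested Learning treats a model as nested optimization problems.",
    "CMS spreads memory across modules with different update frequencies.",
]
print(mem.assemble_prompt("how does nested learning manage memory?"))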
Jeff Dean praises new AI research by a Tsinghua Yao Class alumnus, who has since moved to Meta
量子位· 2025-11-15 05:00
Core Viewpoint
- The article discusses Nested Learning (NL), a new AI paradigm that addresses catastrophic forgetting in large language models and proposes a more efficient learning structure that mimics human cognitive processes [2][10][25].

Summary by Sections

Nested Learning Concept
- Nested Learning transforms models from a flat computational network into a hierarchical, self-adjusting learning system, inspired by the human brain's memory processes [6][12][14].
- Traditional models like Transformers are seen as simplified versions of NL, lacking the multi-level advantages that NL offers [6][14].

Innovations of Nested Learning
- The research team introduced three core innovations based on NL (a toy sketch of the first follows this summary):
  1. **Deep Optimizer**: Unlike traditional optimizers, NL's deep optimizer uses a pre-processing mechanism to understand gradient properties and employs MLP networks as memory, allowing flexible parameter adjustments [17][18].
  2. **Self-Modifying Model**: Models autonomously learn how to adjust their own parameters during training, adapting to new data without manual intervention [19].
  3. **Continuous Memory System**: Upgrades the traditional short-term/long-term memory structure into a multi-scale memory chain, enabling efficient storage and processing of information [20].

Performance of the Hope Model
- The Hope model, based on NL, significantly outperforms mainstream baselines such as Transformer, RetNet, and DeltaNet in language modeling and common-sense reasoning tasks, demonstrating lower perplexity and higher accuracy across metrics [8][23][24].
- For instance, in language modeling tasks, Hope achieved a perplexity of 26.05 at 760M parameters, outperforming the other models [24].

Implications of Nested Learning
- NL represents a paradigm shift in deep learning, moving away from stacking ever more layers and parameters and instead leveraging cognitive science to create a collaborative, hierarchical intelligence system [25].
- This new paradigm may enable AI to continuously learn and accumulate knowledge like humans, potentially solving key challenges in long-context reasoning and lifelong learning [25].
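A hedged sketch of the "optimizer as associative memory" intuition behind the Deep Optimizer: a small MLP reads simple gradient statistics and emits a positive step scale, so the update is shaped by a learned view of gradient properties rather than applied raw. The two-feature input, the exp head, and leaving the MLP untrained in this snippet are simplifications for illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn

class LearnedPreconditioner(nn.Module):
    """Illustrative stand-in for an MLP-based gradient memory: maps
    [grad mean, grad mean-abs] to the log of a positive step multiplier."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

    def scale(self, grad: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([grad.mean(), grad.abs().mean()]).unsqueeze(0)
        return self.mlp(feats).squeeze().exp()  # always-positive multiplier

model, precond = nn.Linear(8, 1), LearnedPreconditioner()
x, y = torch.randn(64, 8), torch.randn(64, 1)

loss = (model(x) - y).pow(2).mean()
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p -= 1e-2 * precond.scale(p.grad) * p.grad  # memory-shaped step
        p.grad = None
```

In a full system the preconditioner itself would also be trained, which is where the nesting appears: the rule that updates the model is itself a learnable module with its own, slower learning problem.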