Continual Learning
Breaking the Plasticity Bottleneck: A Tsinghua Team's New Work Tops Continual-Learning Benchmarks by Guiding Training with Transferable Task Relations
36Kr · 2025-12-02 00:56
Core Insights
- Tsinghua University's research team has proposed a novel continual learning (CL) framework, the H-embedding guided hypernetwork, which addresses catastrophic forgetting in AI models by focusing on task relationships [1][4][21]
- The framework aims to enhance the model's ability to absorb new knowledge while maintaining performance on old tasks, facilitating long-term intelligence in AI systems [1][21]

Group 1: Problem Identification
- Catastrophic forgetting, where models forget old knowledge when learning new tasks, is a significant bottleneck in the practical application of continual learning [1][4]
- Existing CL methods primarily adopt a model-centric approach, neglecting the intrinsic relationships between tasks, which directly influence knowledge transfer efficiency [1][8]

Group 2: Proposed Solution
- The H-embedding guided hypernetwork framework introduces a task-relation-centric approach, constructing transferable task embeddings (H-embeddings) before learning new tasks [4][6]
- This method explicitly encodes task relationships in the CL process, enabling the model to manage knowledge transfer more effectively [6][21]

Group 3: Methodology
- The H-embedding is derived from the H-score, which quantifies the transfer value from old tasks to the current task, allowing transferability to be computed efficiently [9][11]
- The framework employs a hypernetwork to generate task-specific parameters from the H-embedding, automatically adjusting parameters according to task differences [12][17]

Group 4: Experimental Results
- The proposed framework shows superior performance across multiple CL benchmarks, including CIFAR-100, ImageNet-R, and DomainNet, demonstrating robustness and scalability [18][20]
- The model exhibits strong forward and backward transfer, with minimal interference from new tasks on old ones, and effectively absorbs knowledge from previous tasks [20]

Group 5: Future Directions
- The research points to applications of task-structure-aware methods in cross-modal incremental learning, long-term task adaptation for large models, and automated learning-sequence planning [21][23]
- The approach aims to contribute to more scalable and adaptable general AI systems [21]
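The core mechanism described here, a hypernetwork that generates task-specific parameters from a task embedding, can be illustrated with a minimal sketch. All shapes, names, and the random initialization below are illustrative assumptions, not the paper's actual architecture; the point is only that similar task embeddings yield similar generated weights, which is what makes transfer between related tasks cheap.

```python
# Minimal sketch of a hypernetwork conditioned on a task embedding.
# Dimensions and initialization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, IN_DIM, OUT_DIM = 8, 16, 4

# Hypernetwork parameters: map a task embedding to a flattened
# weight matrix for the target layer.
H = rng.normal(scale=0.1, size=(EMB_DIM, IN_DIM * OUT_DIM))

def generate_task_weights(task_embedding):
    """Generate target-layer weights from a task embedding."""
    flat = task_embedding @ H                  # shape (IN_DIM * OUT_DIM,)
    return flat.reshape(IN_DIM, OUT_DIM)

def forward(x, task_embedding):
    """Run the target layer with task-conditioned weights."""
    W = generate_task_weights(task_embedding)
    return x @ W

# A task embedding close to an old one (a closely related task)
# produces nearly identical generated weights and outputs.
e_old = rng.normal(size=EMB_DIM)
e_new = e_old + 0.01 * rng.normal(size=EMB_DIM)
x = rng.normal(size=IN_DIM)
same = bool(np.allclose(forward(x, e_old), forward(x, e_new), atol=0.1))
```

Because the per-task parameters are generated rather than stored, updating the hypernetwork for a new task leaves old tasks reproducible from their saved embeddings, which is one way a hypernetwork mitigates forgetting.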
After Trillion-Scale AI Bets, Ilya Sutskever Asks: How Far Apart Will the Results Be Between Merely Stacking Compute and Actually Doing Research?
36Kr · 2025-11-26 01:02
Core Insights
- Global AI spending is projected to approach $1.5 trillion in 2025 and exceed $2 trillion in 2026, with Nvidia's CEO estimating that AI infrastructure investment could reach $3-4 trillion this decade, marking a new industrial revolution [1][34]
- The AI industry is transitioning from an era focused on scaling resources to one centered on research and innovation, as highlighted by Ilya Sutskever, former chief scientist of OpenAI [2][5][6]

Group 1: Transition in AI Development
- The era of simply scaling parameters, compute, and data is coming to an end, as industry consensus has produced a resource arms race rather than true innovation [7][9]
- Sutskever emphasizes that the future of AI will depend on new training methods rather than just increasing GPU counts, indicating a shift in competitive advantage [7][12]

Group 2: Limitations of Current Models
- Current large models post high benchmark scores but often fail to deliver real economic value, revealing a disconnect between perceived capability and practical application [9][10]
- The models are criticized for weak generalization, often performing well on tests but struggling with real-world tasks due to systemic flaws in their training processes [11][16]

Group 3: Need for New Training Approaches
- Sutskever argues that existing training methods, including pre-training and reinforcement learning, have fundamental limitations that prevent models from truly understanding and applying knowledge [18][20]
- The focus should shift toward continuous learning and self-evaluation, allowing models to adapt and improve in real-world scenarios rather than remaining static after initial training [27][29]

Group 4: Safety and Alignment in AI
- Safety should be integrated from the training phase, since the ability to generalize and understand context is crucial for reliable performance in unknown situations [25][26]
- Sutskever's new approach advocates a model that learns continuously and aligns with human values, moving away from a one-time training paradigm [28][30]

Group 5: Implications for the Future of AI
- The shift from resource-based competition to method-based innovation is seen as a critical turning point for the industry, with research capability becoming the key differentiator [33]
- Evaluation systems are evolving, as merely increasing model size and parameter counts is proving insufficient for the complexities of AI deployment [33]
In the Context of LLMs, Is "Continual Learning" the Optimal Solution to the "Memory" Problem?
机器之心· 2025-11-16 01:30
Group 1
- The article discusses Google's proposed "Nested Learning," which aims to address memory management in LLMs (Large Language Models) and the challenge of catastrophic forgetting [5][6][8]
- Nested Learning frames a model as a multi-layered optimization problem, a series of interconnected sub-problems, allowing new skills to be learned while avoiding the loss of previously acquired knowledge [6][7]
- The research introduces the "Continuous Memory System" (CMS), which treats memory as a set of modules updating at different frequencies, enhancing the model's ability to manage memory effectively [6][7]

Group 2
- The article highlights the importance of improving LLMs' memory capabilities to enable continual learning, allowing AI to retain contextual experiences, semantic knowledge, and procedural skills [8]
- A proposed three-layer memory architecture uses Model Weights for general knowledge, the KV Cache for intermediate results, and Context for relevant background information, enabling appropriate responses from the model [8]
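The three-layer split above (weights for slow general knowledge, KV cache for session intermediates, context for per-request background) can be sketched as a tiered lookup. The class and tier names below are illustrative assumptions for exposition, not an actual LLM-serving API; the point is the lookup order, fastest-changing tier first.

```python
# Illustrative sketch of the three-layer memory split described above.
# Tier names and the lookup policy are assumptions for illustration only.

class ThreeLayerMemory:
    def __init__(self, weights_knowledge):
        self.weights = dict(weights_knowledge)  # general knowledge, rarely updated
        self.kv_cache = {}                      # intermediate results, per session
        self.context = []                       # background for the current request

    def recall(self, key):
        """Check the fastest-changing tier first, fall back to slower tiers."""
        for k, v in reversed(self.context):     # most recent context wins
            if k == key:
                return v
        if key in self.kv_cache:
            return self.kv_cache[key]
        return self.weights.get(key)            # None if nowhere

    def observe(self, key, value):
        """New in-request information lands in context and is cached for reuse."""
        self.context.append((key, value))
        self.kv_cache[key] = value

mem = ThreeLayerMemory({"capital_of_france": "Paris"})
mem.observe("user_name", "Alice")
answer_fast = mem.recall("user_name")           # served from context
answer_slow = mem.recall("capital_of_france")   # served from weights
```

The design choice mirrored here is that only the cheap tiers (context, cache) change at inference time, while the expensive tier (weights) stays fixed, which is exactly why parameter-level continual learning is the hard part.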
Breaking Through the LLM Forgetting Bottleneck: Google's "Nested Learning" Lets AI Evolve Continuously Like a Human Brain
机器之心· 2025-11-08 06:10
Core Insights
- Google has introduced a new machine learning paradigm called Nested Learning, which allows models to continuously learn new skills without forgetting old ones, a significant step toward AI that evolves like the human brain [1][3][4]

Group 1: Nested Learning Concept
- Nested Learning treats machine learning models as a series of interconnected optimization sub-problems, enabling a more efficient learning system [6][11]
- The approach bridges the gap between model architecture and optimization algorithms, arguing they are fundamentally the same and can be organized into hierarchical optimization systems [7][16]
- The paradigm lets different components of a model update at varying frequencies, improving the model's management of long-term and short-term memory [15][20]

Group 2: Implementation and Architecture
- Google has developed a self-modifying architecture called Hope, based on Nested Learning principles, which outperforms existing models in language modeling and long-context memory management [8][24]
- Hope is an evolution of the Titans architecture, designed to execute unbounded levels of in-context learning and to optimize its own memory through a self-referential process [24][26]

Group 3: Experimental Results
- Evaluations show that Hope achieves lower perplexity and higher accuracy across language modeling and common-sense reasoning tasks compared with other architectures [27][30]
- Hope, Titans, and other architectures were compared on long-context tasks, demonstrating the effectiveness of the Nested Learning framework [30]

Group 4: Future Implications
- Nested Learning provides a theoretical and practical foundation for closing the gap between current LLMs' limitations and the human brain's superior continual learning, paving the way for self-improving AI [30]
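The "components updating at varying frequencies" idea can be made concrete with a toy training loop: a fast parameter group updates every step, a slow group only every few steps. The quadratic loss, learning rate, and period below are arbitrary assumptions, not Hope's actual schedule, they just show the mechanics of multi-frequency updates.

```python
# Toy sketch of multi-frequency parameter updates: "fast" weights update
# every step, "slow" weights once per SLOW_PERIOD steps. The loss and
# hyperparameters are illustrative assumptions, not Google's Hope model.
import numpy as np

SLOW_PERIOD = 4          # slow module ticks once per 4 steps
LR = 0.1

fast_w = np.zeros(3)     # short-term, rapidly adapting parameters
slow_w = np.zeros(3)     # long-term, slowly consolidating parameters
target = np.ones(3)

fast_updates = slow_updates = 0
for step in range(1, 17):
    # Gradient of the toy loss 0.5 * ||(fast_w + slow_w) - target||^2
    grad = (fast_w + slow_w) - target

    fast_w -= LR * grad              # every step
    fast_updates += 1
    if step % SLOW_PERIOD == 0:      # only on slow "ticks"
        slow_w -= LR * grad
        slow_updates += 1

# fast_updates == 16, slow_updates == 4; the combined parameters
# still converge toward the target despite the slower tier.
residual = float(np.abs(fast_w + slow_w - target).max())
```

The slow tier changes rarely, so it retains older structure (long-term memory) while the fast tier absorbs each new step, which is the intuition the paradigm formalizes.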
Meta Defuses the Biggest Bomb on the Road to AI Continual Learning: "Fine-Tuning" Can Fight Again
36Kr · 2025-10-27 05:13
Core Insights
- The article discusses recent advances toward continual learning and self-evolution in large language models (LLMs), addressing criticisms that they lack genuine learning capability [1][2]

Group 1: Paths to Continual Learning
- LLMs' capacity for continual learning is fundamentally tied to their memory depth and plasticity; three main paths for enhancing it are identified [2]
- The first path modifies the model's "context" or "working memory" via In-Context Learning (ICL), supplying new information in prompts so the model can solve specific problems [4][6]
- The second path adds an "external memory bank" (RAG), letting models query and maintain an external database, exemplified by Google DeepMind's ReasoningBank [7]
- The third path is parameter-level continual learning, which has faced challenges due to the complexity and instability of methods such as reinforcement learning (RL) and Low-Rank Adaptation (LoRA) [10][11]

Group 2: Sparse Memory Fine-Tuning
- Meta AI's recent paper introduces sparse memory fine-tuning as an alternative to traditional supervised fine-tuning (SFT), targeting catastrophic forgetting in particular [11][28]
- The method is a three-step process: modify the architecture to include a memory layer, use TF-IDF to identify which parameters to update, and perform sparse updates on only the most relevant parameters [12][22][23]
- The approach shows significant improvements: after learning new facts, models lose only 11% of performance on original tasks, versus drops of 71% with LoRA and 89% with full fine-tuning [23][25]

Group 3: Implications for the Future of LLMs
- These advances suggest models can be updated safely and effectively, shifting them from static tools toward dynamic agents capable of continual learning [31][32]
- Successful implementation could mark the start of a new era of self-evolving models that grow and adapt through experience [31][32]
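The TF-IDF selection step above can be sketched in a few lines: score each memory slot by how often the new facts touch it (term frequency) discounted by how commonly it is touched across background data (document frequency), then restrict gradient updates to the top-k slots. The counts, slot names, and top-k value below are invented for illustration; Meta's actual scoring over memory-layer indices may differ.

```python
# Hedged sketch of TF-IDF slot selection for sparse memory fine-tuning.
# All counts and slot names are illustrative assumptions.
import math

# Accesses of each memory slot while encoding the new-fact batch ("tf") ...
tf_new = {"slot_a": 9, "slot_b": 1, "slot_c": 6, "slot_d": 1}
# ... and the number of background batches that touch each slot ("df").
df_background = {"slot_a": 1, "slot_b": 10, "slot_c": 2, "slot_d": 9}
N_BACKGROUND_BATCHES = 10

def tfidf(slot):
    """High score = used heavily by the new facts, rarely elsewhere."""
    idf = math.log(N_BACKGROUND_BATCHES / df_background[slot])
    return tf_new[slot] * idf

TOP_K = 2
ranked = sorted(tf_new, key=tfidf, reverse=True)
slots_to_update = ranked[:TOP_K]   # only these slots receive gradients
```

Slots that every task uses (high document frequency) score near zero and stay frozen, which is how the scheme protects shared knowledge while still writing the new facts somewhere.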
96.0% of Surveyed Young Professionals Say Personal Growth Deserves More Attention After Starting Work
Core Insights
- A significant 96.0% of surveyed young professionals believe personal growth should be prioritized after entering the workforce, underscoring the importance of continuous learning for career advancement [1][2][5]

Group 1: Importance of Continuous Learning
- Continuous learning is viewed as essential for career development, with professionals acknowledging that knowledge gained after graduation is crucial in determining future career paths [2][4]
- 54.8% of respondents feel ongoing self-learning lets them perform more confidently at work, while 47.1% report increased self-confidence and a sense of achievement [5]

Group 2: Areas of Focus for Growth
- 70.9% of young professionals prioritize enhancing professional skills, followed by 68.0% focusing on work-related tasks and 53.4% on interpersonal communication [3][5]
- Other areas of interest include financial literacy (41.7%), time management (41.1%), and personal development (39.9%) [3]

Group 3: Personal Experiences and Outcomes
- Professionals report that continuous learning has made daily routines more fulfilling and boosted self-confidence, leaving many feeling better equipped to handle workplace challenges [4][5]
- Pursuing personal interests, such as hobbies and skills outside work, is also seen as beneficial for overall well-being and career satisfaction [4]
An Industry Heavyweight Fires Off: Agents Are Just for Show, Reinforcement Learning Is Terrible, and AGI Won't Arrive Even in Ten Years
自动驾驶之心· 2025-10-22 00:03
Core Insights
- The article discusses the current state and future of AI, focusing on the limitations of reinforcement learning and the timeline for achieving Artificial General Intelligence (AGI) [5][6][10]

Group 1: AGI and AI Development
- AGI is expected to take about ten years to develop, contrary to the belief that this year would be "the year of agents" [12][13]
- Current AI agents, such as Claude and Codex, are impressive but still lack essential capabilities, including multimodal abilities and continual learning [13][14]
- The industry has been overly optimistic about the pace of AI development, leading to inflated expectations [12][15]

Group 2: Limitations of Reinforcement Learning
- Reinforcement learning is criticized as inadequate for replicating human learning, since it often relies on trial and error without a deep understanding of the problem [50][51]
- The approach injects noise into learning because it weights every action by the final outcome rather than by the quality of the individual steps [51][52]
- Human learning involves richer reflection on successes and failures, which current AI models do not replicate [52][53]

Group 3: Future of AI and Learning Mechanisms
- The future of AI may involve more sophisticated attention mechanisms and learning algorithms that better mimic human cognition [33][32]
- AI models need mechanisms for long-term memory and knowledge retention, which are currently lacking [31][32]
- The integration of AI into programming and development is seen as a continuous evolution rather than a sudden leap to superintelligence [45][47]
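The credit-assignment criticism in Group 2 is easy to see in miniature: in an outcome-weighted policy-gradient update (a REINFORCE-style toy, sketched here under that assumption), every step of a trajectory is scaled by the same final return, so a good step inside a failed attempt is penalized just as hard as the step that actually caused the failure.

```python
# Toy illustration of outcome-weighted credit assignment. The trajectory
# and unit log-prob gradients are invented for illustration only.

trajectory = [
    {"action": "good_step", "logp_grad": 1.0},
    {"action": "bad_step",  "logp_grad": 1.0},
    {"action": "good_step", "logp_grad": 1.0},
]
final_return = -1.0   # the episode happened to fail

# Every step's update is scaled by the same final outcome, so the update
# cannot distinguish the good steps from the bad one:
per_step_credit = [final_return * s["logp_grad"] for s in trajectory]
```

Averaged over many rollouts this estimator is unbiased, but per-trajectory it is pure outcome noise, which is the "no reflection on which steps were good" complaint in the interview.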
Andrej Karpathy Fires Off: Agents Are Just for Show, Reinforcement Learning Is Terrible, and AGI Won't Arrive Even in Ten Years
机器之心· 2025-10-18 05:44
Core Viewpoint
- AI is projected to contribute an annual GDP increase of 2%, but the current state of the industry is criticized as overly optimistic and disconnected from reality [2][5]

Group 1: AGI and Learning
- AGI is expected to take about ten years to develop, as current AI agents lack the necessary cognitive abilities and continual learning capabilities [9][11]
- Current AI models, particularly large language models (LLMs), exhibit cognitive deficits that hinder their performance [34][36]
- Reinforcement learning is deemed inadequate for replicating human learning because it oversimplifies the complexity of human decision-making [44][46]

Group 2: AI Development and Challenges
- The industry is in a phase of rapid development, but there is skepticism about the actual capabilities of AI models, which are often overhyped [5][41]
- Current AI agents struggle to understand and integrate codebase-specific implementations, leading to inefficiencies and misunderstandings in code generation [36][41]
- The reliance on pre-trained models and the limitations of current AI tools highlight the need for further advances in AI technology [20][42]

Group 3: Future of AI
- The future of AI is expected to involve more sophisticated attention mechanisms and potentially a shift toward more efficient learning algorithms [29][30]
- While AI will continue to evolve, it will still rest on foundational principles such as gradient descent for training large neural networks [29][30]
- Ongoing improvements in AI tools and models point to continuous integration of new techniques and methodologies to enhance performance [42][43]
"First-Principles Thinking on Large Models": Transcript of Li Jianzhong's Dialogue with Lukasz Kaiser, Contributor to GPT-5 and Co-Inventor of the Transformer
36Kr · 2025-10-13 10:46
Core Insights
- The rapid development of large intelligent systems is reshaping industry dynamics, exemplified by OpenAI's recent release of Sora 2, which showcases advances in model capabilities and the complexity of AI evolution [1][2]
- The dialogue between industry leaders, including CSDN's Li Jianzhong and OpenAI's Lukasz Kaiser, focuses on first-principles thinking about large models and their implications for future AI development [2][5]

Group 1: Language and Intelligence
- Language plays a crucial role in AI; some experts argue that relying solely on language models for AGI is misguided, since language is a low-bandwidth representation of the physical world [6][9]
- Kaiser emphasizes the temporal dimension of language, suggesting that the ability to generate sequences over time is vital for expressing intelligence [7][9]
- While language models can form abstract concepts, these may not fully align with human concepts, particularly those grounded in physical experience [11][12]

Group 2: Multimodal Models and World Understanding
- The industry trend is toward unified models that handle multiple modalities; current models like GPT-4 already demonstrate significant multimodal capabilities [12][13]
- Kaiser acknowledges that while modern language models can process multimodal tasks, integrating different modalities remains a challenge [13][15]
- The discussion raises skepticism about whether AI can fully understand the physical world through observation alone, while suggesting language models may serve as effective world models in certain contexts [14][15]

Group 3: AI Programming and Future Perspectives
- AI programming is emerging as a key application of large language models, with two main perspectives on its future: one advocating natural language as the primary programming interface, the other emphasizing the continued need for traditional programming languages [17][18]
- Kaiser believes language models will cover an increasing share of programming tasks, but a solid grasp of programming concepts will remain essential for professional developers [19][20]

Group 4: Agent Models and Generalization Challenges
- Training "agent models" struggles to generalize to new tasks, raising the question of whether this stems from training methods or inherent limitations [21][22]
- Kaiser suggests that effective agent systems depend on learning from interactions with varied tools and environments, which is currently limited [22][23]

Group 5: Scaling Laws and Computational Limits
- Faith in Scaling Laws as the key to stronger AI raises concerns about over-reliance on computational power at the expense of algorithmic and architectural advances [24][25]
- Kaiser distinguishes pre-training Scaling Laws from reinforcement learning Scaling Laws, noting that while pre-training has been effective, it may be approaching economic limits [25][26]

Group 6: Embodied Intelligence and Data Efficiency
- Slow progress in embodied intelligence, particularly humanoid robots, is attributed either to data scarcity or to fundamental differences between bits and atoms [29][30]
- Kaiser argues that gains in data efficiency and the development of multimodal models will be crucial to achieving effective embodied intelligence [30][31]

Group 7: Reinforcement Learning and Scientific Discovery
- The shift toward reinforcement-learning-driven reasoning models presents both opportunities for innovation and questions about their effectiveness at generating new scientific insights [32][33]
- Kaiser notes that while reinforcement learning offers high data efficiency, it has limitations compared with traditional gradient-descent methods [33][34]

Group 8: Organizational Collaboration and Future Models
- Achieving large-scale collaboration among agents remains a significant challenge, requiring more parallel processing and effective feedback mechanisms in training [35][36]
- Kaiser emphasizes the need for next-generation reasoning models that operate in a more parallel and efficient manner to enable organizational collaboration [36][37]

Group 9: Memory Mechanisms in AI
- Current AI models' memory is bounded by context windows, resembling working memory rather than true long-term memory [37][38]
- Kaiser suggests future architectures may need more sophisticated memory mechanisms to achieve genuine long-term memory [38][39]

Group 10: Continuous Learning in AI
- The potential for AI models to support continuous learning is being explored, with current models using context as a form of ongoing memory [39][40]
- Kaiser believes that while in-context learning is a step forward, more elegant solutions for continuous learning will be needed [40][41]
Want to Win in Any Industry? Grant Cardone Says You Need These 4 Things
Yahoo Finance· 2025-09-23 15:16
Core Insights
- The article outlines four essential traits for success in any industry, emphasizing sustained commitment to them over time

Group 1: Desire to Succeed
- The first trait is the desire to succeed, which helps individuals push through the challenges of building a business [2]
- A strong desire to succeed makes it easier to develop the other three traits, so this motivation is crucial before starting a business [3]

Group 2: Willingness to Learn
- The second trait is the willingness to learn: educating oneself about the chosen industry through books, videos, podcasts, and other resources [4]
- It is important to learn not only the industry but also the business side, since running a business demands skills quite different from a hobbyist's [5]
- Continuous learning is essential even in successful times, to discover new revenue-generating opportunities [6]

Group 3: Ability to Never Quit
- The third trait is the ability to never quit, which is vital once a lucrative opportunity is identified [7]
- While walking away from unproductive ventures is acceptable, persistence in pursuing goals is crucial, especially during challenging times [8]