Catastrophic Forgetting
Li Auto MindGPT-4o-Vision Technical Report (Condensed Version)
理想TOP2· 2025-12-22 12:28
Core Insights
- The article discusses the trade-offs between general capabilities and vertical domain adaptation when transferring general multimodal large language models (MLLMs) to specific applications, highlighting issues like catastrophic forgetting and the lack of systematic post-training methodologies [2]

Group 1: Key Inefficiencies and Biases in Multimodal Model Training
- Three critical inefficiencies are identified:
  1. Resource allocation is inefficient: traditional data synthesis methods treat all data equally, neglecting differences in information density, so high-value data is underutilized and computational resources are wasted [3]
  2. Reward mechanisms can collapse diversity: traditional reinforcement learning approaches encourage models to converge on a few safe response patterns, sacrificing output diversity and exploration and weakening generalization [3]
  3. Unimodal spurious correlations arise when models rely too heavily on language-model priors rather than visual evidence, producing factual inaccuracies in industrial applications [3]

Group 2: MindGPT-4ov Post-Training Paradigm
- The MindGPT-4ov post-training paradigm consists of four core modules:
  1. Data construction based on an information density score (IDS) and a dual-labeling system [4]
  2. Supervised fine-tuning (SFT) through collaborative curriculum learning [4]
  3. Reinforcement learning (RL) utilizing a hybrid reward system [4]
  4. Infrastructure improvements for parallel training and inference optimization [4]

Group 3: Information Density Score (IDS) and Dynamic Synthesis Strategy
- The IDS evaluates image data along four dimensions: subject diversity, spatial relationships, OCR text richness, and world-knowledge relevance [4]
- A dynamic synthesis strategy adjusts the number of generated question-answer pairs according to the IDS score, ensuring efficient resource allocation [4]

Group 4: Collaborative Curriculum SFT Mechanism
- The SFT mechanism employs a three-stage collaborative curriculum:
  1. Cross-domain knowledge learning injects vertical domain knowledge [6]
  2. Capability restoration uses general datasets to recover any decline in general capabilities [6]
  3. Preference alignment optimizes response formats and reduces hallucinations using high-quality preference data [6]

Group 5: Hybrid Reward Mechanism in Reinforcement Learning
- The RL phase introduces multiple reward signals to balance accuracy, diversity, and conciseness (a toy sketch of combining them follows below):
  1. Pass@k rewards encourage exploration by crediting any correct answer among the k sampled responses [7]
  2. Diversity rewards penalize semantically similar responses, promoting varied outputs [7]
  3. Length rewards impose penalties on overly long responses, keeping outputs concise [7]
  4. Adversarial hallucination data penalizes models that generate details without visual evidence [7]

Group 6: Label Construction and Data Synthesis
- An expert-defined primary label system is expanded into a multi-level label tree covering both vertical domain knowledge and general visual capabilities [5]
- Data synthesis matches images with coarse- and fine-grained topics, generates QA pairs according to IDS scores, and filters low-quality data through a multi-model voting mechanism [8]
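The summary above describes the hybrid reward only qualitatively. As a rough illustration of how such signals could be combined per sampled group, here is a minimal Python sketch; the function names, mixing weights, length budget, and the similarity-based diversity penalty are assumptions for illustration, not the report's actual implementation.

```python
from typing import Callable, List

def hybrid_reward(
    responses: List[str],                      # k sampled responses for one prompt
    is_correct: Callable[[str], bool],         # verifiable checker (e.g., exact match)
    similarity: Callable[[str, str], float],   # semantic similarity in [0, 1]
    max_len: int = 512,                        # length budget (assumed)
    w_div: float = 0.3, w_len: float = 0.1,    # mixing weights (assumed)
) -> List[float]:
    """Toy hybrid reward: pass@k-style accuracy + diversity + length penalty."""
    correct = [is_correct(r) for r in responses]
    # Pass@k-style signal: all k responses share credit if any one is correct,
    # which rewards exploration instead of collapsing onto one safe answer.
    group_hit = 1.0 if any(correct) else 0.0
    rewards = []
    for i, r in enumerate(responses):
        # Diversity: penalize a response by its mean similarity to the others.
        others = [similarity(r, o) for j, o in enumerate(responses) if j != i]
        div_penalty = sum(others) / len(others) if others else 0.0
        # Length: penalize only the portion exceeding the budget.
        len_penalty = max(0.0, len(r) - max_len) / max_len
        rewards.append(group_hit - w_div * div_penalty - w_len * len_penalty)
    return rewards
```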
Group 7: Performance Validation
- MindGPT-4ov demonstrates superior response conciseness, with an average response length significantly shorter than comparable models while achieving a higher accuracy rate of 83.3% versus 80.1% [9]
Understanding the Essence of RL Learning!
自动驾驶之心· 2025-12-15 00:04
Original link: https://zhuanlan.zhihu.com/p/1972781108128155202
Author | wangleineo  Source | 青稞AI
Recently I read several papers on RL training and noticed that they are internally connected; taken together, they may help us understand the essence of RL-based learning.

Breaking the myth: Does RLVR enable LLMs to self-improve?
The first paper is a recent, widely discussed one from Tsinghua's LEAP lab; it received full marks at this year's NeurIPS and won a Best Paper award: https://arxiv.org/abs/2504.13837
The paper opens with a direct question: can RL training really give an LLM reasoning abilities beyond its base model? The conclusion is unambiguous: no. The paper shows experimentally that the post-RLVR model's abilities lie entirely within the base model's capability envelope; RLVR only improves search efficiency, letting the model find solutions to problems more efficiently. Problems the base model cannot solve remain unsolvable for the RLVR model. Proof ...
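Work in this line typically quantifies the "capability boundary" with pass@k: sample n completions, count the c correct ones, and estimate the probability that at least one of k random draws succeeds. A minimal sketch using the standard unbiased estimator; that this exact estimator is the one used in the paper above is an assumption.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct.

    Equals 1 - C(n - c, k) / C(n, k): the probability that a random
    subset of k samples contains at least one correct solution.
    """
    if n - c < k:
        return 1.0  # every k-subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustration: a base model that solves a problem in only 3 of 200 samples
# looks weak at k=1 but has high coverage at large k, which is the regime
# where the paper finds RLVR models stop winning.
print(pass_at_k(200, 3, 1), pass_at_k(200, 3, 100))
```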
Breaking the Plasticity Bottleneck: Tsinghua Team's New Work Tops Continual-Learning Leaderboards, with Transferable Task Relations Guiding Training
36Kr· 2025-12-02 00:56
Core Insights
- Tsinghua University's research team has proposed a novel continual learning (CL) framework called the H-embedding guided hypernetwork, which addresses catastrophic forgetting in AI models by focusing on task relationships [1][4][21]
- The framework aims to enhance the model's ability to absorb new knowledge while maintaining performance on old tasks, thus facilitating long-term intelligence in AI systems [1][21]

Group 1: Problem Identification
- Catastrophic forgetting is a significant bottleneck in the practical application of continual learning: models forget old knowledge when learning new tasks [1][4]
- Existing CL methods primarily adopt a model-centric approach, neglecting the intrinsic relationships between tasks, which directly influence knowledge-transfer efficiency [1][8]

Group 2: Proposed Solution
- The H-embedding guided hypernetwork framework introduces a task-relation-centric approach, constructing transferable task embeddings (H-embeddings) before learning new tasks [4][6]
- This method explicitly encodes task relationships in the CL process, enabling the model to manage knowledge transfer more effectively [6][21]

Group 3: Methodology
- The H-embedding is derived from the H-score, which quantifies the transfer value from old tasks to the current task, enabling efficient computation of transferability [9][11]
- The framework employs a hypernetwork that generates task-specific parameters conditioned on the H-embedding, automatically adjusting parameters to task differences (a minimal sketch of this idea follows below) [12][17]

Group 4: Experimental Results
- The proposed framework shows superior performance across multiple CL benchmarks, including CIFAR-100, ImageNet-R, and DomainNet, demonstrating its robustness and scalability [18][20]
- The model exhibits strong forward and backward transfer, with minimal interference from new tasks on old tasks, and effectively absorbs knowledge from previous tasks [20]

Group 5: Future Directions
- The research points to applications of task-structure-aware methods in cross-modal incremental learning, long-term task adaptation for large models, and automated learning-sequence planning [21][23]
- The approach aims to contribute to more scalable and adaptable general AI systems [21]
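As a reading aid, here is a minimal PyTorch sketch of the general hypernetwork idea: a task embedding goes in, task-specific head weights come out, so related embeddings yield related parameters. The layer sizes, the head structure, and the random placeholder embedding are illustrative assumptions; the paper derives its H-embedding from H-score transferability estimates rather than sampling it.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Maps a task embedding to the weights of a small task-specific head."""
    def __init__(self, emb_dim: int, feat_dim: int, n_classes: int):
        super().__init__()
        self.feat_dim, self.n_classes = feat_dim, n_classes
        # Generates an (n_classes x feat_dim) weight matrix plus a bias vector.
        self.gen = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(),
            nn.Linear(256, n_classes * feat_dim + n_classes),
        )

    def forward(self, task_emb: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        params = self.gen(task_emb)
        W = params[: self.n_classes * self.feat_dim].view(self.n_classes, self.feat_dim)
        b = params[self.n_classes * self.feat_dim:]
        return feats @ W.T + b

# Each task gets its own embedding (random here; H-score-derived in the paper).
# Similar task embeddings produce similar heads, which is how knowledge transfers.
hn = HyperNetwork(emb_dim=32, feat_dim=128, n_classes=10)
h_embedding = torch.randn(32)
logits = hn(h_embedding, torch.randn(4, 128))  # (4, 10) class scores
```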
In the Context of LLMs, Is "Continual Learning" the Optimal Solution to the "Memory" Problem?
机器之心· 2025-11-16 01:30
Group 1
- The article discusses "Nested Learning," a concept proposed by Google that aims to address memory management in LLMs (large language models) and the challenge of catastrophic forgetting [5][6][8]
- Nested Learning is presented as a multi-layered optimization problem in which a model is a series of interconnected sub-problems, allowing new skills to be learned without losing previously acquired knowledge [6][7]
- The research introduces the Continuum Memory System (CMS), which treats memory as a set of modules that update at different frequencies, enhancing the model's ability to manage memory effectively [6][7]

Group 2
- The article highlights the importance of improving LLMs' memory capabilities to enable continual learning, allowing AI to retain contextual experiences, semantic knowledge, and procedural skills [8]
- A proposed three-layer memory architecture includes model weights for general knowledge, the KV cache for intermediate results, and the context for relevant background information, together enabling appropriate responses from the model (a toy sketch of this tiered lookup follows below) [8]
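The three-layer architecture lends itself to a lookup-hierarchy caricature. A toy Python sketch; the class, the tier names, and the string-matching "recall" are illustrative assumptions, not an API from the research.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class LLMMemory:
    """Toy model of the three memory tiers described in the article."""
    weights_knowledge: Dict[str, str] = field(default_factory=dict)  # general knowledge
    kv_cache: Dict[str, str] = field(default_factory=dict)           # intermediate results
    context: List[str] = field(default_factory=list)                 # per-request background

    def recall(self, query: str) -> Optional[str]:
        # Fastest tier: reuse a cached intermediate result if one exists.
        if query in self.kv_cache:
            return self.kv_cache[query]
        # Next tier: search the provided context (episodic, per-request memory).
        for snippet in self.context:
            if query in snippet:
                return snippet
        # Fallback tier: parametric knowledge baked into the model weights.
        return self.weights_knowledge.get(query)
```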
Breaking Through the LLM Forgetting Bottleneck: Google's "Nested Learning" Lets AI Continuously Evolve Like the Human Brain
机器之心· 2025-11-08 06:10
Core Insights
- Google has introduced a new machine learning paradigm called Nested Learning, which allows models to continuously learn new skills without forgetting old ones, marking a significant advance toward AI that evolves like the human brain [1][3][4]

Group 1: Nested Learning Concept
- Nested Learning treats a machine learning model as a series of interconnected optimization sub-problems, enabling a more efficient learning system [6][11]
- The approach bridges the gap between model architecture and optimization algorithms, suggesting they are fundamentally the same and can be organized into hierarchical optimization systems [7][16]
- The paradigm allows different components of a model to update at varying frequencies, enhancing the model's handling of long-term and short-term memory (a minimal sketch of such a schedule follows below) [15][20]

Group 2: Implementation and Architecture
- Google has developed a self-modifying architecture called Hope, based on Nested Learning principles, which outperforms existing models in language modeling and long-context memory management [8][24]
- Hope is an evolution of the Titans architecture, designed to execute unbounded levels of in-context learning and to optimize its own memory through a self-referential process [24][26]

Group 3: Experimental Results
- Evaluations show that Hope achieves lower perplexity and higher accuracy on various language modeling and common-sense reasoning tasks than other architectures [27][30]
- Hope, Titans, and other architectures were compared on long-context tasks, demonstrating the effectiveness of the Nested Learning framework [30]

Group 4: Future Implications
- Nested Learning provides a theoretical and practical foundation for closing the gap between current LLMs' limitations and the human brain's superior continual-learning ability, paving the way for self-improving AI [30]
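The key mechanism, as summarized, is components updating at different frequencies, so slowly-updated parameters behave as stable long-term memory that new data cannot quickly erase. A minimal PyTorch sketch of such a schedule; the three-way module split, the frequency values, and the placeholder loss are assumptions, not Hope's actual design.

```python
import torch

# Toy multi-frequency schedule: fast modules update every step, slow ones rarely.
modules = {
    "short_term": {"net": torch.nn.Linear(64, 64), "every": 1},
    "mid_term":   {"net": torch.nn.Linear(64, 64), "every": 10},
    "long_term":  {"net": torch.nn.Linear(64, 64), "every": 100},
}
opts = {name: torch.optim.SGD(m["net"].parameters(), lr=1e-2)
        for name, m in modules.items()}

for step in range(1, 301):
    x = torch.randn(8, 64)
    for name, m in modules.items():
        if step % m["every"] != 0:
            continue                      # frozen this step: acts as stable memory
        loss = m["net"](x).pow(2).mean()  # placeholder objective
        opts[name].zero_grad()
        loss.backward()
        opts[name].step()
```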
Is Our Understanding of the Large-Model Fine-Tuning Paradigm Being Overturned Again? New Research from UIUC and Amazon Suggests the SFT Catastrophic-Forgetting Problem May Be Misunderstood
机器之心· 2025-10-21 03:43
Core Insights
- The article examines the impact of supervised fine-tuning (SFT) on the general capabilities of large language models (LLMs), finding that SFT does not always cause a significant decline in general performance when a smaller learning rate is used [2][34]
- The research challenges the long-held belief that domain-specific fine-tuning inevitably causes catastrophic forgetting of general capabilities, proposing that the choice of training strategy plays the crucial role [2][34]

Experiment Details
- The study used two domain-specific datasets, MedCalc and ESCI, which represent scenarios where open-source LLMs typically perform poorly, making them ideal testbeds for domain-specific SFT [5]
- Various open-source LLMs were selected for experimentation, including Qwen3-8B and Gemma3-4B, with the learning rate carefully controlled during SFT [6]

Findings
- Finding 1: A smaller learning rate (e.g., 1e-6) lets models maintain strong performance in the target domain while significantly reducing the decline in general capabilities [11]
- Finding 2: For classification tasks where the training objective includes only the final label, a wider range of learning rates achieves good performance, as seen on the ESCI dataset [12][14]

Theoretical Analysis
- The team's analysis shows that smaller learning rates effectively limit the decline in general performance, aligning with the experimental findings [17]
- The analysis also indicates that when training targets include only final labels, the model encounters fewer "hard tokens," widening the acceptable learning-rate range [17]

Token Adaptive Loss Reweighting (TALR)
- TALR dynamically adjusts each token's loss weight based on the model's prediction probability for it, reducing the impact of hard tokens during training (an illustrative sketch follows below) [20]
- Token weights are updated in real time, so the model's own confidence guides the training process [21]

Experimental Results
- In comparisons of strategies for mitigating catastrophic forgetting, TALR performed best, especially at higher learning rates, preserving domain gains while minimizing losses in general performance [26][27]

Conclusion and Future Directions
- The research emphasizes the continued importance of SFT for enhancing LLM capabilities; while smaller learning rates and TALR are effective, more robust strategies for the forgetting problem remain to be explored [34][35]
- Future research should focus on balancing domain-specific performance with general capabilities, particularly in specialized fields like medicine, where retaining foundational knowledge is crucial [35]
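From the description, TALR scales each token's cross-entropy loss by a weight tied to the model's own prediction probability, so low-confidence "hard" tokens contribute less gradient. A minimal sketch of one plausible instantiation; the exact weighting function and normalization are assumptions, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def talr_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Token-adaptive loss reweighting (illustrative).

    logits: (batch, seq, vocab); targets: (batch, seq).
    Each token's CE loss is weighted by the model's own (detached) probability
    of the correct token, so low-confidence "hard" tokens are down-weighted.
    """
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    )
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1)
        p_correct = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).reshape(-1)
    weights = p_correct / (p_correct.sum() + 1e-8)  # per-batch renormalization (assumed)
    return (weights * ce).sum()
```

Because the weights are recomputed from the current logits at every step, the down-weighting adapts as the model's confidence grows, which matches the "real-time update" behavior described above.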
New from Princeton University! VLM2VLA: Fine-Tuning a VLM into a VLA While Avoiding Catastrophic Forgetting
具身智能之心· 2025-10-07 10:00
Core Insights
- The article addresses the catastrophic forgetting problem that arises when fine-tuning visual language models (VLMs) into visual-language-action models (VLAs) for robotic control, tracing it to the mismatch between pre-training and fine-tuning data distributions [2][4]

Group 1: Catastrophic Forgetting
- Catastrophic forgetting occurs when the model loses its original reasoning and multimodal understanding capabilities during action-generation training [2]
- The root cause is the distribution mismatch between internet-scale pre-training data (primarily image-text pairs) and the low-dimensional action vectors used for robotic fine-tuning [2]

Group 2: VLM2VLA Approach
- VLM2VLA addresses the distribution mismatch by converting low-dimensional actions into natural-language descriptions, aligning the fine-tuning data with the pre-training data [3][4]
- The method employs low-rank adaptation (LoRA) for fine-tuning, minimizing modifications to the VLM backbone and avoiding catastrophic forgetting [4]

Group 3: Hierarchical Action Representation
- The VLM2VLA framework decomposes action prediction into a three-level reasoning process, using natural-language descriptions at every level [6]
- High-level subtask prediction generates intermediate tasks from initial observations and the overall task instruction [6]
- Mid-level motion planning produces spatially grounded movement descriptions, while low-level action generation emits executable action sequences with language annotations [6]

Group 4: Data Reconstruction Pipeline
- VLM2VLA uses Gemini 2.5 to automatically reconstruct raw robot-trajectory datasets into language-annotated datasets compatible with VLM pre-training formats [9]
- The reconstruction process involves providing context, decomposing trajectories into subtasks, and standardizing the format to align with VLM data [9]

Group 5: Efficient Fine-Tuning Strategy
- The Gemma-3-12B-IT model is fine-tuned with LoRA on linear layers, without altering the VLM architecture or requiring joint training on internet-scale data (a configuration sketch follows below) [12][13]
- Key training parameters: LoRA rank 16, learning rate 5e-5, effective batch size 8 [12][13]

Group 6: Experimental Validation
- Experiments target three core questions, comparing VLM2VLA with baseline models on retention of multimodal understanding, competitiveness in robotic manipulation, and generalization of knowledge to new scenarios [14][15]
- VLM2VLA is competitive on both in-distribution and out-of-distribution tasks, showcasing its hierarchical reasoning capabilities [17][19]

Group 7: Limitations and Future Directions
- The model currently faces challenges such as reasoning latency and the need for larger-scale language-annotated robot datasets to enhance generalization [19]
- Future improvements may include optimizing decoding strategies, extending language annotation to dexterous actions, and integrating verification capabilities into the VLM itself [19][22]
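The reported recipe (LoRA rank 16 on linear layers, learning rate 5e-5, effective batch size 8) maps directly onto a standard PEFT setup. Here is a sketch of what such a configuration might look like with Hugging Face `peft`; the checkpoint id, loader class, `lora_alpha`, and dropout are assumptions not stated in the article.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Checkpoint id and loader class are illustrative; the article only names Gemma-3-12B-IT.
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-12b-it")

lora_config = LoraConfig(
    r=16,                         # LoRA rank reported in the article
    lora_alpha=32,                # assumed; not stated in the summary
    lora_dropout=0.0,             # assumed
    target_modules="all-linear",  # "LoRA on linear layers" per the article
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Training would then use lr=5e-5 with an effective batch size of 8; the frozen
# VLM backbone is what preserves the pre-trained multimodal capabilities.
```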
IEEE TPAMI 2025 | Peking University Proposes a Distribution-Driven Lifelong Learning Paradigm, Using Structural Modeling to Solve Catastrophic Forgetting
机器之心· 2025-09-26 10:35
Core Viewpoint
- The article presents a recent research achievement in artificial intelligence: DKP++, a new framework for lifelong person re-identification (LReID) that tackles catastrophic forgetting in lifelong learning by strengthening retention of historical knowledge and improving cross-domain learning [2][3]

Research Background
- Person re-identification (ReID) aims to match and associate images of the same individual across camera views, locations, and times, with applications in surveillance, intelligent transportation, and urban safety management [3]
- The traditional ReID paradigm struggles with domain shift caused by varying data-collection conditions, leaving it poorly adapted to long-term dynamic environments [3]

Research Challenges
- The core challenge in lifelong ReID is catastrophic forgetting: after learning new-domain knowledge, the model's retrieval performance on old-domain data drops significantly [5]
- Existing mitigations, such as retaining historical samples or knowledge distillation, face data-privacy risks, storage overhead, and limited model flexibility [5]

Research Motivation
- DKP++ is motivated by distribution-aware prototype learning, which retains historical knowledge without storing historical samples, and by cross-domain distribution alignment, which helps the model learn new knowledge while exploiting historical information [8][10]

Method Design
- DKP++ employs a distribution-aware knowledge aligning and prototyping framework with four parts:
  1. Instance-level fine-grained modeling captures local details of person instances [14]
  2. Distribution-aware prototype generation builds robust category-level prototypes that retain intra-class variation (a minimal sketch follows below) [14]
  3. Distribution alignment bridges the feature-distribution gap between new and old data [14]
  4. Prototype-based knowledge transfer guides model learning using the generated prototypes and labeled new data [14]

Experimental Analysis
- Experiments used two typical training-domain sequences and five widely used ReID datasets, evaluating knowledge retention and generalization [16]
- DKP++ improves average performance on known domains by 5.2%-7% and overall generalization on unseen domains by 4.5%-7.7% over existing methods [17]
- As the number of learned domains grows, the model retains more historical knowledge and improves faster on unseen domains [20]

Technical Innovations
- DKP++ introduces innovative designs in distribution-prototype modeling and representation, plus sample-alignment-guided prototype knowledge transfer, to overcome distribution gaps between new- and old-domain data [23]

Future Outlook
- The research suggests improvements such as distribution alignment using larger models, active forgetting mechanisms to prune redundant knowledge, and multimodal lifelong learning to enhance perception in complex environments [23]
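A minimal sketch of the "distribution-aware prototype" idea: store a per-identity feature distribution instead of raw samples, then replay by sampling from it while learning a new domain. The class name, the diagonal-Gaussian choice, and the replay interface are assumptions, not the paper's exact formulation.

```python
import torch

class DistributionPrototypes:
    """Per-class feature distributions kept in place of historical samples."""
    def __init__(self):
        self.protos = {}  # class_id -> (mean, std) over feature dimensions

    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        for c in labels.unique().tolist():
            f = feats[labels == c]
            # Store a diagonal Gaussian: retains intra-class variation without
            # keeping any raw images (no privacy or storage cost for old domains).
            self.protos[c] = (f.mean(0), f.std(0, unbiased=False) + 1e-6)

    def replay(self, n_per_class: int = 8):
        feats, labels = [], []
        for c, (mu, sigma) in self.protos.items():
            feats.append(mu + sigma * torch.randn(n_per_class, mu.numel()))
            labels.append(torch.full((n_per_class,), c, dtype=torch.long))
        return torch.cat(feats), torch.cat(labels)

# While training on a new domain, pseudo-features replayed from old prototypes
# are mixed into each batch so old-domain knowledge is rehearsed without images.
```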
Humanistic Reflections on Machine Emotion and AI Companionship (6) | Qiu Dejun, Li Weinong: Beyond Memory, the Necessity and Implementation of Forgetting in Affective Computing
Xin Lang Cai Jing· 2025-07-17 02:25
Group 1
- 2024 has been called the "Year of Humanoid Robots," with predictions that emotional communication between humans and robots will become the norm in future intelligent societies [1]
- The concept of machine emotion and AI companionship raises questions about its impact on human-machine interaction and relationships, as well as cultural and gender perspectives on these emotional connections [1]
- The discussions highlight the potential social impacts, technological risks, and ethical issues arising from human-robot emotional interaction, prompting interdisciplinary research [1]

Group 2
- The concept of machine emotion is defined and analyzed through emotional intelligence, human-machine emotion, and human-machine interaction, with the authors advocating a limited approach to developing machine emotions [2]
- A life-centered theory of consciousness suggests a new route to endowing machines with emotional capability: simulating biological homeostasis can yield autonomous adaptability [2]
- Ethical reflection on human-machine emotional interaction, particularly around AI "resurrection" technology, reveals risks such as emotional dependency and identity crises, necessitating regulatory and cultural adjustments [2]

Group 3
- Philosophical discussions of affective computing often rest on idealized technical assumptions, overlooking the importance of forgetting mechanisms in creating realistic and ethical AI emotional systems [3][4]
- Current challenges in affective computing include reliance on data quality and the superficiality of emotional expression in AI systems, which fails to capture the complexity of human emotional experience [6]
- Introducing forgetting mechanisms is essential for the adaptability and authenticity of emotional AI, allowing systems to discard outdated emotional data [11][12]

Group 4
- The proposed phenomenology-inspired human-like forgetting neural model (PHFNM) integrates individual and collective forgetting processes in emotional AI systems, reflecting both natural decay and active forgetting (a toy sketch follows below) [19][22]
- The model consists of three interconnected layers: a low-dimensional emotional index layer for natural decay, a memory-encoding layer for dynamic reconstruction, and an active-forgetting layer for ethical regulation [23][24][25]
- The PHFNM framework emphasizes balancing individual emotional memory with collective social interaction, keeping emotional AI systems relevant and ethically responsible [26][27]
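Two of the three layers described, natural decay and active ethical forgetting, can be caricatured in a few lines of Python. The class names, half-life, and retention threshold are assumptions; this is a toy sketch of the mechanism, not the PHFNM architecture itself.

```python
import math
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class EmotionalMemory:
    content: str
    salience: float              # emotional intensity in [0, 1]
    created: float = field(default_factory=time.time)
    flagged: bool = False        # marked by the ethical / active-forgetting layer

class ForgettingStore:
    """Toy PHFNM-style store: passive decay plus active, rule-driven forgetting."""
    def __init__(self, half_life_s: float = 86400.0, floor: float = 0.05):
        self.half_life_s, self.floor = half_life_s, floor
        self.memories: List[EmotionalMemory] = []

    def strength(self, m: EmotionalMemory, now: float) -> float:
        # Natural-decay layer: emotional salience fades exponentially with age.
        age = now - m.created
        return m.salience * math.exp(-math.log(2) * age / self.half_life_s)

    def sweep(self) -> None:
        now = time.time()
        # Active-forgetting layer drops flagged items outright; the passive layer
        # drops anything whose decayed strength fell below the retention floor.
        self.memories = [m for m in self.memories
                         if not m.flagged and self.strength(m, now) >= self.floor]
```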