Parameterized Memory
Peking University Team Proposes SHINE: Turning Any Text into an LLM LoRA with Just One Forward Pass!
机器之心· 2026-03-23 07:10
Core Insights
- The article introduces SHINE, a novel hypernetwork architecture that converts any text into LoRA parameters in a single forward pass, enabling multi-turn dialogue grounded in that text [2][3][43]
- SHINE addresses key challenges in building hypernetworks for large language models (LLMs), striking a balance between parameter scale and expressive capability [9][23]
- The method demonstrates significant practical potential and scalability, offering a new technical pathway for knowledge injection and rapid adaptation in large models [8][10][43]

Background and Methodology
- Hypernetworks are specialized neural networks that output the parameters of another neural network; SHINE trains a hypernetwork to generate LoRA parameters directly from any text input [3][5]
- The architecture consists of two main components, an LLM and an M2P Transformer, which together strengthen the hypernetwork's ability to generate parameters without adding extra parameters [19][20]
- Training follows a "pre-training then instruction fine-tuning" paradigm, using large-scale training data to continuously improve model performance [10][25]

Performance and Efficiency
- SHINE achieves high-quality LoRA generation with minimal time and token overhead, outperforming traditional methods such as Supervised Fine-Tuning (SFT) and In-Context Learning (ICL) in efficiency [11][36][39]
- Experimental results show that SHINE closely approaches the performance of the in-context method while significantly reducing inference time and computational cost [36][37]
- Compared with Test-Time Training (TTT), SHINE delivers superior performance with negligible latency and no additional training overhead [38][39]

Scalability and Future Prospects
- The architecture exhibits strong scalability, with performance improving as the base model size, LoRA dimension, and number of M2P Transformer layers increase [41]
- The article emphasizes the growing importance of hypernetwork-based parameter-generation methods for LLMs, with potential applications expanding into various domains [44]
- Future research directions include better handling of long texts, integrating reasoning mechanisms, and optimizing the model architecture for stronger performance [44]
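To make the text-to-LoRA contract concrete, here is a minimal numpy sketch: a stand-in "hypernetwork" maps a text embedding to the flattened LoRA factors A and B in one forward pass, and the low-rank update B·A is added to a frozen base weight. All sizes, names, and the single linear map are illustrative assumptions, not the SHINE architecture (which uses an LLM plus an M2P Transformer as the hypernetwork).

```python
import numpy as np

rng = np.random.default_rng(0)

D, R = 16, 4   # base weight width and LoRA rank (toy sizes)
E = 8          # dimensionality of the hypothetical text embedding

# Frozen base weight of one linear layer in the target model.
W = rng.normal(size=(D, D))

# Toy "hypernetwork": a single linear map from a text embedding to the
# flattened LoRA factors. This stand-in only illustrates the
# input/output contract of generating parameters from text.
H = rng.normal(scale=0.01, size=(E, R * D + D * R))

def text_to_lora(text_embedding, alpha=8.0):
    """One forward pass: text embedding -> low-rank update B @ A."""
    flat = text_embedding @ H
    A = flat[: R * D].reshape(R, D)    # down-projection factor
    B = flat[R * D:].reshape(D, R)     # up-projection factor
    return (alpha / R) * (B @ A)       # standard LoRA scaling

emb = rng.normal(size=(E,))            # pretend embedding of the input text
W_adapted = W + text_to_lora(emb)      # knowledge "injected" into the layer
```

In SHINE proper, factors would be generated for every adapted layer of the target model and trained end to end; this toy shows only a single layer's update.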
That Day, Large AI Models Remembered the Shackles of "Amnesia" That Bound Them
机器之心· 2025-08-31 05:33
Core Insights
- The article discusses advances in the memory capabilities of large language models (LLMs), highlighting how companies such as Google, OpenAI, and Anthropic are integrating memory features into their AI systems to enhance user interaction and conversational continuity [1][3][10].

Memory Capabilities of LLMs
- Google's Gemini has introduced memory capabilities that allow it to retain information across multiple conversations, making interactions more natural and coherent [1].
- OpenAI's ChatGPT has offered a memory feature since February 2024, enabling users to instruct the model to remember specific details, which improves its performance over time [3][42].
- Anthropic's Claude has also added memory functionality, allowing it to recall previous discussions when prompted by the user [3][6].

Types of Memory in LLMs
- Memory can be categorized into sensory memory, short-term memory, and long-term memory, with long-term memory being the focus for LLMs [16][17].
- Contextual memory is a form of short-term memory in which relevant information is included in the model's context window [18].
- External memory stores information in an external database for retrieval during interactions, a common method for building long-term memory [22][23].
- Parameterized memory attempts to encode information directly into the model's parameters, providing a deeper form of memory [24][29].

Innovations in Memory Systems
- New startups are emerging around memory systems for AI, such as Letta AI's MemGPT and RockAI's Yan 2.0 Preview, which aim to enhance memory capabilities [11][12].
- Hybrid memory systems, which combine different types of memory to improve AI's adaptability and performance, are gaining traction [37][38].

Notable Memory Implementations
- OpenAI's ChatGPT allows users to manage their memory entries, while Anthropic's Claude retrieves past conversations only when requested [42][44].
- Gemini supports user input for memory management, enhancing its ability to remember user preferences [45].
- The M3-Agent, developed by ByteDance, Zhejiang University, and Shanghai Jiao Tong University, integrates long-term memory across multiple modalities, including video and audio [10][70].

Future Trends in AI Memory
- AI memory is expected to evolve toward multi-modal, integrated memory systems that allow a more comprehensive understanding of user interactions [97][106].
- There is growing emphasis on memory systems that can autonomously manage and optimize their own memory, akin to human cognitive processes [101][106].
- The ultimate goal is to develop AI systems that exhibit unique personalities and emotional connections through their memory capabilities, potentially leading toward artificial general intelligence (AGI) [109][110].
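The external-memory pattern described above (store past information in a database, retrieve it by similarity when needed) can be sketched in a few lines. This toy uses a deterministic bag-of-words embedding and cosine similarity purely for illustration; a real system like those discussed in the article would use a learned embedding model and a vector database, so every name and detail below is an assumption.

```python
import numpy as np

DIM = 64  # size of the toy embedding space

def embed(text):
    """Deterministic bag-of-words embedding: hash each word to a
    bucket by character-code sum, then L2-normalize."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[sum(ord(c) for c in word) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class ExternalMemory:
    """Store past facts; recall the closest by cosine similarity."""
    def __init__(self):
        self.texts, self.vecs = [], []

    def remember(self, text):
        self.texts.append(text)
        self.vecs.append(embed(text))

    def recall(self, query, k=1):
        # Vectors are unit-norm, so dot product = cosine similarity.
        sims = np.array(self.vecs) @ embed(query)
        top = np.argsort(-sims)[:k]
        return [self.texts[i] for i in top]

mem = ExternalMemory()
mem.remember("the user prefers concise answers")
mem.remember("the user's favorite language is Rust")
hits = mem.recall("what is the user's favorite language")
# -> ["the user's favorite language is Rust"]
```

Parameterized memory, by contrast, would fold such facts into the model weights themselves (e.g. via fine-tuning) rather than keeping them in a separate, editable store.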