System Prompt Learning


36Kr· 2025-10-27 05:13

Core Insights
- The article discusses recent advances in the ability of large language models (LLMs) to learn continually and evolve on their own, addressing the criticism that they lack genuine learning capabilities [1][2].

Group 1: Paths to Continual Learning
- An LLM's ability to learn continually depends on the depth and plasticity of its memory; the article identifies three main paths for improving it [2].
- The first path modifies the model's "context" or "working memory" through In-Context Learning (ICL): new information is supplied in the prompt so that the model learns to solve a specific problem [4][6].
- The second path adds an "external memory bank" through retrieval-augmented generation (RAG), letting the model maintain and query an external database for comparison and retrieval, as exemplified by Google DeepMind's "ReasoningBank" [7].
- The third path is parameter-level continual learning, which has been held back by the complexity and instability of methods such as Reinforcement Learning (RL) and Low-Rank Adaptation (LoRA) [10][11].

Group 2: Sparse Memory Fine-Tuning
- A recent Meta AI paper introduces Sparse Memory Fine-Tuning as a solution to the problems of traditional supervised fine-tuning (SFT), in particular catastrophic forgetting [11][28].
- The method has three steps: modify the architecture to include a memory layer, use TF-IDF to identify which parameters to update, and apply sparse updates to only the most relevant parameters [12][22][23].
- After learning new facts, models trained this way lost only 11% of their performance on the original tasks, compared with drops of 71% with LoRA and 89% with full fine-tuning [23][25].

Group 3: Implications for the Future of LLMs
- Sparse Memory Fine-Tuning suggests that models could be updated safely and effectively, shifting them from static tools toward dynamic agents capable of continual learning [31][32].
- If these methods prove successful in practice, they could mark the beginning of an era of self-evolving models that grow and adapt through experience [31][32].
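The TF-IDF selection and sparse update steps summarized in Group 2 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `tfidf_top_slots` and `sparse_update`, the smoothed IDF formula, and the use of plain Python lists in place of real memory-layer tensors are hypothetical and are not taken from the Meta AI paper.

```python
import math
from collections import Counter


def tfidf_top_slots(batch_accesses, background_doc_freq, num_background_docs, top_t):
    """Rank memory slots by a TF-IDF score and return the top_t slot indices.

    batch_accesses: slot indices read while processing the new data.
    background_doc_freq: slot -> number of background batches that read it.
    The smoothed IDF used below is an illustrative assumption.
    """
    counts = Counter(batch_accesses)
    total = sum(counts.values())
    scores = {}
    for slot, count in counts.items():
        tf = count / total  # how heavily the new data uses this slot
        df = background_doc_freq.get(slot, 0)
        idf = math.log((num_background_docs + 1) / (df + 1))  # rare in background -> high
        scores[slot] = tf * idf
    # Highest score first; ties are broken by slot index for determinism.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:top_t]


def sparse_update(memory, grads, trainable_slots, lr=0.5):
    """Apply one gradient step to the selected slots only; other slots stay frozen."""
    for slot in trainable_slots:
        memory[slot] = [w - lr * g for w, g in zip(memory[slot], grads[slot])]
    return memory


if __name__ == "__main__":
    accesses = [3, 3, 3, 7, 7, 1]        # slot 3 is used most by the new facts
    background = {1: 90, 7: 50, 3: 2}    # slot 1 is used by almost everything
    chosen = tfidf_top_slots(accesses, background, num_background_docs=100, top_t=2)
    memory = {s: [1.0, 1.0] for s in (1, 3, 7)}
    grads = {s: [1.0, 1.0] for s in (1, 3, 7)}
    print(chosen, sparse_update(memory, grads, chosen))
```

The point of the sketch is the selection rule: a slot that the new data reads often but that background data rarely touches gets a high score and is updated, while a slot shared with general knowledge (slot 1 here) stays frozen, which is the intuition behind limiting catastrophic forgetting.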

Meta Platforms(US:META)

Artificial Intelligence


Day One of YC's AI Startup School: Andrej Karpathy's Talk Goes Viral

Founder Park· 2025-06-18 14:28

Group 1
- The article emphasizes that this is the decade of intelligent agents, not merely the year of intelligent agents, and highlights how the skills required for software development are changing in the era of large language models (LLMs) [1][4].
- It introduces the concept of Software 3.0, in which prompt engineering is the new programming paradigm, following traditional coding and neural networks [2][8].
- LLMs are described as combining high intelligence with cognitive deficits: a human-like system with significant capabilities but unpredictable limitations [7][15].

Group 2
- The article argues that "memory capability" in LLMs should focus on general problem-solving knowledge rather than on storing random facts about users [7][50].
- It introduces the "Autonomy Slider," which lets users adjust how much autonomy an AI application has in a given context [7][60].
- Software is described as evolving from Software 1.0 (hand-written code) to Software 2.0 (neural networks) and now Software 3.0 (prompt engineering), with all three currently coexisting [13][10].

Group 3
- LLMs are compared to public infrastructure, wafer fabs, and operating systems, emphasizing their role in providing intelligence as a service and the need for stable operation [20][26][32].
- The article highlights the dual nature of LLMs: they can perform complex tasks yet fail at simpler ones, a phenomenon termed "jagged intelligence" [49][50].
- It proposes a new learning paradigm for LLMs, system prompt learning, as an alternative to traditional reinforcement learning [54][56].

Group 4
- The article discusses the gap between prototype demos and reliable products, arguing that partial autonomy in AI systems is needed to bridge it [73][74].
- It shares insights from various industry leaders, including the importance of practical action, long-term vision, and the evolving landscape of AI applications [94][95][96].
- It concludes with a call to focus on building AI products that enhance human capabilities rather than merely automating tasks [141][142].

AI Needs to "Take Notes" Too: The Future Karpathy Sees in Claude's 16,000-Word Prompt

歸藏的AI工具箱 (Guizang's AI Toolbox)· 2025-05-12 08:28

Core Viewpoint
- The article discusses the significance of system prompts in large language models (LLMs), focusing on Claude's extensive system prompt and on "system prompt learning," a new learning paradigm proposed by Karpathy [6][12].

Group 1: System Prompts Overview
- Claude's system prompt runs to 16,739 words, far longer than the system prompt of OpenAI's o4-mini in ChatGPT, which has only 2,218 words, about 13% of the length of Claude's [2][3].
- A system prompt serves as an initial instruction manual for an LLM, defining its role, rules, and response style [4].
- Claude's system prompt includes tool definitions, user preferences, and guidelines for various tasks, indicating a structured approach to AI interactions [8].

Group 2: Current Learning Paradigms
- The existing learning paradigms for LLMs are pretraining, which provides broad knowledge from large datasets, and finetuning, which adjusts model behavior through parameter updates [9].
- Humans, by contrast, often learn by summarizing experiences and strategies, much like taking notes, rather than relying on something analogous to parameter updates [10].

Group 3: System Prompt Learning
- Karpathy suggests that LLMs adopt a "system prompt learning" mechanism that stores strategies and knowledge in an explicit format, improving efficiency and scalability [10][12].
- This paradigm could lead to more effective use of data and better generalization [19].

Group 4: Practical Implications
- Clear and detailed instructions in system prompts lead to more accurate AI responses, which underlines the importance of structured communication [13][14].
- The article notes that "prompt engineering" is an extension of everyday communication skills, which makes it accessible to ordinary users [16].
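The "note-taking" idea in Group 3 can be sketched as a small data structure that keeps explicit lessons next to a fixed base prompt. This is only an illustration under stated assumptions: the `PromptNotebook` class, its methods, and the "Lessons learned" layout are hypothetical, no model is called, and it does not describe how Claude or any other product manages its system prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptNotebook:
    """Hypothetical store of explicit lessons appended to a base system prompt."""

    base_prompt: str
    lessons: List[str] = field(default_factory=list)
    max_lessons: int = 20  # assumed budget so the prompt cannot grow without bound

    def add_lesson(self, lesson: str) -> bool:
        """Record a lesson; return False if it is empty or already stored."""
        text = " ".join(lesson.split())  # normalize whitespace
        if not text or text.lower() in (l.lower() for l in self.lessons):
            return False
        self.lessons.append(text)
        # Keep only the most recent notes once the budget is exceeded.
        if len(self.lessons) > self.max_lessons:
            self.lessons = self.lessons[len(self.lessons) - self.max_lessons:]
        return True

    def render(self) -> str:
        """Build the system prompt: the base text plus the notes as a plain list."""
        if not self.lessons:
            return self.base_prompt
        notes = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"{self.base_prompt}\n\nLessons learned:\n{notes}"


if __name__ == "__main__":
    notebook = PromptNotebook("You are a helpful assistant.")
    notebook.add_lesson("When asked to count letters, list them one by one first.")
    print(notebook.render())
```

Unlike a parameter update, each note here is readable and editable text, so a person can inspect, correct, or delete what the system has "learned."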

Artificial Intelligence


Claude