Meta-Learning

Has AI Lost Its Way? Godfather of Reinforcement Learning Sutton Unveils the OaK Architecture, Challenging the Current AI Paradigm with a New Vision for Superintelligence
AI科技大本营· 2025-08-22 08:05
Author | Richard Sutton (理查德·萨顿)
Original title | The OaK Architecture: A Vision of Superintelligence Built from Experience
Source | RLC 2025 conference talk ( youtu.be/gEbbGyNkR2U )
Compiled by | Wang Qilong (王启隆)
Published by | AI 科技大本营 (ID: rgznai100)

As artificial intelligence has grown into a vast industry, it has largely lost its way. What do we need to get back on track and pursue genuine intelligence? We need agents that learn continually, world models and planning capabilities, and the ability to learn high-level knowledge and to master generalization through meta-learning. The OaK architecture is a systematic response to all of these needs. Taken as a whole, it is a model-based reinforcement learning architecture with three distinguishing features: 1) all of its components learn continually; 2) every learned weight is paired with a dedicated step-size parameter that is meta-learned through online cross-validation; 3) abstractions over state and time are continually created through a five-step progression we call FC-STOMP: Feature Construction, posing a SubTask based on the feature, learning an Option that solves the subtask, learning a Model of that option ( ...
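Feature 2 above, one meta-learned step size per weight, is in the spirit of Sutton's earlier IDBD algorithm. The sketch below is a minimal illustration of that idea for a linear learner; it is not OaK's actual online cross-validation procedure, and all function names and constants are illustrative.

```python
import numpy as np

def idbd_step(w, h, log_alpha, x, target, theta=0.01):
    """One IDBD update: each weight w[i] carries its own step size
    exp(log_alpha[i]), which is itself adapted from the error signal."""
    delta = target - w @ x                       # prediction error
    log_alpha += theta * delta * x * h           # meta-learn the step sizes
    alpha = np.exp(log_alpha)                    # per-weight step sizes
    w += alpha * delta * x                       # per-weight LMS update
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, h, log_alpha

# Toy run: only the first of three features is relevant (target = 2 * x[0]),
# so the meta-learner should raise that feature's step size and converge.
rng = np.random.default_rng(0)
w, h = np.zeros(3), np.zeros(3)
log_alpha = np.full(3, np.log(0.05))
for _ in range(3000):
    x = rng.normal(size=3)
    w, h, log_alpha = idbd_step(w, h, log_alpha, x, 2.0 * x[0])
```

After training, the weight on the relevant feature approaches 2 while the irrelevant weights stay near zero.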
Liu Lu Has Also Been Poached by Meta! A South China University of Technology Alumna Who Created the Viral 4o Ghibli Hit
量子位· 2025-07-15 00:34
Core Viewpoint
- Liu Lu, a notable researcher from OpenAI, has joined Meta, which indicates a strategic talent acquisition by Meta to enhance its AI capabilities, particularly in the wake of challenges faced by its Llama 4 release [1][6][34].

Group 1: Liu Lu's Background and Achievements
- Liu Lu is a graduate of South China University of Technology and has a strong academic background, including a GPA of 3.84 in her undergraduate studies [3][9].
- She previously worked at Google, contributing to the development of the Gemini model, and later led the image generation work for GPT-4o at OpenAI, which became widely popular for its "Ghibli style" feature [4][21][23].
- The "Ghibli style" feature generated over 700 million images within the first ten days of its release, showcasing its immense popularity [26].

Group 2: Meta's Talent Acquisition Strategy
- Meta has been aggressively recruiting talent from OpenAI, with Liu Lu being one of the key figures, alongside Allan Jabri, who was also part of the GPT-4o core architecture team [5][30].
- This recruitment strategy appears to be part of a broader effort by Meta to build a strong AI team, as evidenced by the growing list of Chinese researchers joining from OpenAI [34][35].
- The current roster of Chinese talent at Meta includes ten individuals, with eight coming from OpenAI, highlighting a focused approach to acquiring top talent in the AI field [35].

Group 3: Implications for the AI Industry
- The shift of talent from OpenAI to Meta raises questions about the competitive landscape in the AI industry, particularly regarding the retention of talent at OpenAI [38][39].
- Meta's strategy to recruit from OpenAI may signal a shift in the balance of power within the AI sector, as it seeks to enhance its capabilities and regain trust following previous setbacks [7][34].
- The ongoing recruitment efforts suggest that Meta is not only interested in immediate gains but is also looking to establish a long-term competitive advantage in AI development [34][40].
Another Chinese Scientist Poached as OpenAI's Talent Drain Accelerates
Hu Xiu· 2025-07-12 10:43
Core Insights
- OpenAI is facing significant challenges as Meta and Google aggressively recruit its talent and secure partnerships with key companies in the AI sector [3][10][26].

Group 1: Talent Acquisition and Competition
- Meta has successfully recruited two researchers from OpenAI, Allan Jabri and Lu Liu, to bolster its AI capabilities [3][12][24].
- Lu Liu, a prominent figure in the 4o image generation team at OpenAI, has a strong academic background in deep learning and has previously worked at major tech companies [15][20][24].
- Meta's recruitment strategy has reportedly involved offering substantial compensation packages, with some reports suggesting a total of $300 million for multiple hires [24][25].

Group 2: Strategic Partnerships and Acquisitions
- OpenAI's potential acquisition of the AI programming company Windsurf fell through, with Google announcing a partnership with Windsurf instead [5][27][29].
- Google has invested $2.4 billion to integrate Windsurf's technology and talent into its DeepMind division, which is seen as a strategic move to enhance its AI capabilities [9][32].
- The failed acquisition was reportedly influenced by Microsoft's objections, as OpenAI's contract with Microsoft includes clauses that limit its ability to acquire certain technologies [36][39].

Group 3: Financial and Structural Challenges
- OpenAI is undergoing a difficult transition from a non-profit to a public benefit corporation (PBC), facing hurdles due to its contractual obligations with Microsoft [38][40].
- The company has committed to a significant equity incentive plan for 2024, amounting to $4.4 billion, which exceeds its projected revenue, indicating financial strain [56][57].
- OpenAI's CEO has expressed dissatisfaction with Meta's aggressive recruitment tactics, likening them to a form of theft [47].
Duan Yuqing of the University of Auckland, New Zealand: Using Dirty Data as a Breakthrough, Generation Z Reshapes the AI Innovation Paradigm
Huan Qiu Wang Zi Xun· 2025-07-06 06:52
Group 1
- The core theme of the event is how big data analysis drives AI optimization and innovation, emphasizing the importance of interpreting complexity rather than just the quantity of data [2]
- Generation Z possesses a unique ability to extract valuable signals from noise in a data-saturated environment, which is crucial for both humans and AI systems [3]
- Retaining "dirty data" can sometimes be more valuable than overly cleaned data, especially in fields like fraud detection where atypical behaviors are key indicators [3]

Group 2
- Cross-domain data fusion is essential for AI optimization, allowing for a more comprehensive understanding of market dynamics beyond traditional financial metrics [4]
- The shift from "big data samples" to "small data samples" highlights the importance of meta-learning, enabling AI to adapt quickly with fewer data points [4][5]
- This transition offers advantages such as enhanced privacy protection and improved response speed, allowing AI to remain efficient in rapidly changing environments [6]
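The rapid adaptation from few data points described above is the core promise of meta-learning. As a hedged illustration (not anything presented at the event), here is a minimal first-order meta-learning loop in the style of the Reptile algorithm on toy linear tasks; all names and constants are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def inner_sgd(w, xs, ys, lr=0.1, steps=5):
    """A few gradient steps on one task's small dataset (squared error)."""
    for _ in range(steps):
        grad = 2 * np.mean((w * xs - ys) * xs)
        w -= lr * grad
    return w

# Reptile-style outer loop: nudge a shared initialization toward the
# weights adapted on each sampled task, so new tasks need few steps.
w_init, meta_lr = 0.0, 0.2
for _ in range(500):
    a = rng.uniform(1.0, 3.0)          # each task is y = a * x
    xs = rng.normal(size=10)           # only 10 samples per task
    ys = a * xs
    w_task = inner_sgd(w_init, xs, ys)
    w_init += meta_lr * (w_task - w_init)
```

The meta-learned initialization ends up near the center of the task distribution (here, a slope around 2), which is exactly what makes subsequent few-shot adaptation fast.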
LLMs Can Now Update Their Own Weights, Greatly Improving Self-Adaptation and Knowledge Integration. Has AI Woken Up?
机器之心· 2025-06-14 04:12
Core Insights
- The article discusses the increasing research and discussions around AI self-evolution, highlighting various frameworks and models that aim to enable AI systems to improve themselves autonomously [1][2].

Group 1: AI Self-Evolution Frameworks
- Several notable frameworks for AI self-improvement are mentioned, including the "Darwin-Gödel Machine" (DGM), "Self-Reinforcement Training" (SRT), "MM-UPT" for multimodal large models, and "UI-Genie" for self-improvement [1].
- OpenAI's CEO Sam Altman envisions a future where humanoid robots can autonomously manufacture more robots and essential infrastructure, indicating a significant leap in AI capabilities [1].
- A recent MIT paper titled "Self-Adapting Language Models" introduces SEAL (Self-Adapting LLMs), which allows language models to update their weights based on generated training data [2][4].

Group 2: SEAL Methodology
- SEAL employs a self-editing mechanism trained through reinforcement learning, where the model generates its own training data and updates its weights based on performance improvements [10][12].
- The SEAL framework consists of two nested loops: an external reinforcement learning loop for optimizing self-edit generation and an internal update loop for adjusting model parameters [13][15].
- The model's training involves generating self-edits and using supervised fine-tuning to update its parameters, enhancing its adaptability to new tasks [18][19].

Group 3: Experimental Results
- In few-shot learning experiments, SEAL achieved a success rate of 72.5%, significantly outperforming baseline methods, which had success rates of 0% and 20% [34][36].
- For knowledge integration tasks, SEAL demonstrated improved accuracy, achieving 47.0% in single-passage scenarios and 43.8% in continued pretraining, surpassing other training methods [38][40].
- The results indicate that SEAL's reinforcement learning approach leads to more effective self-edits, enhancing overall model performance [43].
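The two nested loops in SEAL's methodology can be sketched as pseudocode. This is an illustrative reconstruction from the summary above, not the paper's implementation; `generate_self_edit`, `finetune`, `evaluate`, and `reinforce` are hypothetical helpers standing in for the real components.

```python
# Pseudocode sketch of SEAL's nested loops (hypothetical helpers throughout).
def seal_round(model, task, num_candidates=4):
    best_edit, best_score = None, evaluate(model, task)
    for _ in range(num_candidates):
        edit = model.generate_self_edit(task)   # model writes its own training data
        candidate = finetune(model, edit)       # inner loop: SFT on the self-edit
        score = evaluate(candidate, task)       # downstream performance as reward
        if score > best_score:
            best_edit, best_score = edit, score
    if best_edit is not None:
        model = reinforce(model, best_edit)     # outer RL loop: reward edit policies
        model = finetune(model, best_edit)      # keep the winning weight update
    return model
```

The key design point is that the reward for the outer loop is the measured improvement produced by actually applying the self-edit, so the model learns to generate training data that helps itself.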
Will the LSTM Father's 22-Year-Old Vision Come True? A Week of Concentrated AI "Self-Evolution" Papers: Is a New Trend Emerging?
机器之心· 2025-06-02 05:22
Core Insights
- The article discusses the evolution of AI systems towards self-improvement, highlighting recent advancements in self-learning models, particularly the "Darwin Gödel Machine" (DGM) and other frameworks [1][4][6].

Group 1: Darwin Gödel Machine (DGM)
- DGM utilizes foundation models and open-ended algorithms to create and evaluate new AI agents, and is capable of reading and modifying its own Python code for self-improvement [4][6].
- DGM has demonstrated significant self-improvement capabilities, with performance metrics increasing from 20.0% to 50.0% on SWE-bench and from 14.2% to 30.7% on Polyglot, surpassing manually designed agents [10].
- The system operates by alternating self-modification and downstream task evaluation, continuously generating and scoring new agents [10][8].

Group 2: Self-Rewarded Training (SRT)
- SRT is an online self-training reinforcement learning algorithm that allows large language models to self-supervise and train without external labels, enhancing performance through self-generated feedback [14][16].
- Initial experiments show that SRT can achieve performance comparable to standard reinforcement learning methods that rely on gold-standard answers, although it may eventually face performance degradation [18][21].
- Strategies to mitigate reward hacking include early stopping, using offline-generated labels for self-training, and implementing curriculum learning to maintain model performance [22][24][26].

Group 3: Multi-Modal Unsupervised Post-Training (MM-UPT)
- MM-UPT is a framework for continuous self-improvement of multimodal large models in completely unsupervised settings, validated across multiple benchmarks [30][32].
- The framework employs a voting mechanism to generate pseudo-labels from self-generated data, allowing models to enhance their reasoning capabilities without external supervision [39][40].
- Experiments indicate that MM-UPT can improve accuracy from 66.3% to 72.9% on the MathVista benchmark, demonstrating its effectiveness compared to previous unsupervised methods [39][40].

Group 4: UI-Genie Framework
- UI-Genie is designed to address challenges in GUI agents, focusing on trajectory validation and the acquisition of high-quality training data [45][47].
- The framework includes a reward model that efficiently processes historical context and unifies action-level and task-level rewards, enhancing the agent's learning capabilities [45][50].
- Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks after iterative self-improvement cycles [52].