Meta-Learning
"Father of Reinforcement Learning" Richard Sutton: The Human Data Dividend Is Nearing Its Limit; AI Is Entering an "Experience Era" Centered on Continual Learning
Zheng Quan Shi Bao· 2025-09-11 03:50
Core Insights
- Richard Sutton, the 2024 Turing Award winner, emphasizes that the human data dividend is nearing its limit, and artificial intelligence is entering an "experience era" centered on continuous learning, which has the potential to exceed previous capabilities [1][2]

Group 1: Experience Era
- Sutton defines "experience" as the signals of observation, action, and reward that are exchanged between agents and the world, asserting that knowledge derives from experience and that the intelligence of an agent depends on its ability to predict and control its input signals [2]
- The transition to the experience era is driven by reinforcement learning, but to fully unlock its potential, two currently immature technologies (continual learning and meta-learning) are required [2]

Group 2: Collaboration and AI
- Addressing concerns about AI leading to bias, unemployment, or even human extinction, Sutton argues that fears surrounding artificial intelligence are exaggerated, and that decentralized collaboration among different agents can lead to mutually beneficial outcomes [2]
- He highlights that humanity's greatest strength lies in collaboration, which has been the foundation of economic, market, and governmental successes [2]

Group 3: Future of AI
- Sutton posits that the replacement of human roles by AI is inevitable, with humans acting as catalysts and pioneers for the "design era," which he categorizes as the fourth era in the evolution of the universe, following the particle, star, and replicator eras [2][3]
- He encourages embracing the evolution of artificial intelligence with courage, pride, and a spirit of adventure [3]
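Sutton's definition of experience, signals of observation, action, and reward flowing between an agent and the world, is the standard reinforcement-learning interaction loop. A minimal sketch of that loop (the toy environment, action values, and exploration rate here are invented for illustration, not anything from Sutton's talk):

```python
import random

class Environment:
    """A toy world: the agent only ever sees the signals it exchanges with it."""
    def step(self, action):
        observation = random.random()          # next observation signal
        reward = 1.0 if action == 1 else 0.0   # scalar reward signal
        return observation, reward

class Agent:
    """Tries to control its input signals by learning a value for each action."""
    def __init__(self, n_actions=2, step_size=0.1, explore=0.1):
        self.values = [0.0] * n_actions
        self.step_size = step_size
        self.explore = explore

    def act(self):
        if random.random() < self.explore:     # occasional exploration
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # incremental update toward the latest reward signal
        self.values[action] += self.step_size * (reward - self.values[action])

random.seed(0)
env, agent = Environment(), Agent()
for _ in range(500):
    action = agent.act()
    observation, reward = env.step(action)
    agent.learn(action, reward)
# agent.values[1] has climbed toward 1.0: the agent has learned to
# predict and control its reward signal purely from experience
```

Continual learning and meta-learning, the two immature pieces Sutton names, would extend this loop so the agent keeps learning indefinitely and tunes how it learns.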
"Father of Reinforcement Learning" Richard Sutton: The Human Data Dividend Is Nearing Its Limit; AI Is Entering an "Experience Era" Centered on Continual Learning
Zheng Quan Shi Bao Wang· 2025-09-11 03:26
Core Insights
- Richard Sutton, the 2024 Turing Award winner, emphasizes that the human data dividend is nearing its limits, and artificial intelligence is entering an "experience era" centered on continuous learning, which has the potential to exceed previous capabilities [1][2]

Group 1: Experience Era
- Sutton defines "experience" as the interaction of observation, action, and reward, which are signals exchanged between agents and the world [2]
- Current machine learning methods are reaching their limits in generating new knowledge, making them unsuitable for continuous learning, which is crucial for intelligence [1][2]

Group 2: Technological Advancements
- To fully unlock the potential of AI in the experience era, two currently immature technologies are needed: continual learning and meta-learning [2]
- Sutton believes that collaboration between decentralized agents can lead to win-win outcomes, countering fears about AI causing bias, unemployment, or even human extinction [2]

Group 3: Human-AI Collaboration
- Sutton argues that collaboration is humanity's greatest success, and AI's role will be to enhance this collaboration, which is fundamental to economic, market, and governmental successes [2]
- He posits that AI's replacement of human roles is inevitable, with humans acting as catalysts in ushering in a new "design era" in the evolution of the universe [2]

Group 4: Future Perspective
- Sutton views artificial intelligence as a necessary next step in the evolution of the universe, advocating for a courageous and adventurous approach to its development [3]
Has AI Lost Its Way? Reinforcement Learning Godfather Sutton Unveils the OaK Architecture, Challenging the Current AI Paradigm with a New Vision of Superintelligence
AI科技大本营· 2025-08-22 08:05
Core Concept
- The OaK architecture is a systematic response to the need for intelligent agents that can continuously learn, model the world, and plan effectively, aiming to achieve superintelligence through experiential learning [3][5][7].

Group 1: OaK Architecture Overview
- OaK architecture is a model-based reinforcement learning framework characterized by continual-learning components, specialized learning rates for each weight, and a five-step evolution path called FC-STOMP [3][26].
- The architecture emphasizes the importance of runtime learning over design-time learning, advocating for online learning where agents learn from real-world interactions [13][14][21].

Group 2: Key Features of OaK
- The architecture is designed to be domain-general, empirical, and capable of open-ended complexity, allowing agents to form necessary concepts based on their computational resources [16][19].
- The "Big World" hypothesis posits that the world is far more complex than any intelligent agent can fully comprehend, leading to the conclusion that agents must operate with approximate models and strategies [19][20].

Group 3: Learning Mechanisms
- OaK architecture introduces the concept of subproblems, where agents autonomously generate subproblems based on curiosity and intrinsic motivation, facilitating a cycle of problem-solving and feature generation [28][31].
- The architecture's core process involves eight steps that include learning main strategies, generating new state features, creating subproblems, and using learned models for planning [27][29].

Group 4: Challenges and Future Directions
- Two significant challenges remain: ensuring reliable continual deep learning and generating new state features, which are critical for the architecture's success [37][38].
- The OaK framework aims to provide a comprehensive solution to fundamental AI problems, offering a mechanism for how learned models can be used for planning, which is currently lacking in AI [40].
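The subproblem-and-feature cycle at OaK's core can be caricatured in a few lines. Everything below is a stand-in: the function names, the "control score" bookkeeping, and the curiosity rule (target the least-controlled feature) are invented to illustrate the shape of the loop, not OaK's actual mechanisms, which the article only outlines:

```python
import random

def propose_subproblem(features):
    """Curiosity stand-in: target the feature the agent controls least well."""
    return min(features, key=lambda name: features[name]["control"])

def solve_subproblem(name, features):
    """Learning a solution (an option) improves control of the targeted feature."""
    features[name]["control"] += random.uniform(0.2, 0.5)
    return f"option_for_{name}"

def create_feature(options, features):
    """Solved subproblems become building blocks for new, higher-level features."""
    new_name = f"feature_{len(features)}"
    features[new_name] = {"control": 0.0, "built_from": list(options)}

random.seed(0)
features = {"feature_0": {"control": 0.0, "built_from": []}}
options = []
for _ in range(4):                 # a few turns of the open-ended cycle
    target = propose_subproblem(features)
    options.append(solve_subproblem(target, features))
    create_feature(options, features)
# the feature set grows without bound as the agent keeps posing subproblems
```

The point of the sketch is the open-endedness: each cycle both solves a subproblem and mints new features that seed future subproblems, matching the article's description of a problem-solving and feature-generation loop.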
Embodied Intelligent Robots: How Can They Learn to Act Like Humans?
36Kr· 2025-08-04 08:21
Core Insights
- The article discusses the evolution and challenges of embodied intelligence, highlighting the distinction between "problem-solving" AI and "practical" AI, with the latter focusing on real-world interactions and learning through sensory experiences [1][3]
- It emphasizes the need for embodied intelligence to overcome significant hurdles in understanding, associating, and interacting with the environment, which are essential for robots to function like humans in real-world scenarios [3][5]

Group 1: Challenges in Embodied Intelligence
- Embodied intelligence must adapt to unstructured real-world environments, requiring advanced computational capabilities to handle dynamic and unpredictable situations [5][6]
- The development of higher cognitive strategies that integrate multiple sensory inputs is crucial for robots to understand and interact with their surroundings effectively [6][7]
- Robots need to surpass traditional static data processing models to achieve a deeper understanding of dynamic changes and relationships in their environment [6][12]

Group 2: Technological Components
- The perception layer of embodied intelligence is vital for converting chaotic physical stimuli into understandable digital signals, relying on multimodal sensor fusion and dynamic environment modeling [8][10]
- The cognitive layer processes raw data from the perception layer, employing hierarchical decision-making and world model construction to enable robots to learn from experiences [12][14]
- The action layer ensures robots can execute tasks safely and effectively, utilizing bio-inspired drive technologies and human-robot collaboration safety designs [16][18]

Group 3: Current Limitations and Future Directions
- Current embodied intelligence models struggle with task completion rates in non-training scenarios, with a success rate of only 65% for tasks like object grasping [17]
- Energy consumption and high costs remain significant barriers to the widespread adoption of humanoid robots, with typical models having a battery life of less than 2 hours and costs exceeding 500,000 yuan [18][19]
- Research is focused on optimizing energy efficiency and reducing costs through new battery technologies and domestic production of core components [21][22]

Group 4: Future Trends
- The integration of multimodal large models is a key future direction, enabling robots to understand natural language commands and adapt quickly to new tasks with minimal samples [23][24]
- Lightweight hardware innovations, such as bio-inspired muscle drive technologies, are expected to enhance performance while reducing costs [23][24]
- The trend of virtual-physical collaborative evolution will allow robots to train in simulated environments, significantly improving their task execution capabilities in real-world settings [24][25]
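The multimodal sensor fusion attributed to the perception layer is commonly implemented as inverse-variance weighting: redundant estimates of the same quantity are combined, with more precise sensors trusted more. A minimal sketch (the camera/lidar numbers are invented for illustration):

```python
def fuse_estimates(estimates):
    """Inverse-variance fusion of redundant sensor readings.

    Each estimate is (value, variance); lower variance means a more
    trusted sensor, so it receives a larger weight. The fused variance
    is always smaller than any single sensor's variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused_value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance

# e.g. distance to an object estimated from camera depth vs. lidar
camera = (2.30, 0.09)   # noisier estimate
lidar = (2.10, 0.01)    # more precise, so it dominates the fusion
value, variance = fuse_estimates([camera, lidar])
# fused value lands near the lidar reading, with reduced uncertainty
```

Real perception stacks layer temporal filtering (e.g. Kalman filters) and learned models on top of this, but the weighting principle is the same.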
Liu Lu Poached by Meta Too! The South China University of Technology Alumna Behind the Viral GPT-4o Ghibli Style
量子位· 2025-07-15 00:34
Core Viewpoint
- Liu Lu, a notable researcher from OpenAI, has joined Meta, which indicates a strategic talent acquisition by Meta to enhance its AI capabilities, particularly in the wake of challenges faced by its Llama 4 release [1][6][34].

Group 1: Liu Lu's Background and Achievements
- Liu Lu is a graduate of South China University of Technology and has a strong academic background, including a GPA of 3.84 in her undergraduate studies [3][9].
- She previously worked at Google, contributing to the development of the Gemini model, and later led the image generation work for GPT-4o at OpenAI, which became widely popular for its "Ghibli style" feature [4][21][23].
- The "Ghibli style" feature generated over 700 million images within the first ten days of its release, showcasing its immense popularity [26].

Group 2: Meta's Talent Acquisition Strategy
- Meta has been aggressively recruiting talent from OpenAI, with Liu Lu being one of the key figures, alongside Allan Jabri, who was also part of the GPT-4o core architecture team [5][30].
- This recruitment strategy appears to be part of a broader effort by Meta to build a strong AI team, as evidenced by the growing list of Chinese researchers joining from OpenAI [34][35].
- The current roster of Chinese talent at Meta includes ten individuals, with eight coming from OpenAI, highlighting a focused approach to acquiring top talent in the AI field [35].

Group 3: Implications for the AI Industry
- The shift of talent from OpenAI to Meta raises questions about the competitive landscape in the AI industry, particularly regarding the retention of talent at OpenAI [38][39].
- Meta's strategy to recruit from OpenAI may signal a shift in the balance of power within the AI sector, as it seeks to enhance its capabilities and regain trust following previous setbacks [7][34].
- The ongoing recruitment efforts suggest that Meta is not only interested in immediate gains but is also looking to establish a long-term competitive advantage in AI development [34][40].
Another Chinese Scientist Poached as OpenAI's Talent Drain Accelerates
Hu Xiu· 2025-07-12 10:43
Core Insights
- OpenAI is facing significant challenges as Meta and Google aggressively recruit its talent and secure partnerships with key companies in the AI sector [3][10][26].

Group 1: Talent Acquisition and Competition
- Meta has successfully recruited two researchers from OpenAI, Allan Jabri and Lu Liu, to bolster its AI capabilities [3][12][24].
- Lu Liu, a prominent figure in the 4o image generation team at OpenAI, has a strong academic background in deep learning and has previously worked at major tech companies [15][20][24].
- Meta's recruitment strategy has reportedly involved offering substantial compensation packages, with some reports suggesting a total of $300 million for multiple hires [24][25].

Group 2: Strategic Partnerships and Acquisitions
- OpenAI's potential acquisition of the AI programming company Windsurf fell through, with Google announcing a partnership with Windsurf instead [5][27][29].
- Google has invested $2.4 billion to integrate Windsurf's technology and talent into its DeepMind division, which is seen as a strategic move to enhance its AI capabilities [9][32].
- The failed acquisition was reportedly influenced by Microsoft's objections, as OpenAI's contract with Microsoft includes clauses that limit its ability to acquire certain technologies [36][39].

Group 3: Financial and Structural Challenges
- OpenAI is undergoing a difficult transition from a non-profit to a public benefit corporation (PBC), facing hurdles due to its contractual obligations with Microsoft [38][40].
- The company has committed to a significant equity incentive plan for 2024, amounting to $4.4 billion, which exceeds its projected revenue, indicating financial strain [56][57].
- OpenAI's CEO has expressed dissatisfaction with Meta's aggressive recruitment tactics, likening them to a form of theft [47].
Duan Yuqing of the University of Auckland, New Zealand: With "Dirty Data" as a Breakthrough, Gen Z Is Reshaping the AI Innovation Paradigm
Huan Qiu Wang Zi Xun· 2025-07-06 06:52
Group 1
- The core theme of the event is how big data analysis drives AI optimization and innovation, emphasizing the importance of interpreting complexity rather than just the quantity of data [2]
- Generation Z possesses a unique ability to extract valuable signals from noise in a data-saturated environment, which is crucial for both humans and AI systems [3]
- Retaining "dirty data" can sometimes be more valuable than overly cleaned data, especially in fields like fraud detection, where atypical behaviors are key indicators [3]

Group 2
- Cross-domain data fusion is essential for AI optimization, allowing for a more comprehensive understanding of market dynamics beyond traditional financial metrics [4]
- The shift from "big data samples" to "small data samples" highlights the importance of "meta-learning," enabling AI to adapt quickly with fewer data points [4][5]
- This transition offers advantages such as enhanced privacy protection and improved response speed, allowing AI to remain efficient in rapidly changing environments [6]
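The "small data samples" point rests on the core idea of meta-learning: train an initialization across many related tasks so that a handful of gradient steps suffices on a new one. A minimal first-order sketch in the style of Reptile (the linear task family, sample sizes, and learning rates below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A family of related tasks: predict y = a * x with a task-specific slope a."""
    return rng.uniform(0.5, 2.0)

def adapt(w, slope, steps=5, lr=0.1):
    """Few-shot adaptation: a handful of gradient steps on only 10 samples."""
    x = rng.uniform(-1.0, 1.0, size=10)
    for _ in range(steps):
        grad = np.mean(2.0 * (w * x - slope * x) * x)  # d/dw of squared error
        w -= lr * grad
    return w

# Meta-training: repeatedly nudge the shared initialization toward the
# weights reached after adapting to each sampled task (Reptile-style update).
w_init, meta_lr = 0.0, 0.2
for _ in range(300):
    task_slope = sample_task()
    w_init += meta_lr * (adapt(w_init, task_slope) - w_init)

# w_init now sits near the centre of the task family, so only a few
# gradient steps are needed to fit any new slope in [0.5, 2.0]
```

The payoff is exactly the trade described in the talk: instead of one model trained on a massive dataset, a meta-learned initialization adapts to each new environment from a few data points.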
LLMs Can Now Update Their Own Weights, with Major Gains in Self-Adaptation and Knowledge Integration. Is AI Waking Up?
机器之心· 2025-06-14 04:12
Core Insights
- The article discusses the increasing research and discussion around AI self-evolution, highlighting various frameworks and models that aim to enable AI systems to improve themselves autonomously [1][2].

Group 1: AI Self-Evolution Frameworks
- Several notable frameworks for AI self-improvement are mentioned, including the "Darwin Gödel Machine" (DGM), "Self-Rewarded Training" (SRT), "MM-UPT" for multimodal large models, and "UI-Genie" for self-improvement [1].
- OpenAI's CEO Sam Altman envisions a future where humanoid robots can autonomously manufacture more robots and essential infrastructure, indicating a significant leap in AI capabilities [1].
- A recent MIT paper titled "Self-Adapting Language Models" introduces SEAL (Self-Adapting LLMs), which allows language models to update their weights based on self-generated training data [2][4].

Group 2: SEAL Methodology
- SEAL employs a self-editing mechanism trained through reinforcement learning: the model generates its own training data and updates its weights based on the resulting performance improvements [10][12].
- The SEAL framework consists of two nested loops: an outer reinforcement learning loop that optimizes self-edit generation, and an inner update loop that adjusts model parameters [13][15].
- Training involves generating self-edits and using supervised fine-tuning to update the model's parameters, enhancing its adaptability to new tasks [18][19].

Group 3: Experimental Results
- In few-shot learning experiments, SEAL achieved a success rate of 72.5%, significantly outperforming baseline methods, which had success rates of 0% and 20% [34][36].
- For knowledge integration tasks, SEAL demonstrated improved accuracy, achieving 47.0% in the single-passage setting and 43.8% in continued pretraining, surpassing other training methods [38][40].
- The results indicate that SEAL's reinforcement learning approach leads to more effective self-edits, enhancing overall model performance [43].
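SEAL's two nested loops can be caricatured in a few lines. This is a toy stand-in, not SEAL itself: the "model", the self-edit generator, and the fine-tuning step are simulated, and the outer reinforcement-learning update is replaced by simple keep-if-improved selection with a decaying sampling temperature:

```python
import random

random.seed(0)

def generate_self_edit(context, temperature):
    """Stand-in for the model writing its own training data from a new document."""
    return {"data": f"restatement_of({context})", "temperature": temperature}

def finetune(model, self_edit):
    """Inner loop: simulated supervised fine-tuning on the self-generated data."""
    return {"skill": model["skill"] + random.uniform(-0.05, 0.15)}

def evaluate(model):
    """Downstream performance after the weight update (the outer loop's signal)."""
    return model["skill"]

model = {"skill": 0.50}
temperature = 1.0
for _ in range(20):                              # outer loop over self-edits
    edit = generate_self_edit("new document", temperature)
    candidate = finetune(model, edit)            # inner weight-update loop
    if evaluate(candidate) > evaluate(model):    # did the self-edit help?
        model = candidate                        # keep edits that improved skill
        temperature *= 0.9                       # sharpen the edit policy
```

The real framework replaces the keep-if-improved rule with a reinforcement-learning objective over the edit-generating policy, but the structure, generate an edit, apply a weight update, score the result, reinforce what worked, is the same.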
Will the LSTM Father's 22-Year-Old Vision Come True? A Wave of AI "Self-Evolution" Papers in a Single Week Points to a New Trend
机器之心· 2025-06-02 05:22
Core Insights
- The article discusses the evolution of AI systems toward self-improvement, highlighting recent advancements in self-learning models, particularly the "Darwin Gödel Machine" (DGM) and other frameworks [1][4][6].

Group 1: Darwin Gödel Machine (DGM)
- DGM utilizes foundation models and open-ended algorithms to create and evaluate new AI agents, and is capable of reading and modifying its own Python code for self-improvement [4][6].
- DGM has demonstrated significant self-improvement capabilities, with performance increasing from 20.0% to 50.0% on SWE-bench and from 14.2% to 30.7% on Polyglot, surpassing manually designed agents [10].
- The system operates by alternating self-modification with downstream task evaluation, continuously generating and scoring new agents [8][10].

Group 2: Self-Rewarded Training (SRT)
- SRT is an online self-training reinforcement learning algorithm that allows large language models to self-supervise and train without external labels, enhancing performance through self-generated feedback [14][16].
- Initial experiments show that SRT can achieve performance comparable to standard reinforcement learning methods that rely on gold-standard answers, although it may eventually face performance degradation [18][21].
- Strategies to mitigate reward hacking include early stopping, using offline-generated labels for self-training, and curriculum learning to maintain model performance [22][24][26].

Group 3: Multi-Modal Unsupervised Post-Training (MM-UPT)
- MM-UPT is a framework for continuous self-improvement of multimodal large models in completely unsupervised settings, validated across multiple benchmarks [30][32].
- The framework employs a voting mechanism to generate pseudo-labels from self-generated data, allowing models to enhance their reasoning capabilities without external supervision [39][40].
- Experiments indicate that MM-UPT can improve accuracy from 66.3% to 72.9% on the MathVista benchmark, demonstrating its effectiveness compared to previous unsupervised methods [39][40].

Group 4: UI-Genie Framework
- UI-Genie is designed to address challenges in GUI agents, focusing on trajectory validation and the acquisition of high-quality training data [45][47].
- The framework includes a reward model that efficiently processes historical context and unifies action-level and task-level rewards, enhancing the agent's learning capabilities [45][50].
- Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks after iterative self-improvement cycles [52].
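The voting mechanism behind MM-UPT's pseudo-labels is straightforward to sketch. This shows only the self-labeling step, not the full post-training loop, and the sampled answers below are invented for illustration:

```python
from collections import Counter

def pseudo_label(sampled_answers):
    """Majority vote over a model's own sampled answers to an unlabeled question:
    the most frequent answer becomes the training target, and the agreement
    ratio can serve as a confidence score."""
    answer, count = Counter(sampled_answers).most_common(1)[0]
    return answer, count / len(sampled_answers)

# e.g. eight answers sampled from the model for one unlabeled math problem
samples = ["42", "41", "42", "42", "7", "42", "41", "42"]
label, agreement = pseudo_label(samples)
# label is "42"; the agreement ratio (5/8 here) can gate low-confidence
# questions out of the self-training data
```

Filtering on the agreement ratio is one natural way such a pipeline can avoid reinforcing answers the model is merely guessing at.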