OaK Architecture
Understanding GPT-5's Trump Card in One Article: The Hidden Weapon That Will Decide AI's Future
36Kr· 2025-09-16 10:43
Core Insights
- The article discusses the significance of the "Universal Verifier" in the evolution of AI models, particularly in the context of GPT-5 and its performance enhancements [2][3]
- It highlights the limitations of previous reinforcement learning methods, particularly "Reinforcement Learning with Verifiable Rewards" (RLVR), in complex real-world scenarios where answers are not binary [2][4]
- The article outlines two main approaches to developing the Universal Verifier: enhancing the evaluation criteria and allowing models to self-assess their outputs [36][44]

Group 1: Universal Verifier and Its Importance
- The Universal Verifier is seen as a potential breakthrough in AI, addressing the shortcomings of RLVR by enabling models to evaluate answers in a more nuanced manner [2][10]
- The need for a more sophisticated evaluation system arises from the complexity of real-world problems, especially in fields like healthcare and education, where answers are not simply right or wrong [2][11]
- The article emphasizes that understanding the Universal Verifier is crucial for grasping the future of AI technology and competition [3]

Group 2: Approaches to Developing the Universal Verifier
- The first approach involves using large language models (LLMs) as judges to create a more sophisticated evaluation standard, an idea explored in various research papers [4][5][6]
- The second approach focuses on self-assessment, where models evaluate their own outputs based on internal confidence levels, reducing reliance on external validation [44][45]
- The RaR (Rubrics as Rewards) framework is introduced as a method to create detailed scoring criteria for evaluating model outputs, leading to significant performance improvements in specific domains [19][21][22]

Group 3: Performance Improvements and Results
- The article presents data showing that models trained using the RaR framework achieved substantial performance gains, with scores in medical evaluations increasing nearly fourfold [21][22]
- Comparisons with other evaluation methods indicate that RaR outperformed traditional approaches, demonstrating its effectiveness in complex reasoning tasks [22][24]
- The Rubicon framework further extends the scoring system by incorporating over 10,000 evaluation criteria, leading to improved performance in subjective areas like creative writing [27][28]

Group 4: Future Directions and Challenges
- The article discusses the limitations of current approaches, noting that while RaR and Rubicon show promise, they still rely on expert-defined criteria, which may hinder scalability [69][70]
- The INTUITOR method represents a shift toward internal feedback mechanisms, allowing models to learn without predefined answers, though it also faces challenges in generalizability [59][60]
- The OaK architecture is proposed as a long-term vision for AI, aiming for a system that learns and evolves through interaction with the environment, though it remains a distant goal [70][77]
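The core idea behind rubric-based rewards can be sketched in a few lines: instead of a binary pass/fail check, an answer earns a weighted score across many criteria. The rubric items, weights, and the keyword-matching "judge" below are invented for illustration; the actual RaR framework uses an LLM judge to grade each criterion.

```python
# Illustrative sketch of a rubric-based reward in the spirit of RaR.
# The rubric, weights, and keyword "judge" are hypothetical stand-ins.

def criterion_satisfied(answer: str, criterion: str) -> bool:
    # Toy judge: checks whether the criterion keyword appears in the answer.
    # A real system would ask an LLM judge to grade this criterion.
    return criterion.lower() in answer.lower()

def rubric_reward(answer: str, rubric: list[tuple[str, float]]) -> float:
    """Score an answer as the weighted fraction of rubric criteria it satisfies."""
    total = sum(weight for _, weight in rubric)
    earned = sum(weight for criterion, weight in rubric
                 if criterion_satisfied(answer, criterion))
    return earned / total if total else 0.0

# Hypothetical medical-domain rubric; essential criteria get higher weight.
rubric = [
    ("dosage", 2.0),
    ("side effects", 1.0),
    ("contraindication", 1.0),
]
answer = "The recommended dosage is 200 mg twice daily; common side effects include nausea."
print(rubric_reward(answer, rubric))  # satisfies 2 of 3 criteria -> 3.0/4.0 = 0.75
```

The graded score gives the reinforcement-learning loop a denser, more nuanced training signal than RLVR's binary correct/incorrect check, which is the property the article credits for the reported gains.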
Has AI Lost Its Way? Reinforcement Learning Godfather Sutton Unveils the OaK Architecture, Challenging the Current AI Paradigm with a New Vision for Superintelligence
AI科技大本营· 2025-08-22 08:05
Core Concept
- The OaK architecture is a systematic response to the need for intelligent agents that can continuously learn, model the world, and plan effectively, aiming to achieve superintelligence through experiential learning [3][5][7].

Group 1: OaK Architecture Overview
- OaK architecture is a model-based reinforcement learning framework characterized by continuous learning components, specialized learning rates for each weight, and a five-step evolution path called FC-STOMP [3][26].
- The architecture emphasizes the importance of runtime learning over design-time learning, advocating for online learning where agents learn from real-world interactions [13][14][21].

Group 2: Key Features of OaK
- The architecture is designed to be domain-general, empirical, and capable of open-ended complexity, allowing agents to form necessary concepts based on their computational resources [16][19].
- The "Big World" hypothesis posits that the world is far more complex than any intelligent agent can fully comprehend, leading to the conclusion that agents must operate with approximate models and strategies [19][20].

Group 3: Learning Mechanisms
- OaK architecture introduces the concept of subproblems, where agents autonomously generate subproblems based on curiosity and intrinsic motivation, facilitating a cycle of problem-solving and feature generation [28][31].
- The architecture's core process involves eight steps that include learning main strategies, generating new state features, creating subproblems, and using learned models for planning [27][29].

Group 4: Challenges and Future Directions
- Two significant challenges remain: ensuring reliable continual deep learning and generating new state features, which are critical for the architecture's success [37][38].
- The OaK framework aims to provide a comprehensive solution to fundamental AI problems, offering a mechanism for how learned models can be used for planning, which is currently lacking in AI [40].
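The cycle of problem-solving and feature generation described above can be loosely sketched in code. Sutton presents OaK as a vision rather than a concrete algorithm, so every class, method, and function body below is a placeholder invented for illustration; the sketch only shows how the steps might feed one another.

```python
# Loose, illustrative skeleton of the OaK cycle: act and learn from reward,
# propose state features, pose "attain this feature" subproblems, learn
# options and their models, and plan with them. All bodies are stubs.

class ToyAgent:
    def __init__(self):
        self.features = []   # curated state features
        self.options = []    # options learned from subproblems
        self.models = {}     # per-option outcome models

    def policy(self, obs):                       # act to maximize reward (stub)
        return 0

    def update_policy(self, obs, action, reward):  # learn main policy/value (stub)
        pass

    def propose_feature(self, obs):              # generate a candidate state feature
        return ("feature", len(self.features))

    def rank(self, feature):                     # rate the feature's usefulness (stub)
        return 1.0

    def solve_subproblem(self, feature):         # learn an option to attain the feature
        option = ("attain", feature)
        self.options.append(option)
        return option

    def learn_option_model(self, option):        # learn a model of the option's outcome
        self.models[option] = "outcome-model"

    def plan(self):                              # plan with learned option models (stub)
        pass


def oak_cycle(agent, observations):
    for obs in observations:
        action = agent.policy(obs)
        agent.update_policy(obs, action, reward=0.0)
        feature = agent.propose_feature(obs)
        if agent.rank(feature) > 0.5:            # keep only promising features
            agent.features.append(feature)
            option = agent.solve_subproblem(feature)
            agent.learn_option_model(option)
        agent.plan()
    return agent


agent = oak_cycle(ToyAgent(), observations=range(3))
print(len(agent.features), len(agent.options))  # one feature and option per observation
```

The two open challenges the talk names, reliable continual deep learning and feature generation, correspond to the `update_policy` and `propose_feature` stubs here: they are exactly the parts this sketch cannot fill in.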
Father of Reinforcement Learning Richard Sutton's Latest Talk Unveils the OaK Architecture: An Eight-Step Vision Toward Superintelligence
机器之心· 2025-08-19 09:45
Core Viewpoint
- Richard Sutton, the father of reinforcement learning and 2024 ACM Turing Award winner, presented a vision for achieving general artificial intelligence (AGI) and superintelligence through the OaK architecture, which is based on experiential learning and outlines a clear roadmap for AI development [2][4].

Group 1: OaK Architecture Overview
- The OaK architecture is not a complete algorithm but a vision that breaks down the goals for AI development into eight necessary steps, highlighting the current gaps and potential development paths [2][6].
- Sutton emphasizes the importance of a simple and general AI agent architecture that learns from experience rather than relying on pre-defined domain knowledge [10][13].

Group 2: Key Concepts in OaK Architecture
- The architecture focuses on "open-ended abstraction," allowing the agent to continuously develop its conceptual framework and understanding of the world without being limited by predefined knowledge [13][28].
- Sutton introduces two critical concepts: design time (before deployment) and runtime (during operation), advocating for learning based on experience during runtime to adapt to the complexities of the world [18][20].

Group 3: Learning and Decision-Making
- The architecture proposes that agents should learn solely from runtime experiences, as the complexity of the world cannot be fully anticipated or pre-defined [30][31].
- Sutton argues that the agent's knowledge is inherently approximate due to the vast complexity of the world, necessitating a focus on learning and planning during runtime [37][38].

Group 4: Reinforcement Learning and Reward Hypothesis
- The reinforcement learning framework is defined by the goal of maximizing a scalar reward signal, which is central to the agent's learning process [42][47].
- Sutton posits that even a simple reward signal can lead to the emergence of intelligent behavior in a sufficiently complex environment [51].
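The reward hypothesis can be seen in miniature with tabular Q-learning: a single scalar reward, granted only at a goal state, is enough for a goal-directed policy to emerge. The five-state corridor environment and all hyperparameters below are an invented toy, not from the talk.

```python
import random

# Minimal tabular Q-learning sketch: a scalar reward signal alone drives
# the agent toward coherent behavior. States 0..4 form a corridor; reward
# 1.0 is given only on reaching state 4, and 0.0 everywhere else.

N_STATES = 5
ACTIONS = [-1, +1]  # move left or right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r = step(s, a)
            # one-step Q-learning update toward the bootstrapped target
            target = r + gamma * max(q[(nxt, act)] for act in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = nxt
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)  # the learned greedy policy moves right (+1) in every non-terminal state
```

Nothing in the code encodes "go right"; the behavior emerges entirely from maximizing the scalar reward, which is the point of the hypothesis.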
Group 5: Common Agent Model
- The common model of intelligent agents includes components such as perception, value function, reactive policy, and transition model, which are interconnected to facilitate learning and planning [58][61].
- This model serves as a foundation for the OaK architecture, which seeks to enhance it by introducing higher-level abstractions and multiple value functions for different subproblems [67][72].

Group 6: Implementation Steps of OaK Architecture
- The implementation of the OaK architecture involves eight parallel steps, including learning strategies for maximizing rewards, generating new state features, and constructing corresponding subproblems [82][85].
- Each step is contingent on the successful realization of continuous deep learning and the ability to generate and evaluate new features [86][90].

Group 7: Future Directions and Challenges
- Sutton acknowledges that while some steps in the OaK architecture are feasible, significant challenges remain, particularly in achieving reliable continuous learning in nonlinear deep learning networks [89][96].
- The architecture aims to create a system that evolves through an open-ended cycle of exploration and learning, with the ultimate goal of enhancing the agent's ability to abstract and generalize from experiences [160].
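The four components of the common agent model and how they interconnect for planning can be sketched as plain function slots. The concrete types and the one-dimensional toy wiring below are illustrative assumptions, not Sutton's specification.

```python
from dataclasses import dataclass
from typing import Callable

# Sketch of the common agent model's four components: perception, value
# function, reactive policy, and transition model. The wiring below shows
# one way they interconnect for acting and one-step planning.

@dataclass
class CommonAgentModel:
    perception: Callable[[object], tuple]      # observation -> state features
    value: Callable[[tuple], float]            # state features -> estimated value
    policy: Callable[[tuple], int]             # state features -> action (reactive)
    transition: Callable[[tuple, int], tuple]  # (features, action) -> predicted next features

    def act(self, observation):
        """Reactive path: perceive, then act directly from the policy."""
        return self.policy(self.perception(observation))

    def plan_one_step(self, observation, actions):
        """Planning path: pick the action whose predicted next state has the highest value."""
        state = self.perception(observation)
        return max(actions, key=lambda a: self.value(self.transition(state, a)))

# Toy wiring: a 1-D position where estimated value grows with position.
agent = CommonAgentModel(
    perception=lambda obs: (float(obs),),
    value=lambda s: s[0],
    policy=lambda s: 1 if s[0] < 3 else -1,
    transition=lambda s, a: (s[0] + a,),
)
print(agent.plan_one_step(0, [-1, 1]))  # planning prefers the higher-value successor: 1
```

OaK's proposed extension, per the summary above, would replace the single `value` slot with multiple value functions, one per subproblem, and layer higher-level abstractions on top of `transition`.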