OaK Architecture
Understanding GPT-5's Trump Card in One Article: The Hidden Weapon That Will Decide AI's Future
36Kr· 2025-09-16 10:43
Before GPT-5's release, The Information reported that GPT-5's performance gains came mainly from a newly developed "Universal Verifier". Although GPT-5's subsequent capability upgrades fell short of expectations, the universal verifier has already become the next "holy grail" for large models and one of the hottest topics in AI circles. Why is it so critical? Mainly because the previous wave of capability gains relied on "reinforcement learning with verifiable rewards" (RLVR). Simply put, training starts with problems that have standard answers, such as math and coding: a correct answer earns points, a wrong one loses points, and the training effect is immediate. But the real world is far more complex than "right" versus "wrong". In domains like medicine, education, and creative work, many problems have no single correct answer, and a "good" answer may need to be both professionally reliable and show communication and empathy. In these scenarios RLVR falls short, and can even cause models to regress on open-ended problems. For models to evolve further, they must break through the limits of right/wrong rewards, so that AI can evaluate quality across domains the way an expert would and turn massive amounts of unstructured experience data into effective learning signals. The universal verifier was created for exactly this, and is seen as a possible trigger for the next paradigm shift in reinforcement learning. Today, one article to understand the most important ... in the current large language model field.
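To make the RLVR mechanism concrete, here is a minimal sketch of a verifiable reward for answer-checkable problems. The function name, the numeric-equality check, and the +1/-1 scale are assumptions made for illustration, not any lab's actual grader.

```python
from fractions import Fraction


def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """RLVR-style reward for problems with a standard answer:
    +1 if the model's answer matches the reference, -1 otherwise."""
    try:
        # Compare numerically so "0.5" and "1/2" count as the same answer.
        correct = Fraction(model_answer.strip()) == Fraction(reference_answer.strip())
    except (ValueError, ZeroDivisionError):
        # Fall back to a normalized string comparison for non-numeric answers.
        correct = model_answer.strip().lower() == reference_answer.strip().lower()
    return 1.0 if correct else -1.0


if __name__ == "__main__":
    print(verifiable_reward("1/2", "0.5"))      # 1.0: numerically equal
    print(verifiable_reward("Paris", "paris"))  # 1.0: matches after normalization
    print(verifiable_reward("0.33", "1/3"))     # -1.0: not the reference answer
```

A universal verifier, by contrast, would have to replace this exact-match check with an expert-like judgment of quality in domains where no reference answer exists.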
Has AI Lost Its Way? Reinforcement Learning Godfather Sutton Unveils the OaK Architecture, Challenging the Current AI Paradigm and Proposing a New Vision for Superintelligence
AI科技大本营· 2025-08-22 08:05
Core Concept
- The OaK architecture is a systematic response to the need for intelligent agents that can continuously learn, model the world, and plan effectively, aiming to achieve superintelligence through experiential learning [3][5][7].

Group 1: OaK Architecture Overview
- OaK is a model-based reinforcement learning framework characterized by continually learning components, a specialized learning rate for each weight, and a five-step evolution path called FC-STOMP [3][26].
- The architecture emphasizes runtime learning over design-time learning, advocating online learning in which agents learn from real-world interactions [13][14][21].

Group 2: Key Features of OaK
- The architecture is designed to be domain-general, empirical, and capable of open-ended complexity, allowing agents to form the concepts they need within their computational resources [16][19].
- The "Big World" hypothesis posits that the world is far more complex than any intelligent agent can fully comprehend, so agents must operate with approximate models and strategies [19][20].

Group 3: Learning Mechanisms
- OaK introduces the concept of subproblems: agents autonomously generate subproblems driven by curiosity and intrinsic motivation, creating a cycle of problem-solving and feature generation [28][31].
- The architecture's core process involves eight steps, including learning the main policy, generating new state features, creating subproblems, and using learned models for planning [27][29] (a minimal sketch of this loop follows this summary).

Group 4: Challenges and Future Directions
- Two significant challenges remain: ensuring reliable continual deep learning and generating new state features, both critical to the architecture's success [37][38].
- The OaK framework aims to provide a comprehensive answer to fundamental AI problems, offering a mechanism for using learned models for planning, which is currently lacking in AI [40].
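The eight-step core process is only named, not specified, in the summary above, so the skeleton below is a purely structural sketch: every class, method, and data structure is a hypothetical placeholder showing how the steps named in the article (learn the main policy, generate state features, create subproblems, learn a model and plan) could fit into one loop. It is not Sutton's algorithm.

```python
class OaKAgentSketch:
    """Structural sketch of the OaK control loop described in the summary.

    Only the steps explicitly named in the article are shown; the learning
    rules themselves are exactly the open problems the article points to.
    """

    def __init__(self):
        self.policy = {}        # main policy, updated to maximize reward
        self.features = []      # learned state features
        self.subproblems = []   # subproblems built around promising features
        self.world_model = {}   # learned transition model used for planning

    def step(self, observation, reward):
        # 1. Learn the main policy and value function from the reward signal.
        self.update_policy(observation, reward)
        # 2. Generate candidate state features from experience.
        self.features.extend(self.generate_features(observation))
        # 3. Turn promising features into subproblems (curiosity-driven goals).
        self.subproblems.extend(self.create_subproblems(self.features))
        # 4. Learn a transition model and use it to plan the next action.
        self.update_model(observation)
        return self.plan(observation)

    # --- placeholders: concrete learning rules are the open research questions ---
    def update_policy(self, observation, reward): ...
    def generate_features(self, observation): return []
    def create_subproblems(self, features): return []
    def update_model(self, observation): ...
    def plan(self, observation): return None
```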
Father of Reinforcement Learning Richard Sutton's Latest Talk Reveals the OaK Architecture: An Eight-Step Vision Toward Superintelligence
机器之心· 2025-08-19 09:45
Core Viewpoint
- Richard Sutton, the father of reinforcement learning and 2024 ACM Turing Award winner, presented a vision for achieving general artificial intelligence (AGI) and superintelligence through the OaK architecture, which is grounded in experiential learning and outlines a roadmap for AI development [2][4].

Group 1: OaK Architecture Overview
- The OaK architecture is not a complete algorithm but a vision that breaks the goal down into eight necessary steps, highlighting the current gaps and potential development paths [2][6].
- Sutton emphasizes a simple and general AI agent architecture that learns from experience rather than relying on pre-defined domain knowledge [10][13].

Group 2: Key Concepts in OaK Architecture
- The architecture centers on "open-ended abstraction," letting the agent continuously develop its conceptual framework and understanding of the world without being limited by predefined knowledge [13][28].
- Sutton distinguishes two critical phases: design time (before deployment) and runtime (during operation), advocating experience-based learning at runtime to cope with the world's complexity [18][20].

Group 3: Learning and Decision-Making
- Agents should learn solely from runtime experience, since the complexity of the world cannot be fully anticipated or pre-specified [30][31].
- Sutton argues that an agent's knowledge is inherently approximate because of the world's vast complexity, necessitating learning and planning at runtime [37][38].

Group 4: Reinforcement Learning and Reward Hypothesis
- The reinforcement learning framework is defined by the goal of maximizing a scalar reward signal, which is central to the agent's learning process [42][47].
- Sutton posits that even a simple reward signal can give rise to intelligent behavior in a sufficiently complex environment [51].

Group 5: Common Agent Model
- The common model of the intelligent agent includes perception, a value function, a reactive policy, and a transition model, interconnected to support learning and planning [58][61] (a schematic sketch follows this summary).
- This model serves as the foundation that OaK seeks to extend by introducing higher-level abstractions and multiple value functions for different subproblems [67][72].

Group 6: Implementation Steps of OaK Architecture
- Implementing OaK involves eight parallel steps, including learning policies for maximizing reward, generating new state features, and constructing corresponding subproblems [82][85].
- Each step depends on achieving continual deep learning and the ability to generate and evaluate new features [86][90].

Group 7: Future Directions and Challenges
- Sutton acknowledges that while some steps of OaK are feasible today, significant challenges remain, particularly reliable continual learning in nonlinear deep networks [89][96].
- The architecture aims to create a system that evolves through an open-ended cycle of exploration and learning, with the ultimate goal of strengthening the agent's ability to abstract and generalize from experience [160].
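Group 5's "common model of the intelligent agent" can be pictured as four interconnected components. The sketch below is an assumed illustration of that structure only: the class names, type signatures, and the dictionary of per-subproblem value functions are mine, not from the talk.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class CommonAgentModel:
    """The four interconnected components of the common agent model:
    perception, value function, reactive policy, and transition model."""
    perception: Callable[[Any], Any]             # raw observation -> state features
    value_function: Callable[[Any], float]       # state -> estimated long-term reward
    reactive_policy: Callable[[Any], Any]        # state -> action
    transition_model: Callable[[Any, Any], Any]  # (state, action) -> predicted next state


@dataclass
class OaKExtension(CommonAgentModel):
    """OaK's proposed extension: one value function per subproblem, so the
    agent can pursue many higher-level abstractions in parallel."""
    subproblem_values: Dict[str, Callable[[Any], float]] = field(default_factory=dict)
```

The extension here only marks where multiple value functions would live; how they are learned and then used for planning is exactly what Groups 6 and 7 describe as still open.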