Zhu Zheqing, Founder of Jinqiu Portfolio Company Pokee AI: A Reinforcement Learning Believer's Decade | Jinqiu Spotlight

Core Insights

The article discusses the journey of Zhu Zheqing, founder of Pokee AI, who focuses on reinforcement learning (RL) as a path to develop intelligent agents capable of learning in uncertain environments. The narrative emphasizes the challenges and skepticism faced in pursuing this less popular but potentially rewarding approach in the AI landscape dominated by large models [6][12][36].

Group 1: Company Overview
- Pokee AI completed a $12 million seed round of financing in July 2025, gaining traction in various industries and technologies [6][14].
- Zhu Zheqing, a former leader in Meta's AI reinforcement learning team, founded Pokee AI with the vision of creating agents that can learn actively through exploration and feedback [8][12].

Group 2: Reinforcement Learning Focus
- The article highlights the return of reinforcement learning as a significant technical route, contrasting it with the prevailing focus on large pre-trained models [5][9].
- Zhu Zheqing's approach to reinforcement learning emphasizes the need for complex environments that allow agents to fail and learn without real-world consequences, addressing the limitations of traditional methods [10][18].

Group 3: Industry Challenges and Perspectives
- The skepticism surrounding reinforcement learning is noted, particularly during a time when scaling laws dominate the AI discourse, leading many investors to question the viability of RL-based approaches [12][25].
- The emergence of InstructGPT in 2022 provided a new paradigm for reinforcement learning, creating a more realistic environment for training agents through human feedback [11][22].

Group 4: Technological Innovations
- Zhu Zheqing advocates for an integrated model approach, challenging the prevalent retrieval-augmented generation (RAG) paradigm, which he believes leads to information loss and inefficiencies [26][30].
- The article discusses the limitations of existing tools and APIs in the AI ecosystem, emphasizing the need for AI-native tools that better align with the requirements of intelligent agents [29][30].

Group 5: Future Vision
- Zhu Zheqing envisions a future where agents can autonomously explore optimal tool combinations without relying on user input, representing a significant shift in how AI interacts with technology [29][30].
- The article concludes with Zhu's commitment to reinforcement learning as a pathway to achieving artificial general intelligence (AGI), reflecting a deep-seated belief in the potential of this approach [36].