At 27, He Takes the Helm of Tencent's Large Models: An Atypical Genius Defines AI's Second Half
Sou Hu Cai Jing· 2025-12-23 17:06
Core Insights
- Yao Shunyu, a prominent figure in AI, has made significant contributions to the development of intelligent agents and large language models, showcasing a trajectory from academic excellence to industry leadership [1][11].

Group 1: Academic Background and Early Career
- Yao Shunyu entered Tsinghua University with a strong academic record and later pursued advanced studies at Princeton University, focusing on natural language processing and reinforcement learning [1][3].
- He was recognized as a young innovator, being included in MIT Technology Review's list of 35 Innovators Under 35 in China [3].

Group 2: Research Focus and Contributions
- Yao's research primarily revolves around intelligent agents, which are systems capable of making their own decisions and interacting with their environment [7].
- He shifted his focus from computer vision to language processing, believing that language holds greater potential for achieving general intelligence [4][5].
- Yao's work on the ReAct method, which combines reasoning and action, has become a mainstream approach in building language agents, enhancing their controllability and applicability across various fields [9][10].

Group 3: Industry Impact and Future Directions
- In 2024, Yao joined OpenAI, where he played a key role in developing the company's first intelligent agent products and participated in deep research projects [10][11].
- His upcoming role at Tencent as Chief AI Scientist will involve leading the AI Infra department, focusing on large model training and inference capabilities, aligning with Tencent's strategic emphasis on AI [11][12].
- Yao believes that the next phase of AI will prioritize defining problems over merely solving them, indicating a shift in focus towards creating practical applications of AI technology [12][13].
A Master Class on "Reinforcement Learning" | 42章经
42章经· 2025-04-13 12:01
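Qu Kai: Today we are joined by Wu Yi, an expert in reinforcement learning (RL) in China. Wu Yi is currently an assistant professor at the Institute for Interdisciplinary Information Sciences at Tsinghua University. He previously worked at OpenAI and is among the earliest researchers of reinforcement learning in China. We will try to cover the topic of RL thoroughly today. To start, Wu Yi, could you briefly explain what RL actually is?

Wu Yi: RL is a rather special class of problems under the broad umbrella of machine learning. Traditional machine learning, at its core, memorizes large numbers of data pairs labeled with the correct answer. ...

But RL is very different. RL was first used to play games, and games differ from classification problems in two major ways.

First, a game involves a great many actions and decisions. Take a table tennis game: serving, receiving, returning. None of these actions is standardized, and different choices directly affect the final result.

Second, there may be tens of thousands of ways to win a game, with no single standard answer.

So RL is an algorithmic framework for solving multi-step decision problems. The problems it addresses have no standard answer, and the specific decision at each step is unconstrained, but once all decisions have been made, a feedback mechanism judges whether the final outcome was good or bad.

RL is therefore more general, and its logic is very close to how we solve problems in real life. For example, if I need to travel to the United States on business, all that matters is that I get there and back smoothly. How I get to the airport, which airline I choose, and which flight I take are all left open.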
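
The framing above (several open decisions, with a single judgment only after the last one) can be sketched in a few lines of Python. This is an illustrative toy and not something taken from the interview: the trip stages, option names, hour values, the 20-hour budget, and the helper names `final_feedback`, `learn`, and `greedy_plan` are all assumptions made up for the example. The learning rule is a deliberately simple Monte Carlo tally under random exploration, standing in for the many RL algorithms that exist.

```python
import random

# Hypothetical trip: three decisions, each with two options (name, hours).
STAGES = [
    [("taxi", 1), ("bus", 5)],               # how to reach the airport
    [("airline_a", 10), ("airline_b", 14)],  # which airline to fly
    [("direct", 2), ("layover", 9)],         # which connection to take
]
BUDGET_HOURS = 20  # assumed success criterion: total travel time within budget


def final_feedback(choices):
    """Return 1 if the whole trip fits the budget, else 0.

    This is the only feedback signal; no individual step is graded.
    """
    total = sum(STAGES[stage][choice][1] for stage, choice in enumerate(choices))
    return 1 if total <= BUDGET_HOURS else 0


def learn(episodes=4000, seed=0):
    """Estimate the average final outcome for each (stage, option) pair.

    Each episode picks every option uniformly at random, observes the
    end-of-trip feedback, and credits that result to all choices made.
    """
    rng = random.Random(seed)
    wins = [[0] * len(options) for options in STAGES]
    tries = [[0] * len(options) for options in STAGES]
    for _ in range(episodes):
        choices = [rng.randrange(len(options)) for options in STAGES]
        reward = final_feedback(choices)  # feedback arrives only at the end
        for stage, choice in enumerate(choices):
            tries[stage][choice] += 1
            wins[stage][choice] += reward
    return [
        [w / t if t else 0.0 for w, t in zip(stage_wins, stage_tries)]
        for stage_wins, stage_tries in zip(wins, tries)
    ]


def greedy_plan(values):
    """Pick the option with the highest estimated value at each stage."""
    return [max(range(len(v)), key=v.__getitem__) for v in values]


values = learn()
plan = greedy_plan(values)
print([STAGES[stage][choice][0] for stage, choice in enumerate(plan)],
      final_feedback(plan))
```

With these made-up numbers, four of the eight possible plans fit the budget, which mirrors the point that there is no single standard answer. The learner is never told which individual step was good; it only sees whether the whole trip worked out.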