Yao Shunyu's face suddenly fills the screen before Tang Jie, Yang Zhilin and Lin Junyang! The four foundation-model greats debate in Zhongguancun
量子位 (QbitAI) · 2026-01-10 13:17
Jay, reporting from Aofei Temple
QbitAI | WeChat official account QbitAI

Tsinghua put together a gathering that drew in much of the AI world. The "four greats" of foundation models were all present: Zhipu's Tang Jie, Kimi's Yang Zhilin, and Alibaba's Lin Junyang, plus... Yao Shunyu, whose face suddenly filled the screen via video link.

This AGI-Next Frontier Summit, convened by the Beijing Key Laboratory of Foundation Models at Tsinghua University, was seriously hardcore. The speakers' talks read like technical reports: extremely information-dense, and sharply worded. The speech transcripts follow; for readability, QbitAI has made light edits without changing the original meaning.

One-line takes:

- Tang Jie: After DeepSeek burst onto the scene, Chat is basically over; the next step is getting things done.
- Yang Zhilin: Building a model is, at its core, creating a worldview.
- Lin Junyang: It will be hard for China to overtake on the AI track; 20% is already an optimistic number.
- Yao Shunyu: On the toC side, most people don't actually need intelligence that strong.

Crossing swords at Tsinghua

Tang Jie

My topic is "Making machines think like humans."

In 2019, with Tsinghua's support, we commercialized our research results and founded Zhipu. Over the same period we also kept pushing open source, with projects at both the model and tooling layers as well as a large-model API stack for developers.

I spent nearly twenty years at Tsinghua. Looking back, what I did was actually quite simple, mainly two things: early on, AMiner; and then large models. One idea that has shaped me deeply, which I call "like drinking coffee ...
Yang Zhilin reveals Kimi's pre-training strategy: boosting token efficiency and achieving long context
Xin Lang Cai Jing (Sina Finance) · 2026-01-10 12:09
Core Insights
- The article focuses on strategies for pre-training AI models, emphasizing token efficiency and long context as the critical levers for performance on complex tasks [2][6].

Group 1: Token Efficiency
- Token efficiency is crucial because the reasoning or training of agents is fundamentally a search process: better pre-training supplies a stronger prior and shrinks the effective search space [3][7].
- The point is illustrated by the need for AI to build complex systems, such as an operating system, without enumerating every possible token combination, most of which would be meaningless or wrong [7].

Group 2: Long Context
- The Transformer architecture shows a clear advantage in long-context settings: in the experiments cited, LSTM performance falls below the Transformer's once the context length exceeds 1000 tokens, underscoring how central context length is to model design [2][6].
- In the current agentic era, many tasks require long contexts to execute complex instructions, so architectures with lower positional loss are technically better positioned [2][6].

Group 3: Aesthetic Considerations in AI
- Developing AI models is not only a technical challenge but also an aesthetic one: creating a model expresses a worldview and values, akin to the notion of "Taste" articulated by influential figures like Steve Jobs [3][7].
- Each model generates unique tokens that are not interchangeable; the intelligence produced by different roles (e.g., a CEO vs. a designer) varies significantly, so the space of possible "Tastes" grows exponentially [4][8].
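The "reasoning as search" claim in Group 1 can be made concrete with a toy sketch. This is our illustration, not anything from Yang Zhilin's talk: the vocabulary, target sequence, and `prior` function are all invented. A best-first search over token sequences finds a target far faster when the model's prior is peaked on good continuations (a stand-in for stronger pre-training) than when the prior is uniform:

```python
import heapq
import math

# Toy illustration: agent reasoning as best-first search over token
# sequences. A "pre-trained prior" that concentrates probability on good
# continuations shrinks the effective search space. All names here
# (VOCAB, TARGET, prior) are hypothetical, chosen for the demo.

VOCAB = list("abcd")
TARGET = "cab"  # the one "correct" 3-token sequence in this toy world

def prior(prefix: str, tok: str, peaked: bool) -> float:
    """Probability the model assigns to `tok` after `prefix`.
    peaked=True mimics stronger pre-training: mass is concentrated
    on the target's true next token. peaked=False is a uniform prior."""
    if peaked:
        if len(prefix) < len(TARGET) and tok == TARGET[len(prefix)]:
            return 0.85
        return 0.05  # remaining mass split over the other tokens
    return 1.0 / len(VOCAB)

def best_first_search(peaked: bool) -> int:
    """Expand the highest-probability prefix first; return how many
    nodes were expanded before the target sequence was found."""
    frontier = [(0.0, "")]  # (negative log-prob, prefix)
    expansions = 0
    while frontier:
        neg_logp, prefix = heapq.heappop(frontier)
        expansions += 1
        if prefix == TARGET:
            return expansions
        if len(prefix) == len(TARGET):
            continue  # dead end: wrong full-length sequence
        for tok in VOCAB:
            p = prior(prefix, tok, peaked)
            heapq.heappush(frontier, (neg_logp - math.log(p), prefix + tok))
    raise RuntimeError("target not found")

uniform = best_first_search(peaked=False)
strong = best_first_search(peaked=True)
print(f"expansions with uniform prior: {uniform}")
print(f"expansions with peaked prior:  {strong}")
assert strong < uniform
```

With the uniform prior, every same-length prefix costs the same, so the search degenerates toward enumeration; with the peaked prior it walks almost straight to the target. The gap widens exponentially with sequence length, which is the intuition behind treating token efficiency as search-space reduction.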