Kimi K3
Kimi K2.5 tops the open-source charts! Its 15T-token training recipe is now public, and Yang Zhilin teases K3
量子位 (QbitAI) · 2026-02-03 00:37
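By 克雷西, reporting from 凹非寺. 量子位 | WeChat official account QbitAI. The top spot on the open-source trending chart keeps changing hands, and now it belongs to Kimi. On Hugging Face, Kimi K2.5 has reached the top of the Trending list, with more than 53,000 downloads. Kimi K2.5 focuses on Agent capabilities: on test sets such as HLE-Full and BrowseComp, it scores higher than flagship closed-source models including GPT-5.2, Claude 4.5 Opus, and Gemini 3 Pro. The official technical report has now been released as well, and it offers some answers as to how Kimi K2.5 was built. Native multimodality, trained on 15T mixed tokens: building on the K2 architecture, Kimi K2.5 underwent continued pre-training on 15T mixed vision and text tokens. It takes a natively multimodal approach, letting a single parameter space process visual signals and textual logic directly. At the 15T scale, visual understanding and text reasoning improved together, ending the earlier trade-off in which one gained at the other's expense. It is also highly cost-effective: Kimi K2.5 outperforms GPT-5.2 on BrowseComp at less than 5% of the cost. This unified parameter architecture lets the model parse the logical semantics behind pixels as precisely as it parses grammatical structure. With this native foundation, K2.5 unlocks ...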
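Moonshot AI's three co-founders respond to everything late at night, answering 23 questions from users worldwide in 3 hours; Yang Zhilin teases a major leap for Kimi K3
36Ke · 2026-01-29 00:17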
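智东西 reported on January 29 that early this morning, Moonshot AI's core team held an Ask Me Anything (AMA) session on the social media platform Reddit. The three co-founders, Yang Zhilin (CEO), Zhou Xinyu (head of the algorithm team), and Wu Yuxin, talked with users worldwide from midnight to 3 a.m. and addressed many key questions in depth, such as whether Kimi K2.5 was distilled from Claude, what improvements and changes Kimi K3 will bring, and how to balance rapid iteration against long-term fundamental research. (Figure: screenshot of the AMA thread; source: Reddit) Right at the start, a user raised a pointed question: Kimi K2.5 sometimes identifies itself as Claude, which some suspect is evidence of distillation from Claude. Yang Zhilin responded that this is mainly because the latest programming data was upsampled during pre-training, and that data appears to be strongly associated with the token "Claude"; in fact, K2.5 appears to outperform Claude on many benchmarks. On Kimi K3, Yang Zhilin did not reveal many details, but he mentioned that K3 will add more architectural optimizations on top of Kimi Linear. He believes that even if Kimi K3 is not 10 times stronger than K2.5, it will certainly be much stronger. Over the whole session, Moonshot AI's three co-founders answered more than 40 questions. 智东西 also put 3 questions to them and received direct answers. When 智东西 asked about Moonshot AI's compute reserves, Yang Zhilin said ...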
Yang Zhilin and the Kimi team respond late at night: all the controversies since K2 Thinking went viral
AI前线 · 2025-11-11 06:42
Core Insights
- The article discusses the launch of Kimi K2 Thinking by Moonshot AI, highlighting its capabilities and innovations in the AI model landscape [2][27].
- Kimi K2 Thinking has achieved impressive results in various global AI benchmarks, outperforming leading models like GPT-5 and Claude 4.5 [10][12].
Group 1: Model Performance
- Kimi K2 Thinking excelled in benchmarks such as HLE and BrowseComp, surpassing GPT-5 and Claude 4.5, showcasing its advanced reasoning capabilities [10][12].
- In the AIME25 benchmark, Kimi K2 Thinking scored 99.1%, nearly matching GPT-5's 99.6% and outperforming DeepSeek V3.2 [12].
- The model's performance in coding tasks was notable, achieving scores of 61.1%, 71.3%, and 47.1% in various coding benchmarks, demonstrating its capability in software development [32].
Group 2: Innovations and Features
- Kimi K2 Thinking incorporates a novel KDA (Kimi Delta Attention) mechanism, which enhances long-context consistency and reduces memory usage [15][39].
- The model is designed as an "Agent," capable of autonomous planning and execution, allowing it to perform 200-300 tool calls without human intervention [28][29].
- The architecture allows for a significant increase in reasoning depth and efficiency, balancing the need for speed and accuracy in complex tasks [41].
Group 3: Future Developments
- The team is working on a visual language model (VL) and plans to implement improvements based on user feedback regarding the model's performance [18][20].
- Kimi K3 is anticipated to build upon the innovations of Kimi K2, with the KDA mechanism likely to be retained in future iterations [15][18].
- The company aims to address the "slop problem" in language generation, focusing on enhancing emotional expression and reducing overly sanitized outputs [25].