Kimi K3 - filings, earnings calls, financial reports, news

Kimi K3

Search documents

量子位· 2026-02-03 00:37

克雷西发自凹非寺量子位 | 公众号 QbitAI 开源热榜第一轮流做，现在花落Kimi。在Hugging Face上，Kimi K2.5登上了Trending榜首，下载量超过了5.3万。 Kimi K2.5主打Agent能力，在HLE-Full、BrowseComp等测试集中，成绩超越了GPT-5.2、Claude 4.5 Opus以及Gemini 3 Pro等旗舰闭源模型。现在，官方的技术报告也已经亮相，Kimi K2.5怎样练成，我们可以从中窥探一些答案。原生多模态，15T Token混合训练 Kimi K2.5在K2的架构基础上，投入了15T的视觉与文本混合Token进行持续预训练。它选择了一条原生多模态的技术路线，让同一套参数空间直接处理视觉信号与文本逻辑。在15T这样庞大的数据量级下，视觉理解与文本推理能力实现了同步增强，一改往日"此消彼长"的局面。而且极具性价比，在BrowseComp上达到比GPT-5.2更高的表现， Kimi K2.5的资金消耗仅有不到5% 。这种统一的参数架构，让模型能够像理解语法结构一样，精准解析像素背后的逻辑语义。有了这套原生底座，K2.5解锁 ...

原生多模态

智能体集群

线性注意力机制

Artificial Intelligence

Artificial Intelligence

Kimi K2.5

Kimi K3

月之暗面三位联创深夜回应一切，3小时答全球网友23问，杨植麟剧透Kimi K3提升巨大

3 6 Ke· 2026-01-29 00:17

智东西1月29日报道，今天凌晨，月之暗面核心团队在社交媒体平台Reddit上举行了一场有问必答（AMA）活动。三位联合创始人杨植麟（CEO）、周昕宇（算法团队负责人）和吴育昕与全球网友从0点聊到3点，把许多关键问题都给聊透了，比如Kimi K2.5是否蒸馏自Claude、 Kimi K3将带来的提升与改变，以及如何在快速迭代与长期基础研究之间取得平衡。 ▲AMA栏目截图（图源：Reddit）一开始，便有网友抛出尖锐问题：Kimi K2.5有时会自称为Claude，有人怀疑这是对Claude进行蒸馏的证据。杨植麟回应道，这一现象主要是由在预训练阶段对最新编程数据进行了上采样，而这些数据似乎与"Claude"这个token的关联性较强，事实上，K2.5在许多基准测试中似乎都优于Claude。谈及Kimi K3，杨植麟没透露太多细节，但提到了K3会在Kimi Linear上加入更多架构优化，他相信，就算Kimi K3没比K2.5强10倍，也肯定会强很多。整场问答中，月之暗面的三位联合创始人共回答了40多个问题。智东西也向他们提出了3个问题，并获得了直接回应。当智东西问及月之暗面的算力储备时，杨植麟称 ...

杨植麟带 Kimi 团队深夜回应：关于 K2 Thinking 爆火后的一切争议

AI前线· 2025-11-11 06:42

Core Insights - The article discusses the launch of Kimi K2 Thinking by Moonshot AI, highlighting its capabilities and innovations in the AI model landscape [2][27]. - Kimi K2 Thinking has achieved impressive results in various global AI benchmarks, outperforming leading models like GPT-5 and Claude 4.5 [10][12]. Group 1: Model Performance - Kimi K2 Thinking excelled in benchmarks such as HLE and BrowseComp, surpassing GPT-5 and Claude 4.5, showcasing its advanced reasoning capabilities [10][12]. - In the AIME25 benchmark, Kimi K2 Thinking scored 99.1%, nearly matching GPT-5's 99.6% and outperforming DeepSeek V3.2 [12]. - The model's performance in coding tasks was notable, achieving scores of 61.1%, 71.3%, and 47.1% in various coding benchmarks, demonstrating its capability in software development [32]. Group 2: Innovations and Features - Kimi K2 Thinking incorporates a novel KDA (Kimi Delta Attention) mechanism, which enhances long-context consistency and reduces memory usage [15][39]. - The model is designed as an "Agent," capable of autonomous planning and execution, allowing it to perform 200-300 tool calls without human intervention [28][29]. - The architecture allows for a significant increase in reasoning depth and efficiency, balancing the need for speed and accuracy in complex tasks [41]. Group 3: Future Developments - The team is working on a visual language model (VL) and plans to implement improvements based on user feedback regarding the model's performance [18][20]. - Kimi K3 is anticipated to build upon the innovations of Kimi K2, with the KDA mechanism likely to be retained in future iterations [15][18]. - The company aims to address the "slop problem" in language generation, focusing on enhancing emotional expression and reducing overly sanitized outputs [25].