Workflow
GPT 4.1
icon
Search documents
AI大家说 | Kimi K2:全球首个完全开源的Agentic模型
红杉汇· 2025-07-18 12:24
Core Viewpoint - Moonshot AI has officially released the Kimi K2 model, which is designed for Agentic workflows, showcasing advanced capabilities in understanding complex instructions and autonomously executing multi-step tasks [2][3][26] Group 1: Model Architecture and Capabilities - Kimi K2 is built on a sparse MoE (Mixture-of-Experts) architecture, featuring a total of 1 trillion parameters and 32 billion active parameters, with 384 experts [4][5] - The model can dynamically activate relevant experts based on task requirements, allowing for efficient parameter utilization [4][5] - Kimi K2 has a maximum context length of 128K, enhancing its ability to handle long documents and complex retrieval tasks [8] Group 2: Training and Optimization - The model underwent pre-training on 15.5 trillion tokens using the MuonClip optimizer, which effectively addressed gradient instability and convergence issues [7][10] - Kimi K2 incorporates a self-judging mechanism to improve performance on non-verifiable tasks, continuously optimizing its capabilities [7] Group 3: Performance Metrics - Kimi K2 achieved state-of-the-art (SOTA) results in various benchmark tests, including SWE Bench Verified, Tau2, and AceBench, demonstrating superior performance in coding, agent tasks, and mathematical reasoning [8][25] - In programming tasks, Kimi K2 scored 53.7% accuracy in LiveCodeBench, surpassing GPT-4.1 [19] - The model's tool-calling ability reached an accuracy of 65.8% in SWE-bench Verified tests, indicating its proficiency in parsing complex instructions [21] Group 4: Industry Impact and Recognition - Kimi K2 has generated significant discussion within the global AI community, with notable endorsements from industry leaders, including NVIDIA's founder Jensen Huang [9][12] - The model's open-source nature has led to rapid adoption by major platforms such as OpenRouter and Microsoft's Visual Studio Code [12] - Kimi K2 has been recognized as one of the best open-source models globally, with academic and industry consensus on its capabilities [14][16] Group 5: Future Implications - The release of Kimi K2 is expected to enhance the developer ecosystem and expand its applications in various fields, transitioning AI from a mere conversational tool to a productivity engine [26]
o3深度解读:OpenAI终于发力,agent产品危险了吗?
Hu Xiu· 2025-04-25 14:21
我们在2025年Q1的大模型季报中提到,在AGI路线图上,只有智能提升是唯一主线,因此我们持续关注头部AI Lab的模型发布。上周OpenAI密集发布了o 系列最新的两个模型o3和o4-mini,开源了Codex CLI,还推出了在API中使用的GPT 4.1。本文将着重对这些新发布进行解读,尤其是o3 agentic和多模态 CoT新能力。 我们认为OpenAI在数次平淡的更新后,终于拿出了有惊艳表现的o3。融合了tool use能力后,模型表现已经覆盖了agent产品常用的use case。Agent产品开 始分化出两类路线:一类是像o3那样把tool use通过CoT内化到模型中,模型可以用写代码调用的方式执行任务;另一类是类似Manus,把工作流程外化 成人类OS中的computer use。同时OpenAI已经把agent产品作为了未来产品商业化收入占比的大头,我们有理由担心通用agent产品在大模型公司主航道上 被覆盖。 长线看,RL Scaling是进步斜率最大的方向,上周两位RL教父Richard Sutton和David Silver发布了一篇很重要的文章Era of Experience, ...
o3 深度解读:OpenAI 终于发力 tool use,agent 产品危险了吗?
海外独角兽· 2025-04-25 11:52
作者:cage, haozhen 我们在 2025 年 Q1 的大模型季报 中提到,在 AGI 路线图上,只有智能提升是唯一主线,因此我们持 续关注头部 AI Lab 的模型发布。上周 OpenAI 密集发布了 o 系列最新的两个模型 o3 和 o4-mini,开 源了 Codex CLI,还推出了在 API 中使用的 GPT 4.1。本文将着重对这些新发布进行解读,尤其是 o3 agentic 和多模态 CoT 新能力。 我们认为 OpenAI 在数次平淡的更新后,终于拿出了有惊艳表现的 o3。融合了 tool use 能力后,模型 表现已经覆盖了 agent 产品常用的 use case。Agent 产品开始分化出两类路线:一类是像 o3 那样把 和 o3 的发布模式一样, OpenAI 的 reasoning model 都是先训练出一个 mini reasoning 版本,再 scale 到 一个 long inference time、full tool use 能力的模型上。 而之前 GPT 模型总是先训练出最大的模型,再蒸 馏到小模型上。这个策略值得探讨其原因,我们的猜测是 RL 算法比较脆弱, ...