Kimi K2 Not Only Takes the Open-Source Top Spot, It Also Takes Authorship Credit on Its Own Paper: "Praising" Myself
36Kr · 2025-07-22 11:07
Core Insights
- The article discusses the release of Kimi K2, the world's first open-source model with over one trillion parameters, which has sparked significant discussion in the industry [1][2]
- Kimi K2 has reached the top rank on the LMSYS open-source model leaderboard, surpassing other models across various evaluation benchmarks [2][27]
- The Kimi team openly acknowledges that Kimi K2 builds on and improves DeepSeek V3, addressing concerns of potential plagiarism [3][5]

Technical Aspects
- Kimi K2 is a mixture-of-experts (MoE) model with 1.04 trillion total parameters and 32 billion active parameters, using a sparse architecture with a sparsity of 48 [12]
- The training data comprises 15.5 trillion tokens spanning multiple domains and has undergone quality filtering and data-augmentation techniques [12]
- The MuonClip optimizer is introduced to keep training stable, preventing large fluctuations in the loss function during the training process [13][16]

Data Strategy
- Kimi K2 employs a dual data strategy combining synthetic and real-world data, generating over 100,000 tool-use trajectories for community use [11][20]
- The model draws on a library of over 3,000 real tools and 20,000 synthetic tools across various fields, providing a comprehensive training environment [20][23]
- Data-rewriting strategies increase the diversity of the training data, improving model performance and reducing overfitting [17][19]

Performance Metrics
- Kimi K2 has achieved or approached optimal performance in key areas such as coding, mathematics, tool use, and long-text tasks, outperforming several proprietary models [27][29]
- The model's performance is validated through various benchmarks, demonstrating its capabilities in complex reasoning and task execution [29][30]
- Future iterations of Kimi K2 will focus on improving reasoning efficiency and self-evaluation in tool use [31]
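The sparse MoE design mentioned above means only a small subset of experts runs for each token. A minimal routing sketch is below; the expert count (384) and top-k (8) are assumptions chosen so that total/active = 48, matching the sparsity the article reports, and may not be Kimi K2's exact configuration:

```python
import numpy as np

# Toy top-k mixture-of-experts router. 384 experts with 8 active per token
# gives a sparsity of 384 / 8 = 48; these counts are illustrative.
NUM_EXPERTS = 384
TOP_K = 8

def route(token_hidden, router_weights):
    """Pick the TOP_K experts with the highest router scores for one token."""
    logits = token_hidden @ router_weights            # shape (NUM_EXPERTS,)
    top_idx = np.argsort(logits)[-TOP_K:]             # indices of chosen experts
    gates = np.exp(logits[top_idx] - logits[top_idx].max())
    gates /= gates.sum()                              # softmax over chosen experts
    return top_idx, gates

rng = np.random.default_rng(0)
hidden = rng.normal(size=64)
weights = rng.normal(size=(64, NUM_EXPERTS))
experts, gates = route(hidden, weights)
# Only 8 of 384 experts compute for this token; sparsity = 384 / 8 = 48.
```

Whatever the true expert counts, the point of the design is the same: total parameters grow with the number of experts while per-token compute tracks only the active ones.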
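MuonClip's stability mechanism is reported to work by capping attention logits: when the largest logit grows past a threshold, the query/key projection weights are rescaled. The sketch below shows that clipping idea only; the threshold value, variable names, and per-matrix split of the correction are simplifying assumptions, not Kimi's published recipe:

```python
import numpy as np

# QK-Clip-style rescaling sketch: if the max scaled attention logit exceeds
# TAU, shrink W_q and W_k so logits come back under the cap. TAU is an
# illustrative value, not the threshold used in Kimi K2 training.
TAU = 100.0

def qk_clip(W_q, W_k, x):
    q, k = x @ W_q, x @ W_k
    max_logit = np.abs(q @ k.T).max() / np.sqrt(q.shape[-1])
    if max_logit > TAU:
        gamma = TAU / max_logit
        # Split the correction evenly: each weight scales by sqrt(gamma),
        # so q @ k.T (and hence every logit) scales by exactly gamma.
        W_q = W_q * np.sqrt(gamma)
        W_k = W_k * np.sqrt(gamma)
    return W_q, W_k, max_logit

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 32))
W_q = rng.normal(size=(32, 32)) * 10.0   # deliberately large to trigger the clip
W_k = rng.normal(size=(32, 32)) * 10.0
W_q, W_k, before = qk_clip(W_q, W_k, x)
_, _, after = qk_clip(W_q, W_k, x)       # second pass: logits now under TAU
```

The intuition is that exploding attention logits are a common source of loss spikes in large-scale training, so bounding them directly removes one failure mode.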
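The 100,000+ tool-use trajectories described above are multi-step records of an agent calling tools to finish a task. A hypothetical shape for one such record is sketched below; every field name, the example tools, and the reward field are illustrative assumptions, not Kimi's actual data schema:

```python
import json

# Hypothetical single tool-use trajectory: a task, the tools available,
# the interleaved tool calls and results, and a final success signal.
trajectory = {
    "task": "Find the current temperature in Beijing in Fahrenheit",
    "tools": [
        {"name": "get_weather", "params": {"city": "str"}},       # synthetic tool
        {"name": "calculator", "params": {"expression": "str"}},  # real tool
    ],
    "steps": [
        {"role": "assistant", "tool_call": {"name": "get_weather",
                                            "arguments": {"city": "Beijing"}}},
        {"role": "tool", "name": "get_weather", "content": "28 C"},
        {"role": "assistant", "tool_call": {"name": "calculator",
                                            "arguments": {"expression": "28*9/5+32"}}},
        {"role": "tool", "name": "calculator", "content": "82.4"},
        {"role": "assistant", "content": "It is 28 C (82.4 F) in Beijing."},
    ],
    "reward": 1.0,  # e.g. a judge's score for successful task completion
}

serialized = json.dumps(trajectory)  # trajectories are typically stored as JSON lines
```

Mixing a few thousand real tools with tens of thousands of synthetic ones, as the article describes, lets the same record format cover far more task domains than real APIs alone would.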