Kimi K2 official technical report released: 384 experts, and training that relies not on "grinding drills" but on "retelling in its own words"
量子位 (QbitAI) · 2025-07-22 06:39

Core Viewpoint
- Kimi K2 has emerged as a leading open-source model, showing major advances in coding, agentic tasks, and mathematical reasoning [4][5].

Group 1: Technical Highlights
- Kimi K2 is a mixture-of-experts model with 1 trillion total parameters, of which 32 billion are activated per token (see the routing sketch after this summary) [4].
- The model reaches state-of-the-art (SOTA) results on several benchmarks, including SWE-bench Verified, Tau2, and AceBench [12].
- The Kimi team frames the work as a shift from static imitation learning to agentic intelligence, in which models must autonomously perceive, plan, reason, and act in complex environments [9][10].

Group 2: Core Innovations
- Kimi K2 rests on three core innovations:
1. The MuonClip optimizer, which replaces the traditional Adam optimizer and enabled pre-training on 15.5 trillion tokens with zero loss spikes (a QK-Clip sketch follows below) [11].
2. Large-scale agentic tool-use data synthesis, generating multi-turn tool-use scenarios across hundreds of domains and thousands of tools (sketched below) [12].
3. A general reinforcement learning framework that extends alignment from static to open-ended domains [12].

Group 3: Pre-training and Post-training Phases
- In pre-training, Kimi K2 optimizes along two axes, the optimizer and the data, using MuonClip to improve training stability and efficiency [21][22].
- The training data covers four main areas: web text, code, mathematics, and knowledge, all subjected to strict quality screening [24].
- Post-training combines supervised fine-tuning with reinforcement learning, generating high-quality training data through a rejection sampling mechanism (sketched below) [30][31].

Group 4: Reinforcement Learning Process
- The RL process builds verifiable reward environments in which model performance can be checked objectively (a toy example follows below) [33].
- A self-critique reward mechanism lets the model evaluate its own outputs against predefined rubrics (sketched below) [34].
- The pipeline generates diverse agentic tasks and tool combinations to broaden training coverage [35].

Group 5: Infrastructure and Performance
- Kimi K2 was trained on a large high-bandwidth GPU cluster built from NVIDIA H800s, designed to train efficiently across varying resource scales [38].
- Each node carries 2 TB of memory and high-speed interconnects among its GPUs [39].
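The headline figures (1 trillion total parameters, roughly 32 billion active, 384 experts) follow from sparse top-k expert routing: each token activates only a small subset of the experts. Below is a minimal sketch assuming top-8 selection; the gate renormalization, shared-expert path, load balancing, and the 7168-wide hidden size are illustrative assumptions, not a reproduction of the report's exact routing.

```python
import torch

def topk_route(hidden: torch.Tensor, router_w: torch.Tensor, k: int = 8):
    """Sparse MoE routing: each token selects k of n_experts.

    hidden:   [num_tokens, d_model] token representations
    router_w: [n_experts, d_model]  learned routing matrix
    """
    logits = hidden @ router_w.t()                # [num_tokens, n_experts]
    probs = torch.softmax(logits, dim=-1)
    gate, idx = probs.topk(k, dim=-1)             # k experts per token
    gate = gate / gate.sum(dim=-1, keepdim=True)  # renormalize gate weights
    return idx, gate

# With 384 experts and k=8, only ~2% of expert parameters run per token,
# which is how 1T total parameters can yield ~32B activated.
tokens = torch.randn(4, 7168)       # 7168 is an illustrative hidden width
router = torch.randn(384, 7168)
idx, gate = topk_route(tokens, router)
print(idx.shape, gate.shape)        # torch.Size([4, 8]) torch.Size([4, 8])
```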
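MuonClip pairs the Muon optimizer with a "QK-Clip" step: when the maximum attention logit observed during training exceeds a threshold, the query and key projection weights are rescaled so future logits stay bounded, which is what suppresses loss spikes. A minimal sketch of that clipping step, assuming a whole-matrix rescale and an illustrative threshold tau (the report applies the clip with finer per-head bookkeeping):

```python
import torch

@torch.no_grad()
def qk_clip(w_q: torch.Tensor, w_k: torch.Tensor,
            max_logit: float, tau: float = 100.0) -> None:
    """If the observed max attention logit exceeds tau, shrink W_q and W_k
    in place so the Q·K product (and hence future logits) scales by
    tau / max_logit. tau=100.0 is an assumed value, not necessarily
    the report's setting."""
    if max_logit > tau:
        gamma = tau / max_logit
        # Split the correction evenly between Q and K so both stay well-scaled.
        w_q.mul_(gamma ** 0.5)
        w_k.mul_(gamma ** 0.5)
```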
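The agentic data pipeline synthesizes multi-turn tool-use trajectories at scale by sampling a domain, a tool set, and a task, then rolling out a simulated interaction. A hedged sketch of such a loop; `simulate_user`, `agent_step`, and `execute_tool` are hypothetical placeholders standing in for the report's actual simulators, not its API:

```python
import random

def synthesize_trajectory(domains, toolsets, agent_step, execute_tool,
                          simulate_user, max_turns=10):
    """Roll out one synthetic multi-turn tool-use episode."""
    domain = random.choice(domains)
    tools = random.sample(toolsets[domain], k=min(3, len(toolsets[domain])))
    task = simulate_user(domain, tools)       # synthetic user request
    history = [("user", task)]
    for _ in range(max_turns):
        action = agent_step(history, tools)   # model picks a tool call or replies
        if action["type"] == "final_answer":
            history.append(("assistant", action["content"]))
            break
        result = execute_tool(action["tool"], action["args"])  # simulated env
        history.append(("tool", result))
    return {"domain": domain, "tools": tools, "trajectory": history}
```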
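Rejection sampling for post-training data means sampling several candidate responses per prompt and keeping only those a scorer accepts. A minimal sketch under that reading; `policy` and `judge` are hypothetical callables standing in for the model and the quality filter, and the threshold is arbitrary:

```python
def rejection_sample(prompt, policy, judge, n_candidates=8, threshold=0.9):
    """Sample n candidates; keep only those scoring above threshold as SFT data."""
    candidates = [policy(prompt) for _ in range(n_candidates)]
    return [c for c in candidates if judge(prompt, c) >= threshold]
```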
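A "verifiable" reward environment checks outputs programmatically, e.g. exact-match on a final answer or a passing test suite, rather than relying on a learned judge. A toy example for math-style tasks; the `Final answer:` extraction convention is an assumption for illustration:

```python
def verifiable_reward(response: str, reference: str) -> float:
    """Objective check: reward 1.0 only if the extracted final answer
    matches the reference exactly; no learned reward model involved."""
    marker = "Final answer:"
    answer = response.rsplit(marker, 1)[-1].strip() if marker in response else ""
    return 1.0 if answer == reference else 0.0

assert verifiable_reward("x = 3, so Final answer: 42", "42") == 1.0
assert verifiable_reward("I think it is 42", "42") == 0.0
```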
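For open-ended outputs with no programmatic checker, the self-critique mechanism has the model judge candidate responses against predefined rubrics; pairwise comparison is one common way to turn such judgments into relative rewards, and the sketch below assumes that scheme. `critic` is a hypothetical callable returning the index (0 or 1) of the preferred response:

```python
from itertools import combinations

def self_critique_scores(prompt, responses, critic, rubric):
    """Pairwise-compare responses under a rubric; win counts act as
    relative rewards for reinforcement learning."""
    wins = [0] * len(responses)
    for i, j in combinations(range(len(responses)), 2):
        preferred = critic(prompt, responses[i], responses[j], rubric)
        wins[i if preferred == 0 else j] += 1
    return wins
```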