A Tribute to Kimi K2: End-to-End INT4 Quantization-Aware RL Training Based on slime
机器之心· 2026-02-03 10:35
Core Insights
- The SGLang RL team has implemented an INT4 Quantization-Aware Training (QAT) pipeline inspired by the Kimi K2 team, achieving stability and consistency comparable to BF16 full-precision training while enabling extreme compression of large models [2][3][4].

Technical Overview
- The project is a collaboration among multiple teams, including SGLang RL, InfiXAI, and Ant Group, with the functionality shared in the slime and Miles communities [4].
- A complete QAT INT4 closed loop has been established, improving training stability and efficiency in reinforcement learning (RL) scenarios [6].
- Rollout efficiency improves significantly because cross-machine communication bottlenecks are eliminated: a 1TB model fits within the memory of a single H200 (141 GB) GPU [6][10].

Training Process
- The training phase uses fake quantization to simulate quantization noise while keeping high-precision BF16 master weights, so the model adapts to low-precision representations [8][9].
- The Straight-Through Estimator (STE) lets gradients bypass the non-differentiable quantization operation, preserving training continuity [9][11].
- The conversion from BF16 weights to the INT4 format is executed in the weight-conversion phase, enabling efficient inference [10][25].

Performance Evaluation
- Experiments show the QAT INT4 approach remains robust, with the INT4 rollout configuration showing raw-reward growth consistent with the BF16 and FP8 configurations [41][46].
- The INT4 QAT strategy effectively mitigates discrepancies between training-time and inference-time outputs, achieving a high degree of consistency [51][56].

Future Directions
- The project aims to explore further training-efficiency optimizations and to investigate FP4 precision for RL training and inference as NVIDIA's Blackwell architecture becomes more prevalent [58][62].
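The fake-quantization, STE, and weight-conversion steps described above can be sketched in a few lines. This is a minimal numpy illustration, not slime's actual implementation: the per-group symmetric scheme, the group size of 32, and all function names here are assumptions for the sake of the example.

```python
import numpy as np

def fake_quant_int4(w, group_size=32):
    """Simulate symmetric per-group INT4 quantization ("fake quantization"):
    weights stay in floating point, but are snapped onto the 16-level grid
    an INT4 tensor can represent, so training sees the quantization noise."""
    flat = w.reshape(-1, group_size)
    # Per-group scale that maps the largest magnitude onto the INT4 range [-8, 7].
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7)    # integer grid values
    return (q * scale).reshape(w.shape)           # dequantized, still float

def ste_backward(grad_out):
    """Straight-Through Estimator: round() has zero gradient almost
    everywhere, so the backward pass treats quantization as identity and
    passes the upstream gradient through unchanged."""
    return grad_out

def pack_int4(q):
    """Weight-conversion step: pack pairs of INT4 values (range [-8, 7])
    into single bytes, i.e. 4x smaller than BF16 storage."""
    u = (np.asarray(q, dtype=np.int32) + 8).astype(np.uint8).reshape(-1, 2)
    return (u[:, 0] << 4) | u[:, 1]               # high nibble | low nibble
```

During training only `fake_quant_int4` (plus the STE gradient) is active and the BF16 master weights are what the optimizer updates; `pack_int4` corresponds to the one-time conversion before INT4 inference.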
Yang Zhilin and the Kimi Team Respond Late at Night: On All the Controversies After K2 Thinking Went Viral
AI前线· 2025-11-11 06:42
Core Insights
- The article discusses the launch of Kimi K2 Thinking by Moonshot AI, highlighting its capabilities and innovations in the AI model landscape [2][27].
- Kimi K2 Thinking achieved impressive results on various global AI benchmarks, outperforming leading models such as GPT-5 and Claude 4.5 [10][12].

Group 1: Model Performance
- Kimi K2 Thinking excelled on benchmarks such as HLE and BrowseComp, surpassing GPT-5 and Claude 4.5 and showcasing its advanced reasoning capabilities [10][12].
- On the AIME25 benchmark, Kimi K2 Thinking scored 99.1%, nearly matching GPT-5's 99.6% and outperforming DeepSeek V3.2 [12].
- Its coding performance was also notable, with scores of 61.1%, 71.3%, and 47.1% across coding benchmarks, demonstrating its capability in software development [32].

Group 2: Innovations and Features
- Kimi K2 Thinking incorporates a novel KDA (Kimi Delta Attention) mechanism, which enhances long-context consistency and reduces memory usage [15][39].
- The model is designed as an "Agent," capable of autonomous planning and execution, allowing it to perform 200-300 tool calls without human intervention [28][29].
- The architecture allows a significant increase in reasoning depth and efficiency, balancing speed and accuracy on complex tasks [41].

Group 3: Future Developments
- The team is working on a visual-language model (VL) and plans improvements based on user feedback on the model's performance [18][20].
- Kimi K3 is anticipated to build on the innovations of Kimi K2, with the KDA mechanism likely retained in future iterations [15][18].
- The company aims to address the "slop problem" in language generation, focusing on enhancing emotional expression and reducing overly sanitized outputs [25].
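The agent behavior described above, autonomous planning with hundreds of tool calls before a final answer, follows the general "reason, act, observe" loop pattern. The sketch below is purely illustrative: it is not Kimi's actual interface, and every name in it (`run_agent`, `model_step`, the action-tuple format) is a hypothetical stand-in.

```python
def run_agent(model_step, tools, task, max_calls=300):
    """Generic agent loop: the model either emits a final answer or requests
    a tool call; each tool result is appended to the history so the next
    step can condition on it. Caps the chain at max_calls invocations."""
    history = [("task", task)]
    for _ in range(max_calls):
        action = model_step(history)
        if action[0] == "final":
            return action[1]                  # model decided it is done
        _, name, args = action                # ("tool", tool_name, args)
        observation = tools[name](*args)      # execute the requested tool
        history.append(("observation", name, observation))
    return None                               # budget exhausted, no answer
```

The 200-300 figure from the article would correspond to `max_calls` here: the model keeps interleaving reasoning and tool use inside one loop rather than handing control back to a human after each step.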