"Humanity's Last Exam": A Chinese Model Beats GPT-5
Core Insights
- The founders of Moonshot AI (月之暗面) introduced the Kimi K2 Thinking model, which outperformed GPT-5 on several benchmarks, drawing significant attention from the global AI community [1][2]

Model Performance
- Kimi K2 Thinking is described as the strongest open-source thinking model to date, achieving state-of-the-art (SOTA) results across multiple tests, including 44.9% on Humanity's Last Exam (HLE) versus GPT-5's 41.7% [2]
- The model scored 60.2% on the BrowseComp benchmark and 56.3% on the SEAL-0 test, both surpassing GPT-5 [2]
- Kimi K2 Thinking can autonomously execute up to 300 consecutive tool-invocation steps, demonstrating its long-horizon reasoning capability [2][3] (a minimal agent-loop sketch appears at the end of this summary)

Technical Innovations
- The model runs an interleaved "thinking-tool-thinking-tool" execution pattern, still relatively novel among large language models [4]
- The team used end-to-end reinforcement learning to keep performance stable across long tool-invocation sequences [4] (a toy REINFORCE-style sketch follows below)
- Kimi K2 Thinking ships with native INT4 quantization, roughly doubling generation speed [7] (see the quantization sketch at the end)

Cost and Resource Management
- The team works with limited compute, running H800 GPU clusters, and has optimized the stack to extract maximum performance from each GPU [5][6]
- The actual training cost is difficult to quantify; the widely circulated figure of $4.6 million is not an official number [6]

Market Position and Strategy
- Moonshot AI's open-source strategy has won Chinese AI models growing international recognition, particularly after Chinese IP addresses were blocked from accessing certain foreign models [7][8]
- Kimi K2's API pricing is significantly lower than competitors', strengthening its competitive position [7]

Future Developments
- The company plans a next-generation K3 model with significant architectural changes, including the experimental KDA (Kimi Delta Attention) module [10]
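To make the 300-step, thinking-then-tool loop concrete, here is a minimal Python sketch of such an agent loop. It is illustrative only: the `Step` record, `generate` callable, and tool registry are assumptions invented for this example, not Moonshot AI's actual interfaces; only the interleaved pattern and the 300-step budget come from the article.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_STEPS = 300  # the article reports K2 Thinking sustains up to 300 tool calls

@dataclass
class Step:
    thought: str                       # the model's reasoning for this turn
    tool_name: Optional[str] = None    # tool the model wants to call, if any
    tool_args: str = ""                # serialized arguments for that tool
    tool_result: Optional[str] = None  # filled in after the tool runs

def agent_loop(
    generate: Callable[[list[Step]], Step],  # hypothetical model interface
    tools: dict[str, Callable[[str], str]],  # tool name -> tool function
    task: str,
) -> list[Step]:
    """Interleave thinking and tool calls ("thinking-tool-thinking-tool")
    until the model stops requesting tools or the step budget runs out."""
    history: list[Step] = [Step(thought=f"Task: {task}")]
    for _ in range(MAX_STEPS):
        step = generate(history)       # model reads history, emits next step
        if step.tool_name is None:     # no tool requested: final answer
            history.append(step)
            break
        step.tool_result = tools[step.tool_name](step.tool_args)
        history.append(step)           # tool output feeds the next thought
    return history

# Toy usage with a stub "model" that calls a calculator once, then answers.
def stub_generate(history: list[Step]) -> Step:
    if len(history) == 1:
        return Step(thought="Need arithmetic.", tool_name="calc", tool_args="2+2")
    return Step(thought=f"Answer: {history[-1].tool_result}")

trace = agent_loop(stub_generate, {"calc": lambda s: str(eval(s))}, "What is 2+2?")
print([s.thought for s in trace])
```

The key design point the article highlights is that the model keeps re-entering the thinking phase after each tool result, rather than planning all calls up front.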
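The article says only that end-to-end reinforcement learning was used; it gives no algorithmic detail. As a hedged illustration of what trajectory-level ("end-to-end") credit assignment looks like, here is a toy REINFORCE-style update in which a single terminal reward per episode is shared by every decision inside it, so the policy is optimized for the outcome of the whole tool-use sequence rather than individual calls. All names and gradients below are hypothetical.

```python
import numpy as np

def reinforce_update(theta: np.ndarray,
                     trajectories: list[tuple[list[np.ndarray], float]],
                     lr: float = 1e-2) -> np.ndarray:
    """Trajectory-level REINFORCE: one terminal reward per episode is
    credited to every decision in that episode."""
    for grad_logps, terminal_reward in trajectories:
        for g in grad_logps:           # g = gradient of log pi(a_t | s_t)
            theta = theta + lr * terminal_reward * g
    return theta

# Toy usage with made-up gradients: two 5-step episodes, one success (r=1.0)
# and one failure (r=0.0); only the successful trajectory moves the policy.
rng = np.random.default_rng(0)
theta = np.zeros(4)
episodes = [([rng.standard_normal(4) for _ in range(5)], 1.0),
            ([rng.standard_normal(4) for _ in range(5)], 0.0)]
print(reinforce_update(theta, episodes))
```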
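The article attributes the roughly 2x generation speedup to native INT4 quantization but does not describe the scheme. Below is a generic symmetric per-row INT4 weight quantization sketch in NumPy, shown only to illustrate the technique in general; it is not Kimi K2's actual method.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-row INT4 quantization: map each weight row to
    integers in [-8, 7] plus one float scale per row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # signed 4-bit range
    scale[scale == 0] = 1.0                             # guard all-zero rows
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights at matmul time."""
    return q.astype(np.float32) * scale

# Toy usage: quantization error stays small relative to weight magnitude.
w = np.random.randn(4, 64).astype(np.float32)
q, s = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize_int4(q, s)).max())
```

In decoding, the practical win comes largely from memory bandwidth: 4-bit weights move a quarter of the bytes of FP16 per token, which is broadly consistent with the ~2x figure the article reports; production kernels additionally pack two INT4 values per byte and fuse dequantization into the matmul.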