Chinese Large Model Surpasses GPT-5 on Multiple Benchmarks

Core Insights
- The article covers a recent online Q&A session held by the founders of Moonshot AI (月之暗面, literally "Dark Side of the Moon"), focusing on their new Kimi K2 Thinking model, which has outperformed GPT-5 on several benchmarks [1][3].

Model Performance
- Kimi K2 Thinking is described as the strongest open-source thinking model to date, achieving state-of-the-art (SOTA) results on several tests, including 44.9% on Humanity's Last Exam (HLE) versus GPT-5's 41.7% [3].
- On the BrowseComp benchmark, Kimi K2 Thinking scored 60.2% against GPT-5's 54.9%, and on SEAL-0 it scored 56.3% against GPT-5's 51.4% [3][4].

Technical Innovations
- The model can autonomously perform 200 to 300 tool calls to solve a complex problem, interleaving reasoning and tool use in a "think-tool-think-tool" execution mode [4].
- The team used end-to-end reinforcement learning to keep performance stable across long sequences of tool calls, so that retrieval and reasoning remain effective throughout the process [4].

Engineering Optimization
- The team trained on H800 GPU clusters connected with InfiniBand and, given limited computational resources, focused on getting the most out of each GPU [6].
- The widely reported $4.6 million training cost is not an official figure. The team says the cost is difficult to quantify because most of the spending goes to research and experimentation [6].

Open Source Strategy
- The open-source approach has earned Chinese AI models international recognition, and Kimi K2's API is significantly cheaper than competitors such as Claude [8].
- Although some users have concerns about using Chinese LLMs, the founders believe that open-source models can alleviate some of these apprehensions [8].

Market Position
- Kimi K2 has gained traction in the market, with a notable increase in API usage after other model providers restricted access from Chinese IP addresses [8].
- In a recent ranking, Chinese models occupied seven of the top twenty spots, with Kimi K2 and Grok 4 leading in daily processing volume at more than 10 billion tokens [8][9].

Future Developments
- The company is planning the next-generation K3 model, which will incorporate significant architectural changes, including the experimental KDA (Kimi Delta Attention) module [10].
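
The "think-tool-think-tool" execution mode described under Technical Innovations can be illustrated with a minimal sketch. This is not Moonshot AI's implementation: `run_agent`, `Step`, `Trace`, `toy_think`, and the `lookup` tool are hypothetical names invented for illustration, and the real model replaces `toy_think` with a learned policy. The sketch only shows the control flow, where each reasoning step either requests a tool call or returns a final answer, under a cap on the number of tool calls (the article cites 200 to 300 per problem). The two scores used as tool data are the HLE figures quoted above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One reasoning step: either request a tool call or give a final answer."""
    thought: str
    tool: Optional[str] = None      # tool name, or None when answering
    argument: str = ""
    answer: Optional[str] = None


@dataclass
class Trace:
    """History of reasoning steps and tool observations so far."""
    steps: List[Step] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)


def run_agent(
    question: str,
    think: Callable[[str, Trace], Step],
    tools: Dict[str, Callable[[str], str]],
    max_tool_calls: int = 300,      # assumed cap, based on the 200-300 range cited
) -> Tuple[Optional[str], Trace]:
    """Alternate between thinking and tool calls until an answer or the cap."""
    trace = Trace()
    while True:
        step = think(question, trace)           # "think"
        trace.steps.append(step)
        if step.tool is None:                   # the policy decided to answer
            return step.answer, trace
        if len(trace.observations) >= max_tool_calls:
            return None, trace                  # budget exhausted, no answer
        observation = tools[step.tool](step.argument)   # "tool"
        trace.observations.append(observation)


# Hypothetical stand-in for the model: look up two scores, then compare them.
def toy_think(question: str, trace: Trace) -> Step:
    seen = len(trace.observations)
    if seen == 0:
        return Step("Need Kimi K2 Thinking's HLE score.", "lookup", "kimi_hle")
    if seen == 1:
        return Step("Need GPT-5's HLE score.", "lookup", "gpt5_hle")
    kimi, gpt5 = (float(x) for x in trace.observations)
    return Step("Both scores found; compute the gap.", answer=f"{kimi - gpt5:.1f}")


SCORES = {"kimi_hle": "44.9", "gpt5_hle": "41.7"}   # HLE figures from the article
TOOLS = {"lookup": lambda key: SCORES[key]}

answer, trace = run_agent("HLE gap in percentage points?", toy_think, TOOLS)
print(answer, len(trace.observations))
```

In this toy run the loop makes two tool calls before answering. Lowering `max_tool_calls` to 1 makes `run_agent` return `None`, which is one simple way to handle an exhausted budget; a production system would more likely force a best-effort answer instead.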