Core Insights

- The release of Kimi K2 Thinking by Moonshot AI (月之暗面) has been likened to the discovery of a treasure island, marking a significant milestone in the competitive landscape of large models and directly challenging top closed-source models such as GPT-5 and Claude Sonnet 4.5 [1][8]
- Kimi K2 Thinking has restored confidence both inside and outside the company, proving that Moonshot AI remains a leading player in the large-model sector [2][8]
- The model has 1 trillion total parameters and uses a mixture-of-experts (MoE) architecture with 384 experts, achieving industry-leading results across a range of benchmarks [8][9]

Training and Development

- The training cost of Kimi K2 Thinking was reported to be $4.6 million, but the company clarified that this figure is not official and that training costs are difficult to quantify given the nature of large-model pre-training [4][5]
- The team adopted the Muon optimizer, which had not yet been widely tested at scale, and verified its stability through rigorous small-scale experiments [5][6]
- Kimi K2 Thinking was trained on InfiniBand-connected H800 GPUs, maximizing output under strict budget constraints [5]

Performance Metrics

- Kimi K2 Thinking posted notable benchmark scores: 44.9% on HLE (with tools), 60.2% on BrowseComp, and 71.3% on SWE-Bench Verified, showcasing strong generalization capabilities [9][12]
- The model can execute 200 to 300 consecutive tool calls in a single session without human intervention, indicating significant progress in test-time scaling [8][12]

Future Directions

- The company is exploring a new attention architecture, Kimi Delta Attention (KDA), for future models, and may open-source more components [7][12]
- The focus is shifting from merely increasing parameter count to improving reasoning efficiency and practical capability, reflecting a broader industry trend [12]
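The sparse MoE design mentioned above means that, despite the 1-trillion-parameter total, only a small subset of experts runs for each token. The following is a minimal illustrative sketch of top-k expert routing, not Moonshot AI's actual implementation; all names, sizes, and the toy expert functions are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=4):
    """Route one token through a sparse mixture-of-experts layer.

    x: (d,) token hidden state; gate_W: (d, n_experts) router weights;
    experts: list of callables, each mapping a (d,) vector to a (d,) vector.
    Only the top-k experts are evaluated, so per-token compute is a small
    fraction of the layer's total parameter count.
    """
    logits = x @ gate_W                       # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 16 experts, each token routed to 4 of them.
rng = np.random.default_rng(0)
d, n_experts = 32, 16
gate_W = rng.normal(size=(d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)) * 0.1)
           for _ in range(n_experts)]
out = moe_forward(rng.normal(size=d), gate_W, experts, k=4)
print(out.shape)  # (32,)
```

In production MoE systems the router is trained jointly with the experts and balanced with auxiliary losses; this sketch only shows the routing arithmetic.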
Kimi K2 Thinking Is Moonshot AI's "Revenge"
TMTPost App · 2025-11-11 14:30