With Less Than 1% of the Resources of Trillion-Scale OpenAI, Kimi's New Model Surpasses GPT-5

Core Insights
- Kimi has launched the K2 Thinking model, its strongest open-source thinking model to date, featuring 1 trillion parameters and advanced capabilities [2][3]
- The K2 Thinking model surpasses both open-source and closed-source counterparts in various benchmark tests, achieving state-of-the-art (SOTA) performance [3][10]
- The model can autonomously perform up to 300 rounds of tool calls and multi-turn reasoning, a significant advance over the previous K2 model [6][20]

Benchmark Performance
- K2 Thinking achieved a SOTA score of 44.9% on Humanity's Last Exam (HLE), a new benchmark designed to evaluate large models' capabilities [10][13]
- The HLE test set includes 2,500 advanced academic questions across more than 100 disciplines, contributed by nearly 1,000 experts from 50 countries [10][13]
- Flagship models initially scored below 20%, but subsequent advances have pushed scores above 40% across the board [13]

Model Development and Paradigms
- Kimi's approach shifted from "model as agent" to "model as thinking agent," emphasizing multi-turn interactions and tool usage [6][15]
- The K2 Thinking model incorporates a framework that allows for better interaction with the external world, enhancing its reasoning capabilities [15][21]
- The model maintains reasoning continuity across multi-step tool calls, a feature that competitors such as OpenAI's GPT series and Google's Gemini do not support [21][23]

Competitive Landscape
- Kimi's valuation is significantly lower than that of major competitors, estimated at 0.5% of OpenAI's and 2% of Anthropic's [26][28]
- Despite limited resources, Kimi has managed to outperform larger models such as GPT-5 and Grok-4 while using less than 1% of the resources [29][30]
- The current landscape suggests a potential shift in the AI competition, with Chinese companies possibly gaining an edge over their American counterparts [30]