Kimi K2 Thinking Lands by Surprise! Agentic and Reasoning Capabilities Surpass GPT-5; Netizens: The Open-Source/Closed-Source Gap Narrows Again

Core Insights
- Kimi K2 Thinking is the most powerful open-source thinking model to date, capable of executing 200-300 consecutive tool calls without human intervention [1][3]
- The model significantly narrows the gap between open-source and closed-source models and generated considerable discussion upon release [3]

Technical Details
- Kimi K2 Thinking has 1 trillion total parameters, of which 32 billion are active, and uses INT4 precision instead of FP8 [5][30]
- Its context window is 256K tokens, which leaves room for longer reasoning chains [5]
- The model achieved state-of-the-art (SOTA) results on several benchmarks, surpassing closed-source models such as GPT-5 and Claude Sonnet 4.5 [8][12]

Performance Metrics
- On Humanity's Last Exam (HLE), Kimi K2 Thinking achieved a SOTA score of 44.9% when allowed to use tools such as search and Python [12]
- Agent capabilities improved markedly: on an agentic benchmark run by Artificial Analysis, the score rose from 73% to 93% [15]
- On the BrowseComp benchmark, Kimi K2 Thinking scored 60.2%, reflecting its search and browsing abilities [18]

Agentic Programming Capabilities
- Kimi K2 Thinking shows stronger programming capabilities and is competitive with top closed-source models on several coding benchmarks [22]
- The model handles complex front-end tasks well, turning creative ideas into functional products [24]

General Capabilities Upgrade
- Creative writing has improved: the model produces clear, engaging narratives while maintaining stylistic coherence [28]
- In academic and research contexts, Kimi K2 Thinking shows clear gains in analytical depth and logical structure [28]
- Responses to personal or emotional queries are more empathetic and nuanced, and offer actionable advice [28]

Quantization and Performance
- Kimi K2 Thinking uses native INT4 quantization, which raises inference (generation) speed by roughly 2x and improves compatibility with a wider range of hardware [30][31]
- The model's design lets it handle long decoding lengths without significant performance loss [30]

Testing and Real-World Applications
- Initial tests indicate that Kimi K2 Thinking can solve complex problems, such as programming tasks, efficiently [41][42]
- The model's ability to break ambiguous questions down into clear, executable sub-tasks adds to its practical utility [21]
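
To make the Quantization and Performance points more concrete, here is a minimal sketch of what 4-bit weight storage involves. It is not Moonshot's actual scheme: the symmetric rounding, the group size of 32, and the helper names `weight_memory_gb`, `quantize_int4`, and `dequantize_int4` are all assumptions chosen for illustration. The memory estimate ignores scale factors, metadata, and any layers kept at higher precision.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring scales and metadata."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int4(weights, group_size=32):
    """Symmetric per-group quantization to integer codes in [-8, 7].

    Hypothetical illustration: each group of `group_size` weights shares one
    floating-point scale, so the largest magnitude in the group maps to 7.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Avoid dividing by zero when a group is all zeros.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest code and clamp to the signed 4-bit range.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=32):
    """Recover approximate weights by multiplying each code by its group scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]
```

Under these assumptions, `weight_memory_gb(10**12, 4)` gives 500.0 for a 1-trillion-parameter model, against 1000.0 at 8 bits per weight, which is the kind of saving that makes a wider range of hardware usable. The rounding error of each restored weight is bounded by half of its group's scale; the roughly 2x speed figure reported in the article is a measured result and does not follow from this sketch alone.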