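Chinese Large Model Benchmark Evaluation 2025 Annual Report: 2026 New Year Special Edition, Including Evaluations of Major Model Releases in Late January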

Investment Rating
- The report does not explicitly provide an investment rating for the industry or companies involved.

Core Insights
- The report highlights significant advancements in Chinese large models and AI agents, marking a transition from "following" to "keeping pace" with global leaders in AI technology [14][24].
- The competitive landscape shows a clear distinction between domestic and international models, with domestic open-source models gaining substantial ground [23][47].
- The report emphasizes the importance of multi-modal capabilities and the emergence of AI agents in practical applications, particularly in programming and task planning [16][14].

Summary by Sections

1. Key Developments in 2025
- The report outlines three major phases of AI model evolution: the initial competition among models, the explosion of multi-modal capabilities, and the rise of AI agents [14][16].
- Notable models such as Kimi-K2.5-Thinking and Qwen3-Max-Thinking have emerged as leaders in specific tasks like code generation and mathematical reasoning [18][24].

2. Annual Evaluation Results and Analysis
- The 2025 annual evaluation ranks Claude-Opus-4.5-Reasoning as the top model globally, followed by Gemini-3-Pro-Preview and GPT-5.2(high) [23][45].
- Domestic models like Kimi-K2.5-Thinking and Qwen3-Max-Thinking are positioned fourth and sixth, indicating a strong competitive stance [23][45].
- The report notes that domestic models are rapidly closing the gap with international counterparts, particularly in code generation and reasoning tasks [24][48].

3. SuperCLUE Model Quadrant and Capability Landscape
- The report presents a model quadrant that categorizes models based on their capabilities in reasoning and application, highlighting the emergence of "technical leaders" and "practical leaders" in the domestic market [38][39].
- The capability landscape indicates that while domestic models excel in certain areas, they still face challenges in hallucination control and precise instruction adherence [42][48].

4. Comparative Analysis of Domestic and International Models
- The analysis reveals that closed-source models dominate the top rankings, with significant advantages in reasoning and instruction-following tasks [74][80].
- Domestic open-source models are noted for their rapid advancements, particularly in coding tasks, where they have begun to outperform some international models [56][84].
- The report also points to structural differences between domestic and international models, with domestic models showing a strong trend towards open-source development [24][47].