Performance on par with Gemini 3 Pro! Last night, Alibaba's most powerful Qwen model arrived

Core Viewpoint
- The launch of Alibaba's Qwen3-Max-Thinking model marks a significant advance in AI capability. It places the model among the top Chinese models and makes it comparable to leading international models such as GPT-5.2 and Gemini 3 Pro [1][5].

Performance Evaluation
- Qwen3-Max-Thinking posts strong scores across a range of benchmarks, including:
  - MMLU-Pro: 85.7
  - MMLU-Redux: 92.8
  - C-Eval: 93.7
  - GPQA: 87.4
  - LiveCodeBench v6: 85.9
  - IMOAnswerBench: 83.9
- Overall, it set new records on 19 mainstream evaluation benchmarks [4][5].

Model Specifications
- The model has over 1 trillion parameters and was trained on 36 trillion tokens, making it Alibaba's largest and most powerful reasoning model to date [4][5].

Innovative Features
- Qwen3-Max-Thinking introduces a Heavy Mode for reasoning that supports iterative self-reflection and accumulation of experience across attempts. This improves problem-solving efficiency without significantly increasing token costs [13].
- The model integrates tool use into its reasoning process, so it can approach complex tasks more strategically, which reduces errors and improves its usefulness in real-world applications [14].

Market Impact
- As of January 2026, the Qwen series has passed 1 billion downloads on Hugging Face, making it one of the most popular open-source AI model series [15].
- The release of Qwen3-Max-Thinking signals a shift in the AI market's focus from chatbots that merely converse intelligently to capable agents that can execute complex tasks [15].