The Qwen3 "extra-large" reasoning model that swept SOTA as a half-finished preview is now officially live
量子位 (QbitAI) · 2026-01-26 15:30
Core Viewpoint - Alibaba Qwen has officially launched Qwen3-Max-Thinking, which reaches state-of-the-art (SOTA) results on a range of benchmarks and outperforms leading models such as GPT-5.2-Thinking and Claude-Opus-4.5 in several categories [1][2].

Group 1: Model Performance
- Across 19 authoritative benchmarks, Qwen3-Max-Thinking posts scores that match or exceed those of top closed-source models [1].
- It does not lead everywhere: on MMLU-Pro it scored 85.7, against 87.4 for GPT-5.2-Thinking and 89.5 for Claude-Opus-4.5 [2].
- Its reasoning strength shows on IMO-AnswerBench, where it scored 91.5, the highest among the compared models [31].

Group 2: Technical Innovations
- Qwen3-Max-Thinking introduces two key innovations, adaptive tool invocation and test-time scaling, which significantly improve its reasoning performance and native agent capabilities [3][19].
- Adaptive tool invocation lets the model decide on its own, during a conversation, when to call built-in tools such as search and a code interpreter, which improves efficiency [22][24].
- Test-time scaling allocates additional compute during the reasoning phase, improving performance while avoiding redundant computation [27][30].

Group 3: Market Impact and Adoption
- Chinese open-source AI models have gained significant traction, with a 17.1% share of global model downloads, ahead of the U.S. at 15.8% [36].
- Alibaba's Qwen series has passed 10 billion downloads, averaging 1.1 million per day, and has become a new benchmark in the global open-source AI community [39].
- Qwen models are being integrated into Alibaba's own ecosystem, including Taobao and Alipay, reflecting a strategy of pairing top-tier model capability with practical applications [42][43].
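
The two ideas under Technical Innovations can be illustrated with a toy Python sketch. It is not Qwen's implementation: the article gives no algorithmic detail, so everything here is an assumption made for illustration. `pick_tool` is a hypothetical keyword heuristic standing in for the model's own decision about whether to call a tool, and `scaled_answer` swaps in a simple, well-known form of test-time scaling (majority voting over repeated samples, with early stopping so that extra compute is only spent when answers disagree).

```python
from collections import Counter
from typing import Callable, Dict, Optional, Tuple


def pick_tool(query: str) -> Optional[str]:
    """Hypothetical stand-in for the model deciding whether a tool is needed."""
    q = query.lower()
    if any(ch.isdigit() for ch in q) and any(op in q for op in "+-*/"):
        return "code_interpreter"
    if q.startswith(("who", "when", "latest")):
        return "search"
    return None  # answer directly, no tool call


def answer(query: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Call a tool only when pick_tool asks for one."""
    name = pick_tool(query)
    if name is None:
        return f"direct: {query}"
    return f"{name}: {tools[name](query)}"


def scaled_answer(sample: Callable[[], str], max_samples: int = 8,
                  agree: int = 3) -> Tuple[str, int]:
    """Draw reasoning samples until `agree` of them match or the budget runs out.

    Returns the winning answer and the number of samples actually used.
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    votes: Counter = Counter()
    for used in range(1, max_samples + 1):
        votes[sample()] += 1
        best, count = votes.most_common(1)[0]
        if count >= agree:
            return best, used  # consensus reached, skip the remaining budget
    return votes.most_common(1)[0][0], max_samples
```

In this sketch an easy question that produces three matching samples in a row uses only three of the eight allowed samples, while a contested one consumes more of the budget. That is one simple way to read "additional compute without unnecessary redundancy".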