AI Reasoning Models

Ten Reasoning Models Take On the 2025 Gaokao Math Exam: DeepSeek-R1 and Tencent Hunyuan T1 Tie for First, Musk's Grok 3 Meets Its "Waterloo"
Mei Ri Jing Ji Xin Wen · 2025-06-10 13:53
Core Insights
- Debate over the difficulty of the mathematics paper in the 2025 college entrance examination (gaokao) remains a hot topic. This evaluation looks at how various AI reasoning models perform on a standardized test based on the New Curriculum Standard Paper I mathematics exam [1]

Group 1: AI Model Performance
- The evaluation tested ten AI reasoning models, including DeepSeek-R1, Tencent's Hunyuan T1, OpenAI's o3, Google's Gemini 2.5 Pro, and xAI's Grok 3, to assess their mathematical capabilities [1]
- DeepSeek-R1 and Tencent's Hunyuan T1 tied for first place with perfect scores of 117, performing especially well on algebra and function problems [4]
- Other scores: iFlytek's Spark X1 with 112, Gemini 2.5 Pro with 109, OpenAI's o3 with 107, Alibaba's Qwen 3 with 106, and ByteDance's Doubao Deep Thinking with 104 [2][7]

Group 2: Evaluation Methodology
- The test was drawn from a paper with a total score of 150, but questions requiring graphical analysis were excluded so that all models competed on equal footing [3]
- Scoring followed high school examination marking standards. For open-ended questions, credit was based on the final answer rather than the working [3]

Group 3: Notable Failures
- Grok 3, developed by xAI and touted as the "strongest AI," ranked third from the bottom with a score of 91, primarily because it failed to interpret multiple-choice questions correctly [8]
- Second from the bottom was the Zhipu Qingyan reasoning model, with 78. It often faltered at the final step of its reasoning and lost points as a result [8][10]
- Kimi k1.5 ranked last, losing a large share of its points on the final two challenging questions [10]