I Had 10 Large Models Take the Full Gaokao Math Exam Again, and First Place Was the Last One I Expected...
数字生命卡兹克 · 2025-06-09 21:20

Core Viewpoint
- The article examines how various AI models performed on a simulated gaokao (Chinese college entrance exam) mathematics paper, highlighting unexpected results and the rapid progress of AI in understanding and solving mathematical problems [1][21].

Group 1: Testing Methodology
- The test added models missing from the previous round, including Zhipu Z1, Kimi 1.5, and Wenxin X1, to give a more comprehensive assessment of AI models' mathematical abilities [3][8].
- Specific scoring rules were established that graded only the correctness of final answers rather than step-by-step solutions, with each question run through every model three times to determine accuracy [5][6].
- Multimodal questions required models to interpret images, which proved challenging for most of them; only OpenAI's model handled these adequately [10][12].

Group 2: Results and Rankings
- The results were surprising: Xunfei Xinghuo and Doubao tied at the top with 145 points each, answering every question correctly except one [15][16].
- Qwen3 scored 143.3 points, performing well on the solution questions but losing points on the fill-in-the-blank section [16].
- Gemini 2.5 Pro ranked fourth with 139.7 points, while Hunyuan T1 and Wenxin X1 tied for fifth with slightly lower scores [17][18].

Group 3: Observations and Implications
- The article notes how quickly AI has advanced: within roughly two years, AI models have reached a level comparable to top-performing high school students in mathematics [21].
- The author expresses excitement and surprise at the results, signaling an optimistic outlook on AI's future capabilities in educational contexts [22].