Human Exams Can No Longer Test AI
创业邦·2025-07-21 03:34

Core Viewpoint
- The article discusses the rapid advancement of AI capabilities, particularly on standardized tests, and raises questions about the implications for human intelligence and the future of education [3][23].

Group 1: AI Testing and Performance
- The latest AI model, Grok-4, has demonstrated exceptional performance on various standardized tests, achieving near-perfect scores on the SAT and GRE and 88.9% accuracy on GPQA [8][17].
- "Humanity's Last Exam" (HLE) consists of 3,000 challenging questions spanning more than 100 subjects; 80% are exact-match questions and 42% are mathematics, making it a rigorous measure of AI capability [10][12]. (A minimal sketch of exact-match scoring appears at the end of this summary.)
- In a recent test, the DeepSeek-R1-0528 model scored 32.1% on HLE, a significant milestone for domestic AI models [17].

Group 2: Comparison with Human Performance
- AI models increasingly outperform human students on standardized tests: ByteDance's Seed1.6 scored 648 in science and 683 in humanities on the 2025 Shandong Gaokao, placing it among the top candidates for prestigious universities [20][21].
- The performance gap between AI and human students is widening; scores from models such as Grok-4 and DeepSeek suggest they could easily surpass human test-takers in future examinations [22][23].
- As AI continues to excel in testing environments, the article suggests the focus may shift from comparing human and AI performance to evaluating AI's practical applications and the value it creates [24].
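
The HLE results above are reported as exact-match accuracy. A minimal sketch of how such scoring typically works is below, using hypothetical predictions and gold answers rather than the official HLE grading harness; the normalization rules are assumptions for illustration only.

```python
# Minimal sketch of exact-match benchmark scoring (hypothetical data,
# not the official HLE harness). Accuracy = correct answers / total questions.

def exact_match_accuracy(predictions, references):
    """Score predictions against gold answers with strict string equality
    after light normalization (trim whitespace, lowercase)."""
    def normalize(s):
        return s.strip().lower()

    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references) if references else 0.0


if __name__ == "__main__":
    # Hypothetical example: 2 of 3 answers match exactly -> 66.7% accuracy.
    preds = ["42", "Paris", "x = 3"]
    golds = ["42", "paris", "x = -3"]
    print(f"Exact-match accuracy: {exact_match_accuracy(preds, golds):.1%}")
```

Under this kind of metric, partial credit is not awarded, which is one reason exact-match benchmarks such as HLE remain difficult even for models that score near-perfectly on multiple-choice exams.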