Peking University Students Go Head-to-Head with AI: Who Comes Out on Top?
Yang Guang Wang·2026-01-05 09:23

Core Insights
The article describes a project launched by students and faculty at Peking University to evaluate the chemistry capabilities of artificial intelligence (AI) by pitting AI models against human students in a competitive examination [1][8].

Group 1: Project Overview
- The project, named SUPERChem, pits 174 undergraduates from Peking University's School of Chemistry and Molecular Engineering against advanced AI models such as GPT, Gemini, and DeepSeek [1].
- The examination draws on a rigorous question bank written by nearly 100 faculty members and students, including chemistry Olympiad winners, and is designed so that the AI models face questions they have not encountered before [1][2].

Group 2: Evaluation Process
- Each question passes through a three-stage review (initial drafting, peer review, and final approval), and some questions go through as many as 15 rounds of revision [4][6].
- The goal is a demanding question set that tests whether AI genuinely understands complex chemical concepts such as crystal structure analysis and reaction mechanisms [1][4].

Group 3: Results and Comparisons
- The Peking University students achieved an average accuracy of 40.3%, while the AI models performed at a level comparable to lower-year undergraduates [7].
- Reported model scores: GPT-5 (High) reached 39.6% in text mode and 38.5% in multimodal mode, while Gemini-2.5-Pro (High) reached 37.9% in multimodal mode [7].

Group 4: Implications and Future Directions
- The project is not merely about exposing AI's limitations; it aims to guide the development of AI for the natural sciences and to encourage researchers to consider how AI can contribute to scientific breakthroughs [8].
- The initiative has also prompted Peking University educators to rethink assessment, focusing on questions that challenge AI capabilities while strengthening students' critical thinking [9].
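To put the Group 3 figures side by side, the short sketch below computes each reported model score's gap to the 40.3% student average in percentage points. It is simple arithmetic on the numbers quoted in the article, not part of the SUPERChem methodology; the dictionary keys and the helper name `gap_to_students` are illustrative labels chosen here.

```python
# Accuracy figures (%) as quoted in the article; labels are illustrative.
STUDENT_AVG = 40.3

model_scores = {
    "GPT-5 (High), text": 39.6,
    "GPT-5 (High), multimodal": 38.5,
    "Gemini-2.5-Pro (High), multimodal": 37.9,
}


def gap_to_students(score: float, baseline: float = STUDENT_AVG) -> float:
    """Return the percentage-point gap to the student average (negative = below)."""
    # Round to one decimal place to match the precision of the reported figures.
    return round(score - baseline, 1)


gaps = {name: gap_to_students(score) for name, score in model_scores.items()}

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: {gap:+.1f} pp vs. student average")
```

On these figures, the best reported model result (GPT-5 text mode) sits 0.7 points below the student average, and the multimodal results trail by 1.8 and 2.4 points.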