A Special "Midterm Exam": Can 174 Peking University Students Outscore AI?
Xin Lang Cai Jing·2025-12-26 14:57

Core Viewpoint - The article discusses a unique midterm exam at Peking University's School of Chemistry and Molecular Engineering, where advanced AI models like GPT, Gemini, and DeepSeek competed against 174 undergraduate students in a specially designed "Turing test" to evaluate AI's capabilities in scientific reasoning [1][4].

Group 1: Exam Design and Purpose
- The exam included 500 challenging questions that were not sourced from publicly available databases but were instead adapted from high-difficulty problems and cutting-edge professional literature [3].
- The motivation behind creating new questions was to ensure that AI models could not simply rely on memorization, as chemistry requires logical reasoning and spatial imagination [4].
- The team, consisting of nearly 100 members, aimed to assess whether AI truly understands chemistry by designing a high-barrier, reasoning-intensive exam [4][6].

Group 2: Collaborative Process
- The question design process was gamified, transforming it into a collaborative effort where team members could review and critique each other's work, fostering a dynamic exchange of ideas [6].
- An incentive system was implemented to encourage participation, making the question creation process engaging and competitive [6].

Group 3: Exam Results and AI Performance
- The average accuracy of the participating students was 40.3%, indicating the exam's high difficulty level [8].
- AI models performed similarly to lower-year undergraduate students, with the highest-performing model, GPT-5 (High), achieving an accuracy of 39.6% [9][10].
- The results highlighted that even top AI models struggled with complex reasoning tasks in chemistry, particularly in areas requiring deep understanding and logical coherence [15].

Group 4: Insights and Future Directions
- The project, SUPERChem, fills a gap in multi-modal deep reasoning assessments in chemistry and serves as a benchmark for future AI development [16].
- The team aims to use SUPERChem as a stepping stone for AI to evolve from general knowledge retention to a deeper understanding of the physical world [16].
- The project has been fully open-sourced, with hopes that it will catalyze advancements in both the scientific and AI fields [16].