SUPERChem
Peking University Students Go Head to Head with AI: Who Comes Out on Top?
Yang Guang Wang · 2026-01-05 09:23
Core Insights
- The article discusses a project initiated by students and faculty at Peking University to evaluate the capabilities of artificial intelligence (AI) in chemistry through a competitive examination against human students [1][8].

Group 1: Project Overview
- The project, named SUPERChem, pits 174 undergraduate students from Peking University's College of Chemistry and Molecular Engineering against advanced AI models such as GPT, Gemini, and DeepSeek [1].
- The examination draws on a rigorous question bank designed by nearly 100 faculty members and students, including chemistry Olympiad winners, to ensure that the AI encounters questions it has not previously seen [1][2].

Group 2: Evaluation Process
- Question creation follows a three-stage review system of initial drafting, peer review, and final approval, with individual questions often going through up to 15 iterations [4][6].
- The aim is to build a challenging question set that tests the AI's understanding of complex chemical concepts, such as crystal structure analysis and reaction mechanisms [1][4].

Group 3: Results and Comparisons
- In the examination, Peking University students achieved an average accuracy of 40.3%, while the AI models performed at levels comparable to lower-year undergraduate students [7].
- Among the AI models, GPT-5 (High) scored 39.6% in text mode and 38.5% in multimodal mode, while Gemini-2.5-Pro (High) scored 37.9% in multimodal mode [7].

Group 4: Implications and Future Directions
- The project is not merely about exposing AI's limitations; it aims to guide the development of AI in the natural sciences and to encourage researchers to consider how AI can assist in scientific breakthroughs [8].
- The initiative has prompted educators at Peking University to rethink assessment methods, focusing on questions that challenge AI capabilities and sharpen human critical thinking [9].
Can 174 Peking University Students Outscore AI? The Result Is Surprising
Xin Lang Cai Jing · 2025-12-28 17:21
Core Insights
- The article discusses a unique examination conducted at Peking University, where advanced AI models such as GPT, Gemini, and DeepSeek competed against 174 undergraduate students in the College of Chemistry and Molecular Engineering [1][2].
- The examination aimed to assess whether AI truly understands chemistry, utilizing a specially designed test that emphasizes reasoning over rote memorization [3][6].

Group 1: Examination Design
- The test comprised 500 challenging questions derived from high-level academic literature, specifically tailored to prevent AI from relying on memorized content [2][4].
- A collaborative platform was created for the team of nearly 100 students and faculty to design, review, and refine the questions, incorporating a gamified points system to enhance engagement [4].

Group 2: Results and Performance
- The average accuracy of the participating students was 40.3%, indicating the high difficulty level of the exam [6].
- AI models performed at a level comparable to that of first-year undergraduate students, revealing limitations in their ability to process visual information and handle complex reasoning tasks [7].

Group 3: SUPERChem Project
- The SUPERChem project fills a gap in multi-modal deep reasoning assessment in chemistry, serving as a benchmark for future AI development [8].
- The project has been fully open-sourced, with the intention of contributing to the global scientific and AI communities, highlighting the journey from knowledge retention to understanding the physical world [8].
A Special "Midterm Exam": Can 174 Peking University Students Outscore AI?
Xin Lang Cai Jing · 2025-12-26 14:57
Core Viewpoint
- The article discusses a unique midterm exam at Peking University's College of Chemistry and Molecular Engineering, where advanced AI models such as GPT, Gemini, and DeepSeek competed against 174 undergraduate students in a specially designed "Turing test" of AI's capabilities in scientific reasoning [1][4].

Group 1: Exam Design and Purpose
- The exam included 500 challenging questions that were not sourced from publicly available databases but were adapted from high-difficulty problems and cutting-edge professional literature [3].
- New questions were written so that AI models could not simply rely on memorization, since chemistry requires logical reasoning and spatial imagination [4].
- The team of nearly 100 members aimed to assess whether AI truly understands chemistry by designing a high-barrier, reasoning-intensive exam [4][6].

Group 2: Collaborative Process
- The question design process was gamified, turning it into a collaborative effort in which team members reviewed and critiqued one another's work, fostering a dynamic exchange of ideas [6].
- An incentive system was implemented to encourage participation, making question creation engaging and competitive [6].

Group 3: Exam Results and AI Performance
- The average accuracy of the participating students was 40.3%, indicating the exam's high difficulty [8].
- AI models performed similarly to lower-year undergraduate students, with the highest-performing model, GPT-5 (High), achieving an accuracy of 39.6% [9][10].
- Even top AI models struggled with complex reasoning tasks in chemistry, particularly in areas requiring deep understanding and logical coherence [15].

Group 4: Insights and Future Directions
- The SUPERChem project fills a gap in multi-modal deep reasoning assessment in chemistry and serves as a benchmark for future AI development [16].
- The team aims to use SUPERChem as a stepping stone for AI to evolve from general knowledge retention toward a deeper understanding of the physical world [16].
- The project has been fully open-sourced, in the hope that it will catalyze advances in both the scientific and AI fields [16].
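For readers who want to see how per-model accuracy figures like those quoted above (40.3% for the student cohort, 39.6% for GPT-5 (High) in text mode) could be tallied against an open-sourced question set such as SUPERChem, the sketch below shows one minimal approach. It is a hypothetical illustration only: the file name superchem_results.json, its record fields, and the normalize_answer helper are assumptions made for this example, not part of any released SUPERChem tooling or grading scheme.

```python
import json
from collections import defaultdict

def normalize_answer(ans: str) -> str:
    """Hypothetical normalization: trim whitespace and ignore case so that
    superficial formatting differences are not counted as errors."""
    return ans.strip().lower()

def accuracy_by_model(path: str) -> dict[str, float]:
    """Tally per-model, per-mode accuracy from a list of graded records.

    Assumed record format (illustrative only):
        {"model": "GPT-5 (High)", "mode": "text",
         "answer": "...", "reference": "..."}
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    with open(path, encoding="utf-8") as f:
        for record in json.load(f):
            key = f'{record["model"]} / {record["mode"]}'
            correct, total = counts[key]
            if normalize_answer(record["answer"]) == normalize_answer(record["reference"]):
                correct += 1
            counts[key] = [correct, total + 1]
    return {k: c / t for k, (c, t) in counts.items() if t}

if __name__ == "__main__":
    for key, acc in sorted(accuracy_by_model("superchem_results.json").items()):
        print(f"{key}: {acc:.1%}")  # e.g. "GPT-5 (High) / text: 39.6%"
```

A tally of this kind only handles exact-match answers; questions requiring free-form reasoning or structure drawing, which the articles identify as the hardest for AI models, would need human or rubric-based grading instead.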