永别了，人类冠军，AI横扫天文奥赛，GPT-5得分远超金牌选手2.7倍

Core Insights - AI models GPT-5 and Gemini 2.5 Pro achieved gold medal levels in the International Olympiad on Astronomy and Astrophysics (IOAA), outperforming human competitors in theoretical and data analysis tests [1][3][10] Performance Summary - In the theoretical exams, Gemini 2.5 Pro scored 85.6% overall, while GPT-5 scored 84.2% [4][21] - In the data analysis exams, GPT-5 achieved a score of 88.5%, significantly higher than Gemini 2.5 Pro's 75.7% [5][31] - The performance of AI models in the IOAA 2025 was remarkable, with GPT-5 scoring 86.8%, which is 443% above the median, and Gemini 2.5 Pro scoring 83.0%, 323% above the median [22] Comparative Analysis - The AI models consistently ranked among the top performers, with GPT-5 and Gemini 2.5 Pro surpassing the best human competitors in several years of the competition [40][39] - The models demonstrated strong capabilities in physics and mathematics but struggled with geometric and spatial reasoning, particularly in the 2024 exams where geometry questions were predominant [44][45] Error Analysis - The primary sources of errors in the theoretical exams were conceptual mistakes and geometric/spatial reasoning errors, which accounted for 60-70% of total score losses [51][54] - In the data analysis exams, errors were more evenly distributed across categories, with significant issues in plotting and interpreting graphs [64] Future Directions - The research highlights the need for improved multimodal reasoning capabilities in AI models, particularly in spatial and temporal reasoning, to enhance their performance in astronomy-related problem-solving [49][62]