A Math Challenge Harder Than the IMO: Google Beats OpenAI
36Kr · 2026-02-26 07:59
Core Insights
- The article covers the performance of Google's AI model Aletheia in the FirstProof challenge, where it outperformed OpenAI's models on complex mathematical problems [1][4].

Group 1: Performance Comparison
- Aletheia solved 6 of the 10 problems independently, and 5 of those solutions received unanimous approval from the experts [1][5].
- OpenAI's model solved 5 problems, but it required human intervention to select the best answers during the evaluation process [3][5].
- The FirstProof challenge was designed by top mathematicians from prestigious institutions and used problems that had never been publicly released, to ensure a fair assessment of AI capabilities [4][6].

Group 2: Problem-Solving Methodology
- Aletheia used the Gemini 3 Deep Think model with zero human intervention: it read the problems, reasoned about them, and output its answers directly in LaTeX [8][10].
- The model allocated resources dynamically, adjusting its compute to the difficulty of each problem, which let it tackle the harder questions more effectively [10].
- Aletheia declined to answer when it could not generate a reliable proof, which indicates a filtering mechanism that keeps it from emitting invalid answers [8][10].

Group 3: Expert Evaluation
- The experts gave full approval to Aletheia's solutions to problems 2, 5, 7, 9, and 10; problem 7 was recognized as the most challenging and had previously been unsolved [6][10].
- The solution to problem 8 did not receive unanimous approval, but it still scored 5 out of 7 with the experts [6].
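
The article does not describe how Aletheia implements its compute scaling or its refusal behavior. Purely as an illustration of the two ideas mentioned under Problem-Solving Methodology, the Python sketch below escalates a compute budget step by step and abstains if no attempt passes a check. The names `solve_or_abstain`, `toy_generate`, and `toy_verify`, and the budget values, are all hypothetical stand-ins, not part of any Google or OpenAI system.

```python
from typing import Callable, Optional, Sequence


def solve_or_abstain(
    generate: Callable[[str, int], str],
    verify: Callable[[str, str], bool],
    problem: str,
    budgets: Sequence[int] = (1, 4, 16),  # hypothetical compute levels
) -> Optional[str]:
    """Return a candidate that passes `verify`, or None to abstain."""
    for budget in budgets:
        # Spend more compute only while the problem remains unsolved.
        candidate = generate(problem, budget)
        if verify(problem, candidate):
            return candidate
    # No attempt passed the check: abstain instead of emitting an unverified proof.
    return None


# Toy stand-ins (assumptions for illustration only): the "generator" succeeds
# once the budget reaches the problem's difficulty.
difficulty = {"easy": 1, "hard": 16, "open": 100}


def toy_generate(problem: str, budget: int) -> str:
    return "proof" if budget >= difficulty[problem] else "gap"


def toy_verify(problem: str, candidate: str) -> bool:
    return candidate == "proof"


results = {p: solve_or_abstain(toy_generate, toy_verify, p) for p in difficulty}
print(results)  # {'easy': 'proof', 'hard': 'proof', 'open': None}
```

In this toy setup the "open" problem exceeds every available budget, so the function returns `None`, the analogue of a model declining to answer rather than submitting an unreliable proof.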