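IMO Gold Won with Prompt Engineering Alone! Tsinghua Alumni Team Up on a New Finding: Academia Can Match Big Tech Without Burning Money
QbitAI (量子位) · 2025-08-02 05:23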

Core Viewpoint
- Two Tsinghua University alumni raised the Gemini 2.5 Pro model to gold-medal level at the International Mathematical Olympiad (IMO) through a self-iterative verification process and prompt optimization [1][4][10].

Group 1: Model Performance and Methodology
- Gemini 2.5 Pro achieved a 31.55% accuracy rate in solving IMO problems, significantly outperforming other models such as o3 and Grok 4 [9].
- The research team used a structured six-step self-verification process to improve the model's performance; the steps include generating initial solutions, self-improvement, and validating solutions [16][18].
- The model generated complete and mathematically rigorous solutions for 5 out of 6 IMO problems, demonstrating the effectiveness of the structured iterative process [24][23].

Group 2: Importance of Prompt Design
- Specific prompt designs significantly improved the model's ability to solve complex mathematical problems, highlighting the importance of prompt engineering for AI model performance [12][14].
- The research indicated that detailed prompts can reduce the computational search space and improve efficiency without granting the model new capabilities [23].

Group 3: Research Team Background
- The authors, Huang Yichen and Yang Lin, are both Tsinghua University alumni with extensive academic backgrounds in physics and computer science, which adds to the credibility of the research [26][28][33].
- Yang Lin is currently an associate professor at UCLA, focusing on reinforcement learning and generative AI, while Huang Yichen has a strong background in quantum physics and machine learning [30][35].

Group 4: Future Directions and Insights
- The research team plans to enhance the model's capabilities through additional training data and fine-tuning, indicating a commitment to ongoing improvement [42].
- Yang Lin sees potential for AI to play a more significant role in mathematical research, especially on long-standing unsolved problems [44].
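
The self-verification process summarized under Group 1 can be pictured as a generate, improve, verify, and correct loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The summary names just three of the six steps (initial generation, self-improvement, validation), so the correction step, the `Verdict` type, the function names, and the `passes_required` and `max_rounds` thresholds are all hypothetical and chosen for illustration. The model calls are passed in as plain callables so the control flow can be read without any particular API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Result of one verification pass (hypothetical structure)."""
    passed: bool
    bug_report: str = ""


def solve_with_self_verification(
    problem: str,
    generate: Callable[[str], str],
    improve: Callable[[str, str], str],
    verify: Callable[[str, str], Verdict],
    correct: Callable[[str, str, str], str],
    passes_required: int = 5,   # assumed threshold, illustrative only
    max_rounds: int = 10,       # assumed budget, illustrative only
) -> Optional[str]:
    """Return a solution that survives repeated verification, else None."""
    solution = generate(problem)             # initial solution
    solution = improve(problem, solution)    # self-improvement pass
    consecutive_passes = 0
    for _ in range(max_rounds):
        verdict = verify(problem, solution)  # validation pass
        if verdict.passed:
            consecutive_passes += 1
            if consecutive_passes >= passes_required:
                return solution              # accept
        else:
            # A failed check resets the streak and triggers a revision
            # based on the verifier's report (assumed step).
            consecutive_passes = 0
            solution = correct(problem, solution, verdict.bug_report)
    return None                              # reject: budget exhausted
```

In practice each callable would wrap a prompted model call. Requiring several consecutive passes, rather than a single one, is one way to reduce the chance that a flawed proof is accepted because one verification run missed the error.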