PhysicsMinions

Open-source model wins IPhO physics olympiad gold for the first time! Shanghai AI Lab's 235B model beats GPT-5 and Grok-4
量子位 (QbitAI) · 2025-10-25 06:23
Core Insights
- The open-source model P1-235B-A22B has won a gold medal at the International Physics Olympiad (IPhO), marking a significant achievement for open-source AI in complex physical reasoning [1][20].
- On the HiPhO benchmark, which covers 13 global physics competitions from 2024 to 2025, P1-235B-A22B earned 12 gold medals and 1 silver medal, tying for first place with Google's Gemini-2.5-Pro [3][19].
- P1-235B-A22B outperforms other models such as GPT-5 and Grok-4, indicating that open-source models have matched or exceeded closed-source models in physical reasoning [5][19].

Benchmark Testing
- The HiPhO benchmark was developed to evaluate models on physics competitions, with scoring aligned closely with human assessment standards [7][8].
- The benchmark includes 13 major physics competitions, allowing a comprehensive comparison of model performance against human competitors [7][8].

Training Methodology
- The P1 series models are trained with a multi-stage reinforcement learning process that uses strategies such as context window expansion and pass rate filtering to improve training efficiency [10][11][12].
- The training dataset consists of thousands of competition-level problems, each with complete context, a verifiable answer, and a standard solution process [9].

Multi-Agent System
- PhysicsMinions, a system designed for collaborative evolution in physical reasoning, consists of three interactive modules that improve solution quality through self-verification and iterative reflection [13][14].
- The system has demonstrated significant improvements in reasoning quality and robustness on complex physics problems [13][14].

Performance Results
- P1-235B-A22B achieved an average score of 35.9 on the HiPhO benchmark, rising to 38.4 after integrating the PhysicsMinions system and outperforming other leading models [21].
- The model also shows significant advantages in other domains, including mathematics and coding, indicating strong generalization capabilities [22].
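
The self-verification and iterative reflection described under Multi-Agent System can be pictured as a small generate-check-revise loop. The sketch below is only an illustration under stated assumptions: the summary does not name the three PhysicsMinions modules or describe their interfaces, so `solve_with_reflection`, `generate`, `verify`, and `max_rounds` are hypothetical names, and the toy solver and checker stand in for real model calls.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (not taken from the source):
#   generate(problem, feedback) -> candidate solution text
#   verify(problem, solution)   -> (passed, feedback for the next attempt)
Generate = Callable[[str, Optional[str]], str]
Verify = Callable[[str, str], Tuple[bool, Optional[str]]]


def solve_with_reflection(
    problem: str,
    generate: Generate,
    verify: Verify,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Draft a solution, check it, and revise using the checker's feedback.

    Returns the last solution and whether it passed verification.
    """
    feedback: Optional[str] = None
    solution = ""
    for _ in range(max_rounds):
        # The first round has no feedback; later rounds revise the draft.
        solution = generate(problem, feedback)
        passed, feedback = verify(problem, solution)
        if passed:
            return solution, True
    return solution, False


# Toy stand-ins for model calls: the first draft is wrong, the revision is right.
def toy_generate(problem: str, feedback: Optional[str]) -> str:
    return "5" if feedback is None else "4"


def toy_verify(problem: str, solution: str) -> Tuple[bool, Optional[str]]:
    if solution == "4":
        return True, None
    return False, "The sum is wrong; recheck the arithmetic."


print(solve_with_reflection("2 + 2", toy_generate, toy_verify))  # ('4', True)
```

Capping the number of rounds keeps the cost bounded, and returning the verification flag lets a caller tell a checked answer apart from a best-effort one.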

