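Chinese-developed AI wins International Physics Olympiad gold, taking 12 golds and 1 silver across 13 top competitions. Key point: it is open source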
QbitAI (量子位) · 2025-11-22 03:07

Core Insights
- The article covers the P1 model family developed by the Shanghai Artificial Intelligence Laboratory, in particular P1-235B-A22B, which excelled across physics competitions, including the International Physics Olympiad (IPhO) 2025, where it became the first open-source model to reach the gold medal threshold [1][3][37].

Group 1: Model Performance
- P1-235B-A22B scored 21.2 out of 30 on the IPhO 2025 theoretical exam, ranking third overall, just behind Gemini-2.5-Pro and GPT-5 [3][37].
- On the HiPhO benchmark, which covers 13 top physics competitions, the average score of P1-235B-A22B rose from 35.9 to 38.4 after integrating the PhysicsMinions framework, surpassing Gemini-2.5-Pro (37.7) and GPT-5 (37.4) [5][38].
- In the Chinese Physics Olympiad (CPhO) 2025, P1-235B-A22B scored 227 out of 320, well above the human gold medalist's score of 199 [6][41].

Group 2: Training Methodology
- The model was trained with a multi-stage reinforcement learning process that formalizes physics problem-solving as a sequential decision-making task [19][20].
- A high-quality dataset of 5,065 physics problems was constructed: 4,126 from Olympiads and 939 from textbooks, covering five major fields and 25 subfields [11][13].
- Training used the Group Sequence Policy Optimization (GSPO) method to improve learning efficiency and address the sparsity of rewards in physics problem-solving [20][23].

Group 3: Open Source and Collaboration
- The entire pipeline, from the model architecture to the evaluation datasets and the intelligent agent framework, has been made fully open source [9].
- The PhysicsMinions framework, consisting of three interactive modules (Visual Studio, Logic Studio, and Review Studio), was designed to improve the model's reasoning quality [30][33].
- The collaborative approach within PhysicsMinions lets answers be improved continuously through a structured review process [30][33].

Group 4: Competitive Edge
- P1-235B-A22B won 12 gold medals and 1 silver medal across the 13 competitions, ranking it among the top models in the field [34][38].
- The lightweight P1-30B-A3B also performed well, securing 8 gold, 4 silver, and 1 bronze medal and placing third among open-source models [38].
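
To make the GSPO item under Group 2 more concrete, here is a minimal sketch of a sequence-level clipped objective in the spirit of the publicly described GSPO formulation. It is not the P1 training code. The function names, the `clip_eps` value, and the use of the population standard deviation are assumptions chosen for illustration. The idea it shows: each sampled solution receives one length-normalized importance ratio for the whole sequence, and rewards are normalized within the group of solutions sampled for the same problem.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same problem.

    Assumption: population standard deviation, with a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence-level importance ratio.

    Equals exp of the mean per-token log-probability difference between
    the current policy and the policy that generated the sample.
    """
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("token log-prob lists must be non-empty and aligned")
    log_ratio = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(log_ratio)


def gspo_objective(new_logps, old_logps, rewards, clip_eps=0.2):
    """Average clipped surrogate over a group (to be maximized).

    new_logps / old_logps: one list of per-token log-probs per response.
    rewards: one scalar reward per response (e.g. graded answer score).
    clip_eps: illustrative value only, not taken from the article.
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for n, o, adv in zip(new_logps, old_logps, advantages):
        s = sequence_ratio(n, o)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (PPO-style) choice between raw and clipped terms.
        total += min(s * adv, s_clipped * adv)
    return total / len(rewards)
```

Because the ratio is computed once per sequence rather than per token, a single sparse reward for the final answer maps directly onto one weight per sampled solution, which fits the reward-sparsity concern the article mentions.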
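
The structured review process described under Group 3 can be pictured as a solve, review, refine loop. The sketch below is purely illustrative and does not reproduce the PhysicsMinions API. `solve_with_review`, the `Solver` and `Reviewer` signatures, `max_rounds`, and the toy stand-ins are all hypothetical names introduced here. In a real system the callables would wrap model calls.

```python
from typing import Callable, Optional

# Hypothetical signatures, introduced only for this sketch.
Solver = Callable[[str, Optional[str]], str]    # (problem, feedback) -> answer
Reviewer = Callable[[str, str], Optional[str]]  # (problem, answer) -> critique, or None if accepted


def solve_with_review(problem: str, solver: Solver, reviewer: Reviewer,
                      max_rounds: int = 3) -> str:
    """Draft an answer, then revise it until the reviewer accepts it
    or the round budget is exhausted."""
    answer = solver(problem, None)
    for _ in range(max_rounds):
        feedback = reviewer(problem, answer)
        if feedback is None:  # reviewer found nothing to fix
            break
        answer = solver(problem, feedback)
    return answer


# Toy stand-ins so the loop can be run without any model.
def toy_solver(problem: str, feedback: Optional[str]) -> str:
    return "42" if feedback is None else "42 m/s"


def toy_reviewer(problem: str, answer: str) -> Optional[str]:
    return None if answer.endswith("m/s") else "missing units"


if __name__ == "__main__":
    print(solve_with_review("Find the final speed.", toy_solver, toy_reviewer))
```

Capping the number of rounds keeps the loop bounded even if the reviewer never accepts an answer. The best draft produced so far is returned in that case.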