Workflow
Nature公开谷歌IMO金牌模型技术细节!核心团队仅10人,一年给AI编出8000万道数学题训练
创业邦·2025-11-14 10:24

Core Insights - Google DeepMind has publicly released the complete technology and training methods behind its new model, AlphaProof, which is designed for mathematical proofs [2][4] - The model utilizes a 3 billion parameter encoder-decoder transformer architecture and incorporates a reinforcement learning environment based on the Lean theorem prover [8][7] Development Process - The AlphaProof team was relatively small, with around 10 core members, and was led by IMO gold medalist Miklós Horváth, who developed a method for creating various problem variants for training [4][5] - Over the course of a year, the team explored various research ideas, integrating successful approaches into the AlphaProof system [5] Training Methodology - AlphaProof transforms the mathematical proof process into a game-like environment, where each mathematical proposition serves as a new game level [7] - The model was pre-trained on approximately 300 billion tokens of code and mathematical text, followed by fine-tuning with around 300,000 manually crafted proofs from the Mathlib library [9][10] - A significant breakthrough was achieved through an automated formalization process that generated about 80 million formalized problems from 1 million natural language math questions [10] Performance at IMO 2024 - AlphaProof demonstrated impressive performance at the 2024 IMO, solving three problems, including the most difficult one, P6, which only 5 out of 609 participants solved completely [15][16] - The model utilized a testing-time reinforcement learning mechanism to generate around 400,000 related problem variants for particularly challenging questions [13][15] Future Directions - Following its success, DeepMind has opened access to AlphaProof for researchers, allowing them to explore its capabilities [19] - While AlphaProof excels in identifying counterexamples and formalizing statements, it faces challenges with custom definitions and relies heavily on the Lean theorem prover [20] - The model's dependency on Lean's evolving environment and the limited availability of unique mathematical problems present ongoing challenges for its broader applicability [20]