Workflow
We-Math 2.0: A New Multimodal Math Reasoning Dataset × The First Comprehensive Mathematical Knowledge System
机器之心·2025-08-27 10:40

Core Viewpoint
- The article introduces We-Math 2.0, a multimodal math reasoning dataset built on a structured knowledge system and new training strategies, aimed at strengthening visual mathematical reasoning [5][9][45].

Group 1: Knowledge System
- We-Math 2.0 establishes a comprehensive knowledge system consisting of 5 levels, 491 knowledge points, and 1,819 principles, covering mathematics from elementary to university level [9][14] (a hierarchical-structure sketch follows this summary).
- The knowledge system is designed to ensure clear hierarchical relationships and logical connections between mathematical concepts, with each knowledge point linked to several fundamental principles [14].

Group 2: Data Expansion Strategies
- MathBook-Standard employs a bidirectional data expansion strategy, generating multiple visual variations for each problem and multiple questions for the same image to improve model generalization [17][15] (sketched in code below).
- The approach aims to cover all 1,819 mathematical principles by associating each problem with the corresponding multi-level knowledge points [17].

Group 3: Difficulty Modeling
- MathBook-Pro introduces three-dimensional difficulty modeling for multimodal math problems, expanding each seed problem into seven difficulty levels along reasoning steps, visual complexity, and contextual complexity [20][21] (see the difficulty-ladder sketch below).
- This modeling supports dynamic scheduling and reinforcement learning training, providing a structured path from basic to advanced reasoning [27].

Group 4: Training Strategies
- The training strategy starts with a cold-start supervised fine-tuning (SFT) stage on 1,000 carefully selected samples, followed by a two-phase reinforcement learning approach [23][30].
- The reinforcement learning uses average rewards based on the model's performance across problems that share similar knowledge principles, strengthening the model's reasoning capabilities [25][30] (see the reward-averaging sketch below).

Group 5: Evaluation and Results
- MathBookEval, a comprehensive evaluation benchmark, consists of 1,000 samples designed to assess the model's knowledge coverage and reasoning depth, using high-quality, manually rendered image data [11][12] (a per-depth scoring sketch follows below).
- Experimental results indicate that MathBook-7B, trained with We-Math 2.0, shows significant performance improvements over baseline models, particularly in knowledge generalization and multi-step problem solving [32][35].
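
To make Group 1 concrete, here is a minimal sketch of how a five-level hierarchy of knowledge points with attached principles might be represented. The class names and fields (KnowledgePoint, Principle, level) are illustrative assumptions, not We-Math 2.0's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical schema for a hierarchical knowledge system: names and fields are
# illustrative assumptions, not We-Math 2.0's actual data format.

@dataclass
class Principle:
    """A fundamental principle attached to a knowledge point (1,819 in total)."""
    principle_id: str
    statement: str

@dataclass
class KnowledgePoint:
    """One of 491 knowledge points; sits at some depth of the 5-level hierarchy."""
    point_id: str
    name: str
    level: int                                            # 1 (top) .. 5 (leaf)
    principles: list[Principle] = field(default_factory=list)
    children: list["KnowledgePoint"] = field(default_factory=list)

def count_principles(node: KnowledgePoint) -> int:
    """Count principles in the subtree rooted at `node`."""
    return len(node.principles) + sum(count_principles(c) for c in node.children)

# Toy example: a top-level node with one child knowledge point.
geometry = KnowledgePoint("G", "Plane Geometry", level=1)
triangle = KnowledgePoint(
    "G.1", "Triangles", level=2,
    principles=[Principle("G.1.a", "Interior angles of a triangle sum to 180 degrees.")],
)
geometry.children.append(triangle)
print(count_principles(geometry))  # -> 1
```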
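
For Group 2's bidirectional expansion, the sketch below shows the two directions: one question rendered with several figure variants, and each figure paired with several new questions. `render_variant` and `generate_question` are hypothetical placeholders for whatever rendering and question-authoring pipeline MathBook-Standard actually uses.

```python
from typing import Callable

# Hypothetical sketch of "one problem -> many images" and "one image -> many
# questions". The two callables stand in for the real rendering / authoring
# pipeline, which is not reproduced here.

def expand_bidirectionally(
    seed_question: str,
    seed_image_params: dict,
    render_variant: Callable[[dict, int], str],       # returns a path to a rendered image
    generate_question: Callable[[str, int], str],     # returns a new question for an image
    n_image_variants: int = 3,
    n_questions_per_image: int = 3,
) -> list[dict]:
    """Expand one seed problem into (image, question) training pairs."""
    samples = []
    for i in range(n_image_variants):
        image_path = render_variant(seed_image_params, i)   # same question, varied figure
        samples.append({"image": image_path, "question": seed_question})
        for j in range(n_questions_per_image):
            # Same figure, varied question targeting related principles.
            samples.append({"image": image_path, "question": generate_question(image_path, j)})
    return samples

# Toy usage with stub callables:
pairs = expand_bidirectionally(
    "What is the area of the triangle?",
    {"shape": "triangle", "base": 4, "height": 3},
    render_variant=lambda params, i: f"triangle_variant_{i}.png",
    generate_question=lambda img, j: f"Question {j} about {img}",
)
print(len(pairs))  # 3 variants * (1 seed question + 3 generated) = 12
```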
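
For Group 3, the following is a sketch of a three-axis difficulty ladder. The exact composition of MathBook-Pro's seven levels is not described in the summary, so the schedule below is an assumption: each level perturbs the seed along one or more of the three axes (reasoning steps, visual complexity, contextual complexity).

```python
from dataclasses import dataclass
from itertools import islice

@dataclass(frozen=True)
class Difficulty:
    reasoning_steps: int      # number of knowledge points chained in the solution
    visual_complexity: int    # e.g. extra elements added to the figure
    context_complexity: int   # e.g. wrapping the problem in a real-world scenario

SEED = Difficulty(reasoning_steps=1, visual_complexity=0, context_complexity=0)

# One possible seven-level schedule from the seed (assumed, not the paper's):
SCHEDULE = [
    Difficulty(2, 0, 0),  # L1: one more reasoning step
    Difficulty(3, 0, 0),  # L2: deeper reasoning chain
    Difficulty(1, 1, 0),  # L3: harder figure, same reasoning depth
    Difficulty(1, 0, 1),  # L4: richer textual context
    Difficulty(2, 1, 0),  # L5: combine reasoning + visual
    Difficulty(2, 0, 1),  # L6: combine reasoning + context
    Difficulty(3, 1, 1),  # L7: all three axes increased
]

def curriculum(levels: list[Difficulty], upto: int) -> list[Difficulty]:
    """Return the first `upto` levels, e.g. for staged / dynamically scheduled training."""
    return list(islice(levels, upto))

print(SEED)
print(curriculum(SCHEDULE, 3))
```

A ladder like this is what makes the "dynamic scheduling" mentioned in Group 3 possible: training can start at the seed level and only admit harder levels as performance improves.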
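
For Group 4's reward design, here is a minimal sketch of averaging per-sample correctness over problems that share a knowledge principle, so the reward reflects mastery of the principle rather than a single lucky answer. The grouping key and the 0/1 correctness reward are assumptions; the actual RL objective may differ.

```python
from collections import defaultdict
from statistics import mean

def principle_average_rewards(
    rollouts: list[dict],   # each: {"principle_id": str, "correct": bool} (assumed fields)
) -> dict[str, float]:
    """Average 0/1 correctness per knowledge principle."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for r in rollouts:
        grouped[r["principle_id"]].append(1.0 if r["correct"] else 0.0)
    return {pid: mean(vals) for pid, vals in grouped.items()}

rollouts = [
    {"principle_id": "G.1.a", "correct": True},
    {"principle_id": "G.1.a", "correct": False},
    {"principle_id": "A.2.c", "correct": True},
]
print(principle_average_rewards(rollouts))  # {'G.1.a': 0.5, 'A.2.c': 1.0}
```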
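
Finally, for Group 5, a small sketch of scoring along the two MathBookEval axes mentioned above (knowledge coverage and reasoning depth). The metadata field names such as "depth" and "knowledge_point" are assumptions about the benchmark's annotations.

```python
from collections import defaultdict

def accuracy_by(records: list[dict], key: str) -> dict:
    """Accuracy of model answers grouped by a metadata field (e.g. reasoning depth)."""
    hits: dict = defaultdict(int)
    totals: dict = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        hits[r[key]] += int(r["model_answer"] == r["gold_answer"])
    return {k: hits[k] / totals[k] for k in totals}

records = [
    {"depth": 1, "knowledge_point": "G.1", "model_answer": "12", "gold_answer": "12"},
    {"depth": 3, "knowledge_point": "G.1", "model_answer": "7",  "gold_answer": "9"},
]
print(accuracy_by(records, "depth"))  # {1: 1.0, 3: 0.0}
```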