Core Insights
- The article discusses the challenges multimodal large language models (MLLMs) face in achieving robust reasoning, particularly on complex mathematical and geometric reasoning tasks [2]
- Research teams from institutions including Shanghai Jiao Tong University and Fudan University propose a systematic solution, "Formal Enhance Informal Reasoning," which leverages formal logic to guide models' informal reasoning [2][3]

Group 1: TrustGeoGen
- TrustGeoGen is a formally verified geometric data synthesis engine designed to address data noise and logical-consistency issues [4]
- It integrates multimodal alignment, full-path formal verification, and exploration algorithms to produce the GeoTrust dataset, ensuring every data point is mathematically verified [4][13]
- The engine consists of four modules (constructor, reasoner, sampler, and translator) that work together to construct problems and keep the reasoning logically coherent [8]

Group 2: GeoBench
- GeoBench is a hierarchical diagnostic benchmark that pinpoints models' reasoning shortcomings by decomposing geometric reasoning into four levels: visual perception, goal-oriented planning, rigorous theorem application, and self-reflective backtracking [16]
- The benchmark uses 1,021 formally verified samples generated by TrustGeoGen to evaluate models across six core tasks, yielding new insights into their reasoning capabilities [17][18]

Group 3: SGVR
- The Sub-Goal Verifiable Reward (SGVR) framework addresses the limitations of outcome-only supervision by transforming abstract proofs into executable numerical sub-goals [20]
- SGVR prioritizes milestones over final outcomes, providing dense reward signals that encourage models to validate each step of their reasoning [20][24]
- Experiments show that SGVR significantly improves geometric reasoning and generalizes strongly across domains, improving performance on unseen tasks [24][26]

Group 4: Future Directions
- The research indicates that combining trustworthy data synthesis, hierarchical capability diagnostics, and process-supervised training forms a complete logical loop for enhancing reasoning models [30]
- The team aims to extend the "formal enhancement" paradigm to broader fields such as general mathematics, code generation, and physical simulation, with the goal of building more trustworthy and robust general reasoning models [30]
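To make the TrustGeoGen pipeline concrete, the four-module flow (constructor, reasoner, sampler, translator) can be sketched on a toy problem. Everything below is an illustrative assumption based on the summary, not the authors' actual implementation: the module signatures, the right-triangle example, and the numeric verification step are all hypothetical.

```python
import math

def constructor():
    """Construct a formal geometric scene: a right triangle with legs 3 and 4."""
    return {"legs": (3.0, 4.0), "goal": "hypotenuse"}

def reasoner(scene):
    """Derive the goal quantity via an explicit rule (Pythagorean theorem),
    recording each intermediate step so it can be checked later."""
    a, b = scene["legs"]
    steps = [("a^2", a * a), ("b^2", b * b), ("a^2 + b^2", a * a + b * b)]
    answer = math.sqrt(a * a + b * b)
    return steps, answer

def sampler(steps, answer):
    """Full-path verification: keep a sample only if every recorded step
    and the final answer check out numerically."""
    a2, b2, s = (v for _, v in steps)
    return math.isclose(a2 + b2, s) and math.isclose(answer * answer, s)

def translator(scene, answer):
    """Render the verified problem and answer as natural-language text."""
    a, b = scene["legs"]
    return f"A right triangle has legs {a} and {b}. Its hypotenuse is {answer}."

scene = constructor()
steps, answer = reasoner(scene)
if sampler(steps, answer):
    print(translator(scene, answer))
```

The point of the sketch is the gate in `sampler`: a data point only reaches the translator (and hence the dataset) if its entire derivation path verifies, which is what distinguishes this kind of synthesis from unchecked LLM-generated data.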
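The SGVR idea of "milestones over outcomes" can likewise be sketched as a dense reward: decompose a proof into numerical sub-goals and grant partial credit for every sub-goal the model's trace satisfies, rather than a single reward for the final answer. The function name, the sub-goal dictionaries, and the tolerance are illustrative assumptions, not the paper's exact formulation.

```python
import math

def sgvr_reward(predicted, subgoals, tol=1e-6):
    """Fraction of verifiable sub-goals the predicted trace hits.

    predicted: dict mapping sub-goal name -> model's numeric value
    subgoals:  dict mapping sub-goal name -> ground-truth numeric value
    """
    hits = sum(
        1
        for name, target in subgoals.items()
        if name in predicted and math.isclose(predicted[name], target, abs_tol=tol)
    )
    return hits / len(subgoals)

# Ground truth: intermediate milestones of a proof, expressed numerically.
subgoals = {"angle_ABC": 60.0, "len_AC": 5.0, "area": 10.825}

# A trace that nails two milestones but botches the final area still earns
# dense partial credit, unlike outcome-only supervision.
trace = {"angle_ABC": 60.0, "len_AC": 5.0, "area": 12.0}
print(sgvr_reward(trace, subgoals))  # → 0.6666666666666666
```

Because every sub-goal is a checkable number, the reward signal stays dense and objective at each step, which is what the summary credits for SGVR's improved process-level supervision.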
Starting from Plane Geometry: How Formal Verification Drives a Leap in MLLM Reasoning
机器之心·2026-01-20 10:19