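VLMs keep failing at geometry problems? GEODPO starts with "seeing": structured representations plus DPO optimization, so the model understands the figure before it reasons | ICLR'26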

Core Viewpoint
- The research argues that vision-language models (VLMs) fail on geometry problems primarily because of perceptual errors rather than reasoning difficulties, a failure mode that existing studies have not analyzed systematically [3][4].

Group 1: Geometric Perception Issues
- VLMs show significant performance drops when dealing with geometric shapes, which points to a shortcoming in their geometric perception capabilities [2][3].
- Common issues include misidentifying basic geometric elements (points, lines, circles), failing to detect key structural relationships (collinearity, perpendicularity, tangency), and grounding errors in images [10][11].

Group 2: GEOPERCEIVE Framework
- The research team introduced GEOPERCEIVE, the first independent evaluation framework focused on geometric perception capabilities. By separating perceptual errors from reasoning errors, it allows a clearer analysis of model performance [9][25].
- GEOPERCEIVE assesses models at a granular level, evaluating the accuracy of each geometric element and structural relationship, and so pinpoints specific capability bottlenecks [16][25].

Group 3: GEODPO Optimization Method
- GEODPO, a Translator-Guided Reinforcement Learning method, was proposed to optimize the model with structured rewards based on geometric matching scores, which makes training more stable and interpretable [19][26].
- The method improved geometric perception capabilities and out-of-distribution generalization, indicating that it remains effective under distribution shift [21][26].

Group 4: Implications and Future Directions
- The findings suggest that geometric perception is a crucial factor in geometric reasoning performance, and that structured reinforcement learning offers a stable optimization path [26].
- The research paradigm established through this work can be extended to other complex tasks, emphasizing the importance of structured representation and computable reward functions in model training [28][29].
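
To make "geometric matching score" and "structured reward" more concrete, here is a minimal Python sketch. The summary does not describe the actual representation or scoring formula used by GEOPERCEIVE or GEODPO, so everything below is an assumption for illustration: diagrams are represented as hypothetical fact tuples such as `("line", "A", "B")`, all relations are treated as symmetric, the score is the mean per-category F1, and the `chosen`/`rejected` key names simply follow a common preference-data convention.

```python
# Hypothetical structured representation: each fact is a tuple whose first
# item is the category and whose remaining items are labels.

def normalize(fact):
    """Sort the labels so that line(A, B) and line(B, A) compare equal.

    Simplifying assumption: every relation is treated as symmetric.
    """
    kind, *args = fact
    return (kind, *sorted(args))

def f1_by_kind(predicted, reference):
    """Return the F1 score of each fact category (point, line, ...)."""
    pred = {normalize(f) for f in predicted}
    ref = {normalize(f) for f in reference}
    scores = {}
    for kind in {f[0] for f in pred | ref}:
        p = {f for f in pred if f[0] == kind}
        r = {f for f in ref if f[0] == kind}
        hits = len(p & r)
        # F1 = 2 * TP / (|predicted| + |reference|); the denominator is
        # never zero because `kind` occurs in at least one of the sets.
        scores[kind] = 2 * hits / (len(p) + len(r))
    return scores

def matching_score(predicted, reference):
    """Average the per-category F1 scores into one scalar in [0, 1]."""
    scores = f1_by_kind(predicted, reference)
    return sum(scores.values()) / len(scores) if scores else 0.0

def build_preference_pair(candidates, reference, min_margin=0.1):
    """Pick the best and worst candidate as a (chosen, rejected) pair.

    Returns None when the score gap is below `min_margin`, so that
    near-ties do not become noisy training pairs.
    """
    ranked = sorted(candidates, key=lambda c: matching_score(c, reference))
    worst, best = ranked[0], ranked[-1]
    gap = matching_score(best, reference) - matching_score(worst, reference)
    if gap < min_margin:
        return None
    return {"chosen": best, "rejected": worst, "margin": gap}

# Toy example: a right angle at B formed by segments AB and BC.
reference = [("point", "A"), ("point", "B"), ("point", "C"),
             ("line", "A", "B"), ("line", "B", "C"),
             ("perpendicular", "AB", "BC")]
good = [("point", "A"), ("point", "B"), ("point", "C"),
        ("line", "B", "A"), ("line", "B", "C"),
        ("perpendicular", "BC", "AB")]
bad = [("point", "A"), ("point", "B"),
       ("line", "A", "B"), ("line", "A", "C")]

pair = build_preference_pair([bad, good], reference)
per_kind = f1_by_kind(bad, reference)
```

Here `per_kind` shows which category the weaker description gets wrong (it misses the perpendicularity entirely), which is the kind of element-level diagnosis the summary attributes to GEOPERCEIVE, while `pair` is the sort of preference example a DPO-style trainer could consume.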