New SOTA in complex spatial reasoning with a 55% performance gain: SpatialDreamer, new work from Sun Yat-sen University
36Kr · 2025-12-22 10:12

Core Insights
- SpatialDreamer, developed by institutions including Sun Yat-sen University, significantly improves performance on complex spatial tasks through active mental imagery and spatial reasoning [1][4].

Group 1: Model Development
- SpatialDreamer addresses the limitations of existing models on perspective transformation tasks by simulating human-like active exploration and reasoning [1][4].
- The model moves from passive observation to active, goal-directed imagination, so it decides autonomously what to observe and how to reason in a 3D environment [4].

Group 2: Methodology
- SpatialDreamer's closed-loop reasoning process consists of three steps: exploration, imagination, and reasoning [4].
- GeoPO, a policy optimization method, combines tree sampling with geometric consistency constraints to improve model performance and speed up training convergence [4].

Group 3: Dataset and Learning
- The SpatialDreamer-SFT dataset includes both single-pass reasoning data and reflective reasoning data, promoting a "think-imagine-answer" learning pattern [6].

Group 4: Experimental Results
- On the SAT benchmark, SpatialDreamer achieved state-of-the-art (SOTA) accuracy of 93.9% on real images and 92.5% on synthetic images [7].
- On the MindCube-Tiny benchmark, it raised overall accuracy to 84.9%, surpassing the baseline Qwen2.5-VL-7B by over 55% [7].
- On VSI-Bench, it led in tasks such as object counting and path planning, with an average accuracy of 62.2% [7].
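
The three-step closed loop described under Methodology (exploration, imagination, reasoning) can be pictured with a small Python sketch. Only the three stage names come from the summary above. The function names, the `Step` and `Trace` types, the stopping rule, and the toy stand-ins are all hypothetical and are not taken from SpatialDreamer itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    action: str         # hypothetical viewpoint change chosen while exploring
    imagined_view: str  # placeholder for the imagined observation


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)
    answer: Optional[str] = None


def closed_loop(
    question: str,
    explore: Callable[[str, List[Step]], str],
    imagine: Callable[[str], str],
    reason: Callable[[str, List[Step]], Optional[str]],
    max_steps: int = 4,
) -> Trace:
    """Run explore -> imagine -> reason until an answer appears (assumed rule)."""
    trace = Trace()
    for _ in range(max_steps):
        action = explore(question, trace.steps)  # decide what to observe next
        view = imagine(action)                   # imagine the resulting view
        trace.steps.append(Step(action, view))
        answer = reason(question, trace.steps)   # None means "keep exploring"
        if answer is not None:
            trace.answer = answer
            break
    return trace


# Toy stand-ins so the sketch runs end to end; a real system would call models.
def toy_explore(question: str, steps: List[Step]) -> str:
    return "turn_left" if not steps else "move_forward"


def toy_imagine(action: str) -> str:
    return f"view after {action}"


def toy_reason(question: str, steps: List[Step]) -> Optional[str]:
    # Assumed rule for the demo: two imagined views are enough to answer.
    return "left" if len(steps) >= 2 else None


trace = closed_loop("Where is the chair?", toy_explore, toy_imagine, toy_reason)
```

In this toy run the loop stops after two imagined views. The point of the sketch is only the control flow: the model chooses its next observation, imagines it, and then checks whether it can answer.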