RewardMap: Solving Sparse Rewards in Fine-Grained Visual Reasoning via Multi-Stage Reinforcement Learning
机器之心· 2025-10-21 03:43
Core Insights
- The article discusses the development of RewardMap, a multi-stage reinforcement learning framework designed to enhance the fine-grained visual reasoning capabilities of multimodal large language models (MLLMs) in complex scenarios like high-resolution subway maps [3][9][17].

Group 1: Problem Identification
- Recent advancements in large language models (LLMs) and multimodal large models (MLLMs) have raised questions about their ability to interpret complex visual information, particularly in high-resolution and densely structured environments [3].
- The initial work, ReasonMap, revealed that even state-of-the-art MLLMs frequently make errors in path planning, such as misreading lines, missing stations, and repeating routes [3][12].

Group 2: Proposed Solution
- The team introduced RewardMap, which employs a multi-stage reinforcement learning framework that incorporates fine-grained rewards and a curriculum-based training approach to improve MLLMs' visual understanding and spatial reasoning [3][10].
- RewardMap breaks down complex route-planning tasks into smaller, assessable sub-goals, allowing for a more nuanced feedback mechanism rather than a binary correct/incorrect signal [10][11].

Group 3: Implementation Details
- RewardMap is built on the foundation of ReasonMap and includes a dataset covering 30 cities with 4,018 problem samples, categorized into five types to provide detailed supervision during the reinforcement learning phase [6][12].
- The framework's reward function consists of three components: format compliance, final correctness, and detail, with difficulty weights applied to reflect the true complexity of the tasks [11][12].

Group 4: Performance Results
- RewardMap demonstrated consistent performance improvements across various benchmarks, achieving a maximum increase of 13.51% on the SpatialEval metric compared to traditional methods [13][14].
- Qualitative comparisons showed that models trained with RewardMap exhibited fewer visual confusions and hallucinations, providing more accurate route information [14][15].

Group 5: Future Outlook
- The value of RewardMap extends beyond performance metrics, offering a reusable reinforcement learning paradigm for high-resolution visual tasks by systematically breaking down complex problems into measurable sub-goals [17].
- The framework's effectiveness in enhancing the general capabilities of multimodal large models has been validated, indicating that real-world data like maps will play a significant role in future developments [18].
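The three-part reward described above (format compliance, final correctness, and a detail term over sub-goals, scaled by a difficulty weight) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, field names, and the 0.1/0.6/0.3 weights are all assumptions for demonstration.

```python
def composite_reward(response: dict, difficulty: float) -> float:
    """Combine format, correctness, and detail rewards, scaled by a
    difficulty weight (hypothetical RewardMap-style shaping)."""
    # Binary format-compliance and final-correctness signals.
    r_format = 1.0 if response["well_formatted"] else 0.0
    r_correct = 1.0 if response["route_correct"] else 0.0
    # Detail reward: fraction of sub-goals satisfied (e.g. correct lines,
    # transfer stations, stop counts) instead of an all-or-nothing signal.
    subgoals = response["subgoals"]  # list of booleans, one per sub-goal
    r_detail = sum(subgoals) / len(subgoals) if subgoals else 0.0
    # Difficulty weighting: harder maps contribute proportionally more reward.
    return difficulty * (0.1 * r_format + 0.6 * r_correct + 0.3 * r_detail)
```

The key point is that a partially correct route still earns a graded signal through `r_detail`, which is what densifies the otherwise sparse end-to-end reward.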
Multimodal models take on the Beijing and Hangzhou subway maps! o3 performs strongly, but still falls short of humans
量子位· 2025-06-07 05:02
Contributed by the ReasonMap team; published by 量子位 (QbitAI).

In recent years, large language models (LLMs) and multimodal large models (MLLMs) have made breakthrough progress in a wide range of scene-understanding and complex reasoning tasks. Yet one key question is still worth asking: can MLLMs really "understand" images? In particular, when facing structurally complex, detail-dense images, do they possess fine-grained visual understanding and spatial reasoning abilities, for example when challenged with a high-resolution subway map?

To answer this, a team from Westlake University, the National University of Singapore, Zhejiang University, and Huazhong University of Science and Technology proposed a new evaluation benchmark, ReasonMap. As it turns out, the Beijing and Hangzhou subway maps stumped a large share of models.

ReasonMap is the first multimodal reasoning benchmark focused on high-resolution transit maps (mainly subway maps), designed specifically to evaluate large models' ability to understand fine-grained, structured spatial information in images.

The results show that current mainstream open-source multimodal models hit a clear performance bottleneck on ReasonMap, frequently exhibiting visual confusion or missed stations in cross-line route planning. Closed-source reasoning models post-trained with reinforcement learning (such as GPT-o3) significantly outperform existing open-source models across multiple dimensions, but still fall noticeably short of human performance.

Across subway maps from different countries and regions, four representative MLLMs (Qwen2.5-VL-72B-I (blue), I ...
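The failure modes described above (misread lines, missed stations, repeated routes) suggest how a ReasonMap-style benchmark could mechanically check a predicted route. Below is a minimal sketch under assumed data structures: the function name, the edge-set representation of the subway graph, and the returned fields are all hypothetical, not the benchmark's actual scoring code.

```python
def validate_route(route: list[str], edges: set) -> dict:
    """Check a predicted station sequence for common MLLM errors:
    repeated stations and hops that exist on no line (hypothetical checker)."""
    # A repeated station in the sequence indicates a looping/repeated route.
    repeats = len(route) != len(set(route))
    # Every consecutive pair must be a real adjacency in the subway graph,
    # represented here as a set of frozenset({station_a, station_b}) edges.
    invalid_hops = [
        (a, b) for a, b in zip(route, route[1:])
        if frozenset((a, b)) not in edges
    ]
    return {
        "repeats": repeats,
        "invalid_hops": invalid_hops,
        "valid": not repeats and not invalid_hops,
    }
```

A skipped station shows up as an invalid hop (the model "jumps" between non-adjacent stations), while a repeated route trips the `repeats` flag, mirroring the two error classes the article highlights.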