Making large models speak from "image facts": factual text plus adaptive editing leaves language bias nowhere to hide | ICLR'26
量子位 · 2026-03-26 07:34
Core Insights
- The article discusses the challenge of object hallucination in large vision-language models (LVLMs), where a model may describe incorrect or non-existent objects, driven by language bias rather than visual evidence [4][6]
- A new framework, AFTER (Adaptive Factual-guided Visual-Textual Editing for hallucination mitigation), is introduced to reduce hallucinations while keeping inference costs low [6][19]

Group 1: AFTER Framework
- AFTER consists of two main modules: Factual-Augmented Activation Steering (FAS) and Query-Adaptive Offset Optimization (QAO) [9][10]
- FAS extracts factual information from ground-truth annotations to build a reliable textual description that guides the model's activation editing [9][10]
- QAO adapts the editing process to the specific question being asked, allowing more precise adjustments to the model's output [10][11]

Group 2: Experimental Results
- AFTER significantly outperforms existing methods at reducing hallucinations while adding minimal inference overhead [12][15]
- Across evaluations on three LVLMs, AFTER achieved an average gain of +130.7 in overall performance metrics, indicating stronger visual alignment and reliability [15][19]
- The edited model runs at 29.7 tokens/s with moderate memory usage of roughly 16.3 GB [17][19]

Group 3: Implications and Future Directions
- AFTER mitigates hallucinations without retraining or fine-tuning the base model, making deployment more manageable [19][20]
- By injecting factual semantics, the framework addresses language bias explicitly, offering a more direct remedy than traditional visual-perturbation methods [19]
- Future work may target domain-specific visual perception and bias mitigation, particularly in specialized fields such as healthcare [19]
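The FAS/QAO pipeline described above follows the general pattern of activation steering: derive a steering direction from the activations produced by a factual text description, then add it to the model's hidden states with a strength that adapts to the current query. The following is a minimal NumPy sketch of that pattern under stated assumptions, not the paper's implementation; the function names `steering_vector` and `query_adaptive_edit`, and the cosine-similarity-based scaling, are illustrative assumptions.

```python
import numpy as np


def steering_vector(factual_acts: np.ndarray, plain_acts: np.ndarray) -> np.ndarray:
    """Direction from plain-prompt activations toward factual-prompt activations.

    Both inputs are (num_tokens, hidden_dim) activation matrices; the mean
    difference is a simple steering direction (an assumed stand-in for FAS).
    """
    return factual_acts.mean(axis=0) - plain_acts.mean(axis=0)


def query_adaptive_edit(hidden: np.ndarray,
                        steer: np.ndarray,
                        query_emb: np.ndarray,
                        key_emb: np.ndarray,
                        base_scale: float = 1.0) -> np.ndarray:
    """Apply the steering vector with a query-dependent strength (a QAO-like idea).

    The edit is scaled by the cosine similarity between the query embedding and
    a reference key embedding, clipped at zero, so unrelated questions receive
    little or no steering. The scaling rule here is an illustrative assumption.
    """
    sim = float(query_emb @ key_emb) / (
        np.linalg.norm(query_emb) * np.linalg.norm(key_emb) + 1e-8
    )
    alpha = base_scale * max(sim, 0.0)
    return hidden + alpha * steer
```

In a real LVLM this edit would be applied via a forward hook on an intermediate layer, which is what keeps the extra inference cost low: no weights change and no retraining is needed, matching the deployment story described above.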