Core Viewpoint
- The paper examines the relationship between reasoning capability and hallucination in multimodal reasoning models, asking whether stronger reasoning comes at the cost of visual perception accuracy [2][3][37].

Group 1: Reasoning Models and Hallucinations
- Multimodal reasoning models tend to amplify hallucinations as their reasoning capability improves, which can lead to misinterpretations of visual data [2][3][5].
- The study introduces a new metric, RH-AUC, to assess the balance between reasoning length and perception accuracy; longer reasoning chains are associated with more hallucinations [4][30]. A minimal sketch of one possible computation follows this summary.

Group 2: Attention Mechanism and Performance
- In reasoning models, attention to visual elements drops sharply, so the models lean on language-based priors rather than visual evidence [5][18].
- Experiments show that reasoning models perform worse than non-reasoning models on perception tasks, with higher hallucination rates regardless of model size [8][37].

Group 3: Training Paradigms and Data Quality
- The paper identifies two main training paradigms, pure reinforcement learning (RL-only) and supervised fine-tuning combined with reinforcement learning (SFT+RL); RL-only models generally balance reasoning and perception better [10][35].
- Data quality is emphasized over quantity: models trained on high-quality, domain-specific data maintain the reasoning-hallucination balance better [39][42].

Group 4: Evaluation Metrics and Future Directions
- The RH-Bench benchmark is introduced, consisting of 1,000 multimodal tasks for a comprehensive evaluation of models' reasoning and perception capabilities [30][32].
- Future directions include exploring broader model architectures and developing mechanisms for dynamically adjusting reasoning length to improve model reliability [44].
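The summary does not spell out how RH-AUC is computed, so the following is a minimal sketch only, assuming the metric is an area under a curve that pairs reasoning accuracy with perception (hallucination-robustness) accuracy measured at several reasoning-length budgets; the function name rh_auc and the toy numbers are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of an RH-AUC-style score. The exact formulation is not
# given in this summary; here it is assumed to be the area under a curve that
# pairs reasoning accuracy with perception accuracy at several reasoning-length
# budgets, so a model that stays perceptually accurate while reasoning better
# sweeps out a larger area.
from typing import Sequence


def rh_auc(reasoning_acc: Sequence[float], perception_acc: Sequence[float]) -> float:
    """Trapezoidal area under the (reasoning_acc, perception_acc) curve.

    Each index corresponds to one reasoning-length budget; both accuracy
    lists are assumed to lie in [0, 1].
    """
    if len(reasoning_acc) != len(perception_acc) or len(reasoning_acc) < 2:
        raise ValueError("need matched accuracy lists with at least two budgets")

    # Sort points by reasoning accuracy so the curve is traversed left to right.
    points = sorted(zip(reasoning_acc, perception_acc))

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between adjacent budgets
    return area


if __name__ == "__main__":
    # Toy numbers: as reasoning length (and reasoning accuracy) grows,
    # perception accuracy drops -- the trade-off the paper highlights.
    reasoning = [0.55, 0.62, 0.70, 0.78]
    perception = [0.81, 0.76, 0.68, 0.59]
    print(f"RH-AUC (sketch): {rh_auc(reasoning, perception):.3f}")
```

Under this reading, a model whose perception accuracy stays high even as its reasoning accuracy grows receives a larger area, so a higher RH-AUC would indicate a better reasoning-hallucination balance.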
Latest from Stanford! Hallucination analysis in large models: does overthinking make the truth disappear?
自动驾驶之心·2025-06-19 10:47