ICML 2025 | Can large models ask the right questions when information is incomplete?
机器之心· 2025-07-24 04:08
Core Insights
- The article emphasizes the importance of Active Reasoning (AR) in enhancing the capabilities of Large Language Models (LLMs) beyond Passive Reasoning (PR) [1][2][3][4][7][10][55]
- It introduces AR-Bench, a benchmark designed to evaluate the active reasoning capabilities of LLMs in real-world scenarios [7][19][55]

Group 1: Active Reasoning
- Active Reasoning (AR) is defined as the ability of a model to actively seek out information through questioning and interaction, in contrast with Passive Reasoning (PR), which relies on complete information being given up front [3][4][15][18]
- The need for AR is highlighted in practical applications such as medical diagnosis and detective work, where information is often incomplete [3][14][15]
- The core challenge of AR is asking the right questions in order to gather the critical information [4][18]

Group 2: AR-Bench
- AR-Bench is introduced as a systematic tool for assessing LLMs' active reasoning capabilities, simulating real-world information-gathering scenarios [19][20][55]
- It consists of three task types: Situation Puzzles (SP), Guessing Numbers (GN), and Dynamic Conversations (DC), each testing a different reasoning ability; a sketch of such an interaction loop follows this summary [21][22][25]
- The evaluation framework includes both result assessment and process assessment, focusing on the quality of the questions posed and the effectiveness of information retrieval [25]

Group 3: Findings on LLM Performance
- Current LLMs, including advanced models such as GPT-4o, show significant deficiencies in active reasoning, achieving only 35% accuracy on GN tasks [28][34]
- Even state-of-the-art active reasoning methods do not improve model performance on AR-Bench [33]
- Human performance on active reasoning tasks significantly surpasses that of existing LLMs, indicating a gap in model capabilities [34][55]

Group 4: Recommendations for Future Work
- The article suggests several directions for enhancing active reasoning capabilities, including the collection of high-quality fine-tuning datasets and the development of more reliable validation methods for search-based approaches [56][60]
- It emphasizes the need for further research to enable LLMs to ask effective questions and solve real-world problems [55][60]
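To make the notion of active reasoning concrete, the sketch below mimics a Guessing-Numbers-style interaction: the solver learns about a hidden answer only by issuing queries and updating on the feedback it receives, rather than being handed all the evidence up front. This is a minimal illustrative loop, not AR-Bench's actual protocol; the Bulls-and-Cows-style feedback, the helper names (feedback, active_solver), and the candidate-elimination policy are assumptions made here for illustration, with a plain search procedure standing in for the LLM.

```python
# Illustrative active-reasoning loop for a number-guessing task.
# Assumptions (not AR-Bench's API): 4 distinct digits, Bulls-and-Cows feedback.
import itertools
import random

SECRET_LEN = 4  # assumed: a 4-digit secret with distinct digits


def feedback(secret: str, guess: str) -> tuple[int, int]:
    """Return (bulls, cows): right digit in the right place, right digit in the wrong place.
    Valid because both strings are assumed to have distinct digits."""
    bulls = sum(s == g for s, g in zip(secret, guess))
    cows = len(set(secret) & set(guess)) - bulls
    return bulls, cows


def active_solver(max_turns: int = 10) -> None:
    """Toy solver that keeps only candidates consistent with all feedback so far.
    In the benchmark setting, an LLM would replace the 'choose next query' step."""
    candidates = ["".join(p) for p in itertools.permutations("0123456789", SECRET_LEN)]
    secret = random.choice(candidates)
    for turn in range(1, max_turns + 1):
        guess = candidates[0]                   # choose the next query (here: first consistent candidate)
        bulls, cows = feedback(secret, guess)   # information arrives only because the solver asked
        print(f"turn {turn}: guess={guess} -> bulls={bulls}, cows={cows}")
        if bulls == SECRET_LEN:
            print("solved")
            return
        # discard every candidate that would not have produced the same feedback
        candidates = [c for c in candidates if feedback(c, guess) == (bulls, cows)]
    print(f"failed; secret was {secret}")


if __name__ == "__main__":
    active_solver()
```

The loop highlights the contrast drawn in the summary above: under passive reasoning all evidence is provided at once, whereas here every piece of information exists only because the solver chose to ask for it, so the quality of each query directly determines how quickly the answer is reached.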