OpenAI's New Hallucination Paper Stirs Controversy! Is GPT-5's Poor Showing the Benchmarks' Fault?
量子位 (QbitAI) · 2025-09-08 09:00

Core Viewpoint
- OpenAI's recent paper attributes the phenomenon of "hallucination" in language models to current training and evaluation practices, which reward guessing over admitting uncertainty [2][14].

Group 1: Definition and Implications of Hallucination
- OpenAI defines hallucination as a language model generating answers that sound plausible but are incorrect [8].
- The paper shows that even simple questions can trigger hallucinations, with models confidently producing multiple different wrong answers [10][11].
- Current evaluation methods incentivize models to guess rather than express uncertainty, fostering a culture of "confidently wrong" responses [15][17].

Group 2: Evaluation Metrics and Model Performance
- The paper argues that evaluation metrics should be redesigned to penalize wrong guesses more heavily than abstentions and to reward appropriate expressions of uncertainty [22]; the scoring sketch after Group 4 illustrates the incentive shift.
- According to the paper, GPT-5 has a markedly higher abstention rate (52%) than previous models, meaning it is less inclined to guess [6].
- GPT-5's weaker showing on benchmarks is thus framed as a consequence of flawed evaluation metrics rather than a failure of the model itself [6][19].

Group 3: Community Reactions and Philosophical Discussions
- The paper has sparked debate over whether all outputs of large language models are, in some sense, hallucinations, and what that would mean for understanding model capabilities [26][27].
- Some argue that the nature of language and its relationship to truth complicates the notion of hallucination, since language does not equate to truth [34][35].
- Others point to the inherent limits of statistical models, suggesting that prediction errors are expected outcomes rather than failures [39][40].

Group 4: Practical Applications and Concerns
- There are concerns that models that prefer saying "I don't know" over offering a plausible answer could degrade the user experience [45].
- Hallucination may even have practical utility in creative writing, where a degree of fictional output is acceptable [43][44].
- How to balance confident answers against honest admissions of uncertainty remains a central point of debate among users and developers [46].
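To make the incentive argument from Group 2 concrete, here is a minimal Python sketch. It is not from the paper's code; the function name `expected_score` and the specific penalty t/(1-t) are illustrative assumptions, chosen so that answering has positive expected score only when the model's confidence exceeds a target t. It compares binary grading, where a wrong answer costs nothing, against a confidence-target rule of the kind the paper advocates.

```python
# Illustrative sketch (not from OpenAI's paper): expected score of
# answering vs. abstaining under two grading schemes.

def expected_score(p: float, wrong_penalty: float) -> tuple[float, float]:
    """Return (expected score if answering, score if abstaining).

    p             -- model's probability that its best guess is correct
    wrong_penalty -- points deducted for a wrong answer
    Abstaining ("I don't know") is scored 0 in both schemes.
    """
    answer = p * 1.0 - (1 - p) * wrong_penalty
    return answer, 0.0

# Binary grading (most current benchmarks): wrong answers cost nothing,
# so guessing weakly dominates abstaining at any confidence level.
print(expected_score(p=0.3, wrong_penalty=0.0))          # (0.3, 0.0) -> guess

# Penalized grading with confidence target t: a wrong answer costs
# t / (1 - t) points, so answering pays off only when p > t.
t = 0.75
penalty = t / (1 - t)                                    # = 3.0
print(expected_score(p=0.3, wrong_penalty=penalty))      # (-1.8, 0.0) -> abstain
print(expected_score(p=0.9, wrong_penalty=penalty))      # ( 0.6, 0.0) -> answer
```

The breakeven follows from p - (1 - p) * t / (1 - t) > 0, which simplifies to p > t. Under binary grading a model maximizing its score should always guess, which is the "confidently wrong" culture the paper describes; under the penalized rule, abstaining becomes rational below the confidence target, which would reward the higher abstention rate attributed to GPT-5.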