Deconstructing AI "Hallucinations": OpenAI Releases the Research Report "Why Large Language Models Hallucinate"
欧米伽未来研究所 · 2025-09-07 05:24

Core Viewpoint
- The OpenAI report argues that "hallucination" in large language models (LLMs) is fundamentally rooted in their training and evaluation mechanisms, which reward guessing rather than the expression of uncertainty [3][9].

Group 1: Origin of Hallucination
- The seeds of hallucination are planted during pre-training, where models learn from vast text corpora and form implicit judgments about the validity of the text they generate [4].
- The probability of generating erroneous text is directly linked to the model's performance on a binary classification task that judges whether a text segment is factually correct or fabricated [4][5]; a schematic restatement of this kind of relationship follows after this summary.
- Models tend to fabricate answers for "arbitrary facts" that appear rarely in the training data, with hallucination rates tied to how frequently those facts occur in the corpus [5].

Group 2: Solidification of Hallucination
- Current evaluation systems exacerbate the hallucination problem: most benchmarks use binary scoring that penalizes expressions of uncertainty [6][7].
- This scoring scheme creates an environment akin to "exam-oriented education," incentivizing models to guess rather than admit uncertainty, a phenomenon the report terms "the epidemic of punishing uncertainty" [7]; see the scoring sketch after this summary.

Group 3: Proposed Solutions
- The authors advocate a "socio-technical" transformation to address hallucination, emphasizing the need to revise the prevailing evaluation benchmarks whose incentives are misaligned [8].
- A specific recommendation is to introduce "explicit confidence targets" into mainstream evaluations, guiding models to answer only when they are sufficiently certain [8]; this is illustrated in the final sketch after this summary.
- This approach aims to make models adjust their behavior according to their internal confidence levels, promoting the development of more trustworthy AI systems [8][9].
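The second bullet of Group 1 ties generative errors to a binary "valid vs. fabricated" classification task, and the third ties hallucination rates to how rarely a fact appears in the training data. The display below is only a schematic restatement of that kind of relationship; the symbols (err_gen, err_iiv, s) are illustrative labels, not notation taken from the report.

```latex
% Schematic only: symbols are illustrative, not the report's notation.
% err_gen : rate at which the model generates invalid (fabricated) statements
% err_iiv : error rate on the "is this text valid?" binary classification task
% s       : singleton rate, the fraction of facts seen only once in pre-training data
\[
  \mathrm{err}_{\mathrm{gen}} \;\gtrsim\; 2\,\mathrm{err}_{\mathrm{iiv}},
  \qquad
  \mathrm{err}_{\mathrm{gen}} \;\gtrsim\; s .
\]
```

In words: a model that cannot reliably tell valid statements from fabricated ones cannot avoid generating fabricated ones, and facts seen only once give it little basis for either judgment.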
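To make the "exam-oriented" incentive in Group 2 concrete: under binary grading (1 point for a correct answer, 0 for anything else, including "I don't know"), guessing with any nonzero chance of being right always has a higher expected score than abstaining. The scoring values below are a minimal illustration, not the report's benchmark definitions.

```python
def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected score under binary grading: 1 point if correct, 0 otherwise.

    Abstaining ("I don't know") scores 0, so guessing with any p_correct > 0
    strictly dominates abstaining -- the incentive structure the report criticizes.
    """
    return 0.0 if abstain else p_correct


# A model that is only 30% sure still does better by guessing than by abstaining.
print(expected_score_binary(0.3, abstain=False))  # 0.3
print(expected_score_binary(0.3, abstain=True))   # 0.0
```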
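A hedged sketch of the "explicit confidence target" idea in Group 3: if the evaluation states a target confidence t, rewards a correct answer with 1 point, scores abstention as 0, and penalizes a wrong answer by t/(1-t), then answering only has positive expected value when the model's internal confidence exceeds t. The specific penalty scheme here is an assumption chosen for illustration; the report may specify a different one.

```python
def expected_score_with_target(p_correct: float, t: float, abstain: bool) -> float:
    """Expected score when a confidence target t is stated in the evaluation.

    Assumed scoring (illustrative): +1 if correct, -t/(1-t) if wrong, 0 if the
    model abstains.  The expected value of answering,
    p_correct - (1 - p_correct) * t / (1 - t), is positive only when
    p_correct > t, so the model is rewarded for abstaining below the target.
    """
    if abstain:
        return 0.0
    penalty = t / (1.0 - t)
    return p_correct - (1.0 - p_correct) * penalty


t = 0.75
for p in (0.5, 0.75, 0.9):
    score = expected_score_with_target(p, t, abstain=False)
    print(f"confidence={p:.2f}: answering scores {score:+.2f}, abstaining scores +0.00")
# confidence=0.50: answering scores -1.00, so abstaining is better.
# confidence=0.75: answering scores +0.00, the break-even point.
# confidence=0.90: answering scores +0.60, so answering is better.
```

The design point is that the break-even confidence is set by the benchmark rather than left implicit, which is what lets models be graded on calibrated abstention instead of lucky guesses.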