OpenAI's New Paper: Why Is the Industry Mocking It as Marketing?
Hu Xiu·2025-09-12 09:16

Core Viewpoint
- OpenAI's recent paper "Why Language Models Hallucinate" argues that hallucination in large models stems primarily from training and evaluation mechanisms that reward guessing and penalize uncertainty, rather than from a failure of model architecture [1][4].

Group 1: Evaluation Mechanisms
- Current evaluation benchmarks encourage models to guess even when uncertain, which raises error rates despite slight gains in accuracy [3][5] (a small numeric sketch of this trade-off appears at the end of this summary).
- OpenAI advocates shifting evaluation criteria to penalize confident errors and reward appropriate expressions of uncertainty, rather than focusing solely on accuracy [3][31].

Group 2: Training Data and Model Behavior
- Large models typically see only positive examples during pre-training; refusals to answer do not appear, so the behavior is never learned [2].
- OpenAI's example shows that older models can have higher accuracy yet also higher error rates because they rarely produce "no answer" responses [3].

Group 3: Community Response and Criticism
- The technical community has debated the paper heatedly, with some critics arguing that the research lacks novelty and depth [6][7].
- Concerns have also been raised about the paper's definition of hallucination: it identifies various potential causes but never strictly defines the term [10].

Group 4: Implications for Future Models
- If the proposed changes are adopted, the focus will shift from marginal accuracy improvements to a willingness to redefine evaluation and product rules so that models can express uncertainty [5][34].
- A "low hallucination" model could serve as a more reliable natural-language search engine, improving the safety and reliability of AI applications [20][21].

Group 5: Confidence and Calibration
- The paper stresses the importance of confidence measures in model outputs, suggesting that models should express uncertainty rather than emit confident errors [28][30].
- OpenAI's approach emphasizes that both social and technical measures are needed to change the prevailing practice of penalizing uncertain answers [32][34].

Group 6: Strategic Positioning
- OpenAI's advocacy for these changes may be linked to its strategic goals, including promoting GPT-5's low hallucination rate and its relevance to AI agents and enterprise applications [34][35].
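The scoring trade-off referenced in Group 1 can be made concrete with a short arithmetic sketch. The code below is illustrative only: the function name, the +1/0/-penalty scoring values, and the example confidence levels are assumptions for exposition, not taken from OpenAI's paper or any benchmark. It compares accuracy-only grading, under which guessing always beats saying "I don't know", with grading that penalizes wrong answers, under which abstaining becomes the better choice once the model's confidence drops below a threshold.

```python
# Minimal sketch of the scoring trade-off discussed above
# (illustrative values, not the paper's actual benchmark rubric).

def expected_score(confidence: float, wrong_penalty: float) -> tuple[float, float]:
    """Return (expected score of guessing, score of abstaining) when the model
    believes its answer is correct with probability `confidence`.
    Scoring: correct = +1, wrong = -wrong_penalty, "I don't know" = 0."""
    guess = confidence * 1.0 - (1.0 - confidence) * wrong_penalty
    abstain = 0.0
    return guess, abstain

if __name__ == "__main__":
    for p in (0.9, 0.5, 0.2):
        # Accuracy-only grading (wrong answers cost nothing): guessing always
        # has non-negative expected score, so the benchmark rewards guessing.
        acc_guess, _ = expected_score(p, wrong_penalty=0.0)
        # Penalized grading (here -3 per wrong answer): guessing only pays off
        # when confidence exceeds 3 / (1 + 3) = 0.75, so a calibrated model
        # should say "I don't know" below that threshold.
        pen_guess, _ = expected_score(p, wrong_penalty=3.0)
        print(f"confidence={p:.1f}  accuracy-only guess={acc_guess:+.2f}  "
              f"penalized guess={pen_guess:+.2f}  abstain=+0.00")
```

With these illustrative numbers, guessing only pays when confidence exceeds 0.75; below that, abstaining scores higher. This is the kind of scoring change the paper advocates so that benchmarks stop rewarding confident errors over appropriate expressions of uncertainty.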