ICLR 2026 | Tsinghua proposes a cross-entropy decomposition: "error-entropy" is the true driving term of the large-model scaling law
机器之心· 2026-03-21 05:04
Core Insights
- The article discusses the failure of the cross-entropy scaling law in large models, revealing that only a hidden component, error-entropy, truly scales with model size, while self-alignment and confidence do not [2][6][25].

Group 1: Scaling Law and Cross-Entropy
- The scaling law has been a guiding principle in the development of large language models, with a consensus that cross-entropy loss decreases predictably as model parameters increase [2].
- Recent findings indicate that the cross-entropy scaling law fails for ultra-large models, as the loss reduction deviates from the expected power-law prediction [2][25].
- A new decomposition method for cross-entropy has been proposed, breaking it down into three components: error-entropy, self-alignment, and confidence [3][6].

Group 2: Components of Cross-Entropy
- Error-entropy is the only component that strictly follows power-law scaling, while self-alignment and confidence show little to no change with increasing model size [3][21].
- The study introduces a rank-based error (RBE) metric, which measures the ranking position of the correct token in model outputs, providing a more robust indicator of model performance than probability scores (see the sketch after this summary) [6][8].
- The training dynamics reveal that error-entropy decreases first, followed by improvements in self-alignment and confidence, indicating a clear optimization sequence during training [10][11].

Group 3: Implications of Findings
- The research suggests that as model size increases, the proportion of error-entropy in total loss decreases, while the contributions from non-scaling components increase, leading to the observed failure of the scaling law [25].
- This finding implies that using error-entropy as a training signal or evaluation metric may more accurately reflect model capability improvements, guiding more efficient training strategies [27].
- The study emphasizes that the growth in model size primarily enhances ranking ability rather than probability calibration, providing new insights into the optimization of large models [27].
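The paper's exact formulas are not reproduced in the summary above, so the following is only an illustrative sketch of what a rank-based error (RBE) style measurement could look like: for each position it records where the correct token sits in the model's score ordering (rank 0 = top-1). The function name and tensor layout are assumptions, not the authors' code.

```python
import torch

def rank_based_error(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Illustrative rank-of-correct-token measure (assumed form, not the paper's code).

    logits:  (batch, vocab) unnormalized next-token scores
    targets: (batch,) indices of the correct next token
    Returns the per-example rank of the correct token; lower is better, 0 = top-1.
    """
    target_scores = logits.gather(1, targets.unsqueeze(1))   # (batch, 1) score of the correct token
    # Count how many tokens the model scores strictly higher than the correct one.
    return (logits > target_scores).sum(dim=1)               # (batch,)

# Toy usage: a 4-token vocabulary, two positions.
logits = torch.tensor([[2.0, 0.5, 1.0, -1.0],
                       [0.1, 3.0, 0.2, 0.0]])
targets = torch.tensor([2, 1])
print(rank_based_error(logits, targets))  # tensor([1, 0])
```

Unlike the per-token probability that feeds cross-entropy, a rank like this is insensitive to how sharply the output distribution is peaked, which matches the summary's claim that scale mainly improves ranking ability rather than probability calibration.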
Life is lived between two cliffs
Hu Xiu· 2025-09-24 00:01
Group 1
- The article emphasizes the importance of understanding life as a range rather than a fixed point, suggesting that individuals should focus on defining their life intervals with a midpoint and a radius to manage uncertainty [3][4][12]
- It discusses the concept of the midpoint (c) representing the core strategy or average, and the radius (r) indicating the tolerance for fluctuations or risk [5][7]
- The article highlights that a larger radius allows for more experimentation but also brings individuals closer to potential pitfalls [8][10]

Group 2
- It illustrates the difference between precise predictions and broader ranges, using the analogy of fishing with a net versus shooting at a target, where broader ranges can lead to more successful outcomes [30][32][34]
- The article critiques the common tendency to focus on narrow predictions, which can lead to failure, while advocating for a mindset that embraces uncertainty and broader possibilities [38][41][44]

Group 3
- The article introduces the concepts of confidence intervals and confidence levels, explaining that a confidence interval is a range where outcomes are expected to fall, while the confidence level indicates the reliability of that range (see the sketch after this summary) [56][59]
- It emphasizes the need for individuals to adopt a dual confidence approach: high confidence in qualitative assessments and lower confidence in quantitative predictions [66][92]

Group 4
- The article references Warren Buffett's investment criteria, which include setting initial boundaries for company valuations and ensuring high certainty in qualitative assessments while maintaining humility in quantitative growth predictions [67][68][80]
- It discusses the importance of understanding the difference between qualitative certainty and quantitative predictions, highlighting the need for a balance between the two in decision-making [97][100]

Group 5
- The article concludes with practical applications of the discussed concepts, encouraging individuals to establish a solid qualitative foundation while remaining humble about quantitative goals [106][113]
- It warns against the dangers of leverage and overconfidence, advocating for maintaining a buffer zone to withstand uncertainties in life [135][140][146]
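As a concrete anchor for the midpoint-and-radius framing and the confidence-interval analogy in Groups 1 and 3, here is a minimal sketch using the standard normal approximation; the numbers and the function name are illustrative assumptions, not taken from the article.

```python
def normal_confidence_interval(mean: float, std_error: float, z: float = 1.96):
    """Midpoint-and-radius form of a normal-approximation confidence interval.

    mean      -- the midpoint c (the central estimate)
    std_error -- standard error of the estimate
    z         -- z-score; 1.96 corresponds to roughly 95% confidence
    Returns (lower, upper); the radius r is z * std_error.
    """
    radius = z * std_error
    return mean - radius, mean + radius

# Toy usage: a central estimate of 100 with a standard error of 5.
low, high = normal_confidence_interval(100.0, 5.0)
print(f"95% interval: [{low:.1f}, {high:.1f}]")  # 95% interval: [90.2, 109.8]
```

Widening the radius (a larger z, i.e. a higher confidence level) makes the interval more likely to contain the outcome but less informative, which is the trade-off the article builds its argument around.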
Why is OpenAI's new paper being mocked by the industry as marketing?
Hu Xiu· 2025-09-12 09:16
Core Viewpoint
- OpenAI's recent paper "Why Language Models Hallucinate" suggests that hallucination in large models stems primarily from training and evaluation mechanisms that reward guessing and penalize uncertainty, rather than from a failure in model architecture [1][4].

Group 1: Evaluation Mechanisms
- Current evaluation benchmarks encourage models to guess answers even when uncertain, leading to higher error rates despite slight improvements in accuracy (see the sketch after this summary) [3][5].
- OpenAI advocates for a shift in evaluation criteria to penalize confident errors and reward appropriate expressions of uncertainty, moving away from a focus solely on accuracy [3][31].

Group 2: Training Data and Model Behavior
- Large models typically only encounter positive examples during pre-training, which does not include instances of refusing to answer, so models never learn that behavior [2].
- OpenAI's example shows that older models may have higher accuracy but also higher error rates because they give fewer "no answer" responses [3].

Group 3: Community Response and Criticism
- The technical community has engaged in heated discussion of the paper, with some critics arguing that the research lacks novelty and depth [6][7].
- Concerns have been raised about the definition of hallucination, with various potential causes identified but not strictly defined [10].

Group 4: Implications for Future Models
- If the proposed changes are adopted, the focus will shift from marginal accuracy improvements to a willingness to redefine evaluation and product rules so that models can express uncertainty [5][34].
- A "low hallucination" model could serve as a more reliable natural-language search engine, enhancing the safety and reliability of AI applications [20][21].

Group 5: Confidence and Calibration
- The paper discusses the importance of confidence measures in model outputs, suggesting that models should express uncertainty to improve accuracy [28][30].
- OpenAI's approach emphasizes the need for social and technical measures to address the prevalent issue of penalizing uncertain answers [32][34].

Group 6: Strategic Positioning
- OpenAI's advocacy for these changes may be linked to its strategic goals, including promoting GPT-5's low hallucination rate and its relevance to AI agents and enterprise applications [34][35].
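The summary's claim that accuracy-only benchmarks reward guessing can be made concrete with a small expected-score calculation; this is an illustrative sketch of the scoring-rule argument, not code from the paper, and the penalty value is an assumption.

```python
def expected_answer_score(confidence: float, wrong_penalty: float) -> float:
    """Expected score of answering: +1 if correct, -wrong_penalty if wrong."""
    return confidence * 1.0 - (1.0 - confidence) * wrong_penalty

def should_answer(confidence: float, wrong_penalty: float, abstain_score: float = 0.0) -> bool:
    """Answer only when the expected score beats abstaining ('I don't know')."""
    return expected_answer_score(confidence, wrong_penalty) > abstain_score

# Accuracy-only scoring (no penalty): even a 30%-confident guess beats abstaining,
# so a benchmark that only counts accuracy pushes models to guess.
print(should_answer(0.30, wrong_penalty=0.0))   # True
# Penalize confident errors (here -1 per wrong answer): the same guess is no longer
# worth making, and the model is better off expressing uncertainty.
print(should_answer(0.30, wrong_penalty=1.0))   # False
```

Under this kind of rule, guessing only pays when confidence exceeds wrong_penalty / (1 + wrong_penalty), which is the mechanism behind the paper's proposal to reward appropriate expressions of uncertainty.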