The potential of diffusion language models has been severely underestimated! NUS finds they can comprehensively outperform autoregressive models
自动驾驶之心· 2025-11-15 16:04
While the conversation is still about "large models getting smarter by piling on more data," another quiet revolution is underway. The National University of Singapore, together with Sea AI Lab and other research teams, has found that in a future where data becomes the bottleneck, diffusion language models (DLMs) show remarkable learning potential. At the same model scale and compute budget, they learn faster, deeper, and "smarter" without consuming more data.

The new paper, "Diffusion Language Models are Super Data Learners," reveals that under limited-data conditions DLMs comprehensively outperform traditional autoregressive (AR) language models, signaling that a new language-modeling paradigm is emerging. When data is limited, which kind of model extracts more information from each unique token? In other words, data, rather than compute, becomes the limiting factor. What is the intelligence crossover point?

Paper link: https://arxiv.org/abs/2511.03276v1

Research background: Autoregressive language models are currently the mainstream of modern large-scale language models; their advantage is efficient training and inference, well suited to large-scale datasets. However, as computational resources ...
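As background for the comparison the article sets up, the two pre-training objectives can be written in their standard textbook form. The formulation below is general background on AR and masked-diffusion training, not an excerpt from the paper:

```latex
% Autoregressive training: exact negative log-likelihood under a
% left-to-right factorization (one causal pass with teacher forcing).
\mathcal{L}_{\mathrm{AR}}(x) = -\sum_{i=1}^{n} \log p_\theta\!\left(x^{i} \mid x^{<i}\right)

% Masked-diffusion training: mask each token independently with
% probability t, then reconstruct the masked tokens from the full
% bidirectional context; the 1/t weight makes this an ELBO-style bound.
\mathcal{L}_{\mathrm{MDM}}(x) =
  \mathbb{E}_{t \sim \mathcal{U}(0,1)}\;
  \mathbb{E}_{x_t \sim q(\cdot \mid x,\, t)}
  \left[ \frac{1}{t} \sum_{i :\; x_t^{i} = \texttt{[MASK]}}
         -\log p_\theta\!\left(x^{i} \mid x_t\right) \right]
```

The AR loss scores every position against its left context only; the diffusion loss scores randomly masked positions against context from both sides, at a continuum of masking rates, which is the structural reason a DLM can revisit the same data under many different corruptions.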
Token crisis solved? Diffusion models show 3x the data potential of autoregressive models, with performance still climbing after 480 repetitions of training
机器之心· 2025-08-10 04:31
Core Viewpoint
- The article discusses the advancements in diffusion language models (DLMs) as superior data learners compared to autoregressive (AR) models, particularly in data-constrained environments [1][8].

Group 1: Token Crisis and Research Findings
- The research addresses the impending token crisis in large language models (LLMs), where the availability of high-quality training text data is diminishing, limiting model performance [2][3].
- The team pre-trained DLMs and AR models from scratch, achieving a maximum scale of 8 billion parameters and 480 billion tokens [3][4].

Group 2: Performance Comparison
- In scenarios with limited tokens, DLMs outperform AR models, demonstrating over three times the data potential [5][8].
- A DLM trained on 1 billion tokens achieved 56% accuracy on the HellaSwag benchmark and 33% on the MMLU benchmark, significantly surpassing AR models [14].

Group 3: Repeated Training Benefits
- Repeated training on the same dataset enhances performance, with DLMs showing no signs of performance saturation even after extensive training [14][19].
- The study indicates that DLMs can extract more effective information from a fixed dataset, leading to improved performance metrics [14][19].

Group 4: Mechanisms Behind DLMs' Superiority
- DLMs utilize a bidirectional modeling approach, allowing them to extract more information from web data compared to the purely causal modeling used by AR models (a minimal sketch of the two objectives follows below) [19][22].
- DLMs are described as "super dense models," translating their computational density into enhanced intelligence [22][24].

Group 5: Methodological Critique of Related Research
- The article critiques a concurrent study, highlighting methodological flaws that may skew its conclusions regarding DLMs and AR models [25][30].
- It emphasizes that the loss function used in the other study does not accurately represent model likelihood, potentially leading to misleading results (see the likelihood-bound note below) [26][32].
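To make the Group 4 contrast between causal and bidirectional modeling concrete, here is a minimal PyTorch-style sketch of the two losses. Everything in it is an illustrative assumption rather than the paper's actual code: `model` stands for any Transformer returning per-position logits, assumed to apply a causal attention mask for the AR loss and full bidirectional attention for the diffusion loss; `mask_id` is a hypothetical reserved mask-token id; the 1/t weighting follows the common masked-diffusion formulation.

```python
import torch
import torch.nn.functional as F

def ar_loss(model, x):
    """Next-token prediction: each position is scored against its
    left context only (model is assumed to attend causally)."""
    logits = model(x[:, :-1])                     # [B, T-1, V]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        x[:, 1:].reshape(-1),                     # shifted targets
    )

def masked_diffusion_loss(model, x, mask_id):
    """One masked-diffusion training step: hide a random fraction t of
    tokens, reconstruct them from the full bidirectional context."""
    B, T = x.shape
    t = torch.rand(B, 1)                          # per-sequence mask rate
    masked = torch.rand(B, T) < t                 # which tokens to hide
    x_noisy = torch.where(masked, torch.full_like(x, mask_id), x)
    logits = model(x_noisy)                       # [B, T, V], bidirectional
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        x.reshape(-1),
        reduction="none",
    ).reshape(B, T)
    # Score only the masked positions; the 1/t weight is the standard
    # ELBO weighting for masked diffusion (simplified per-token average).
    return (per_token * masked / t).sum() / masked.sum().clamp(min=1)
```

Each pass over the same sequence draws a fresh mask pattern and mask rate, so repeated epochs present genuinely different prediction problems to a DLM, one candidate explanation for the delayed saturation reported in Group 3.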
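On the Group 5 critique, the likelihood point can be made precise with a standard fact about masked diffusion models (general background, not a result from either paper): the diffusion training loss is the negative of an evidence lower bound, so it only upper-bounds the true negative log-likelihood, whereas AR cross-entropy is the exact negative log-likelihood. Comparing the two raw training losses therefore compares a bound to an exact quantity.

```latex
% AR cross-entropy is the exact negative log-likelihood:
\mathcal{L}_{\mathrm{AR}}(x) = -\log p_\theta(x)
% The masked-diffusion loss only upper-bounds it (negative ELBO):
\mathcal{L}_{\mathrm{MDM}}(x) \ge -\log p_\theta(x)
```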