The Diffusion Language Model Proposed by Renmin University Might Rewrite History...
自动驾驶之心·2025-12-12 03:02

Core Viewpoint
- The article traces the development and future of diffusion language models across two main phases: foundational research (2022-2024) and scaling (2024-2025) [3][14].

Phase 1: Foundational Research (2022-2024)
- Diffusion language models were initially a niche topic, with work split between continuous and discrete formulations [4][5].
- Continuous diffusion models have been applied to discrete data, with notable works including those from Percy Liang's and Alex Graves's groups [6].
- A method proposed at ICML 2024 unifies Bayesian flow networks and diffusion models without requiring the data to be relaxed into a continuous space [7].
- Discrete diffusion models have evolved since their introduction in 2015, with modern iterations such as D3PM and SEDD improving the optimization loss functions [8].
- The relationship between the MDM (Masked Diffusion Model) and BERT is examined, emphasizing their technical distinctions and the generative nature of diffusion models (see the training-loss sketch after this summary) [11][12].

Phase 2: Scaling (2024-2025)
- The research group chose to concentrate on MDM projects, ensuring each member makes a significant contribution [15].
- The first scaling law for MDMs, presented at ICLR 2025, demonstrates that MDMs can match autoregressive models in performance [16].
- The LLaDA model, capable of multi-turn dialogue, shows promising scalability and instruction-following ability comparable to LLaMA 3 (see the sampling sketch at the end) [16].
- Industry responded quickly with products such as Mercury Coder and Gemini Diffusion, although these were not directly derived from the academic work [19].
- LLaDA is recognized as a significant contribution that deepens understanding of generative models, despite criticisms of its novelty [21].
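To make the MDM-vs-BERT distinction concrete, here is a minimal training-loss sketch in the style of masked diffusion objectives such as LLaDA's: unlike BERT, which masks a fixed ~15% of tokens, an MDM samples a random masking ratio t ~ U(0, 1) and reweights the masked-token loss by 1/t, which turns the objective into a bound on the data log-likelihood and makes the model generative. The `model` callable and `MASK_ID` constant are hypothetical placeholders, not the paper's actual interface.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical id of the [MASK] token in the vocabulary


def mdm_loss(model, x0):
    """Masked-diffusion training loss (a sketch of the MDM/LLaDA-style ELBO).

    x0: (batch, seq_len) clean token ids.
    model(xt) is assumed to return logits of shape (batch, seq_len, vocab).
    """
    b, n = x0.shape
    # Sample a masking ratio t ~ U(0, 1) per sequence: the key difference
    # from BERT, which trains with a fixed ~15% masking rate.
    t = torch.rand(b, 1, device=x0.device).clamp(min=1e-3)
    # Mask each token independently with probability t.
    masked = torch.rand(b, n, device=x0.device) < t
    xt = torch.where(masked, torch.full_like(x0, MASK_ID), x0)

    logits = model(xt)  # (b, n, vocab)
    token_nll = F.cross_entropy(
        logits.transpose(1, 2), x0, reduction="none"
    )  # (b, n), per-token negative log-likelihood
    # Only masked positions contribute, reweighted by 1/t; this weighting
    # makes the BERT-like objective a bound on the data log-likelihood.
    return (token_nll * masked / t).sum(dim=1).mean()
```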

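Generation with an MDM proceeds in the opposite direction of training: start from an all-[MASK] continuation and iteratively fill in tokens over a fixed number of steps. The confidence-based unmasking below is one common heuristic, shown only as a minimal sketch; `model` and `mask_id` are assumed interfaces, not the published LLaDA recipe.

```python
import torch


@torch.no_grad()
def mdm_generate(model, prompt, gen_len=64, steps=16, mask_id=0):
    """Minimal iterative-unmasking sampler for a masked diffusion model.

    prompt: 1-D tensor of token ids. At each step, the positions the model
    is most confident about are committed; the rest stay masked.
    """
    x = torch.cat([prompt, torch.full((gen_len,), mask_id, dtype=prompt.dtype)])
    x = x.unsqueeze(0)  # (1, seq_len)
    per_step = max(1, gen_len // steps)

    while (x == mask_id).any():
        logits = model(x)                # (1, seq_len, vocab), assumed shape
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)   # per-position confidence and argmax
        # Only still-masked positions compete for unmasking.
        conf = conf.masked_fill(x != mask_id, -1.0)
        k = min(per_step, int((x == mask_id).sum()))
        idx = conf.topk(k, dim=-1).indices
        # Commit the k highest-confidence predictions in place.
        x.scatter_(1, idx, pred.gather(1, idx))
    return x.squeeze(0)
```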