Diffusion Language Models
Skipping "word-by-word generation": Ant Group's Zhao Junbo on how diffusion models let us directly modify tokens
36Ke· 2025-12-12 07:17
Core Insights
- The main focus of the news is the emerging diffusion architecture for language models, which offers advantages over traditional autoregressive models in speed and computational efficiency [1][4][20].

Group 1: Diffusion Architecture Advantages
- Diffusion architecture allows tokens to be directly modified and controlled during inference, eliminating the need to regenerate entire segments of content as autoregressive models require [1][5].
- The newly released LLaDA 2.0 model has reached a scale of 100 billion parameters, a significant milestone in the development of diffusion language models [1][20].
- Diffusion models are described as "data-hungry," requiring larger training datasets than autoregressive models, but they can absorb data more quickly [5][8].

Group 2: Technical Developments
- The LLaDA model employs a "fill-in-the-blank" prediction method, in contrast to the sequential token generation of autoregressive models (a toy decoding sketch follows this summary) [6][8].
- The architecture combines global and causal attention mechanisms to improve computational efficiency while maintaining coherence in generated sequences [16].
- The research team has made significant progress on architectural challenges, including integrating mixture-of-experts (MoE) into the diffusion framework [19].

Group 3: Industry Impact and Future Directions
- Major tech companies, including Google and ByteDance, are actively exploring diffusion models, indicating growing interest in this technology [1][19].
- A new inference engine, dInfer, is expected to enhance the performance of diffusion models, with the potential for significant speed improvements in key applications [24][25].
- The community is encouraged to collaborate on building the ecosystem around diffusion language models, which are still at an early stage of development [27].
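As a concrete illustration of the "fill-in-the-blank" decoding described above, here is a minimal toy sketch of masked diffusion decoding: each step predicts every position in parallel and commits only the most confident ones, so uncommitted positions remain directly editable. The model, sizes, and schedule below are illustrative assumptions, not LLaDA's actual implementation.

```python
import math
import torch

VOCAB, MASK_ID, SEQ_LEN = 1000, 0, 16

def toy_model(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a diffusion LM: logits for every position in one pass."""
    torch.manual_seed(int(tokens.sum()))  # deterministic toy output
    return torch.randn(SEQ_LEN, VOCAB)

def diffusion_decode(num_steps: int = 4) -> torch.Tensor:
    # Start from a fully masked canvas instead of generating left to right.
    tokens = torch.full((SEQ_LEN,), MASK_ID)
    masked = torch.ones(SEQ_LEN, dtype=torch.bool)
    for step in range(num_steps):
        logits = toy_model(tokens)                 # one forward pass, all positions
        conf, pred = logits.softmax(-1).max(-1)    # per-position confidence
        # Commit just enough of the most confident masked slots to finish on
        # schedule; everything else stays editable in later steps.
        k = math.ceil(int(masked.sum()) / (num_steps - step))
        conf = conf.masked_fill(~masked, -1.0)     # only masked slots compete
        idx = conf.topk(k).indices
        tokens[idx], masked[idx] = pred[idx], False
    return tokens

print(diffusion_decode())
```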
The diffusion language model proposed by Renmin University might rewrite history...
自动驾驶之心· 2025-12-12 03:02
Author | Chongxuan Li  Editor | 自动驾驶之心  Original link: https://www.zhihu.com/question/1908479621466396378/answer/1910672718174589774  This article is shared for academic purposes only; contact us for removal in case of infringement. Hello everyone, I am Chongxuan Li of the Gaoling School of Artificial Intelligence, Renmin University of China. Since my own work is highly relevant, I will answer this question. On continuous diffusion models I have collaborated extensively with Prof. Jun Zhu and my junior colleagues; representative works include Analytic-DPM, U-ViT, DPM-Solver, ProlificDreamer, DPM-Solver++, and UniDiffuser. My group at Renmin University is young; its representative works on discrete diffusion models include RADD, Scaling Law for MDM, LLaDA, LLaDA-V, and LLaDA 1.5, which will be released in the next few days. I will introduce the field in two chronological stages and then give my own view. Stage 1: 2022 to the end of 2024, when diffusion language models were mainly a basic-research topic ...
Skipping "word-by-word generation"! Ant Group's Zhao Junbo: diffusion models let us directly modify tokens | MEET2026
量子位· 2025-12-12 03:00
Core Viewpoint
- The article discusses the shift from autoregressive models to diffusion architecture in language models, highlighting the potential for faster generation speeds and lower computational costs with diffusion models [2][8].

Group 1: Diffusion Architecture Insights
- Diffusion architecture allows for direct modification and control of tokens during inference, unlike autoregressive models, which require regenerating entire segments [2][15].
- The recent release of LLaDA 2.0 marks a significant milestone, achieving a scale of 100 billion parameters for diffusion language models [4][44].
- The development of diffusion models is still in its early stages, but it has attracted attention from major companies like Google and ByteDance, as well as several startups [5][41].

Group 2: Technical Aspects and Comparisons
- Diffusion models operate on a "fill-in-the-blank" mechanism rather than sequential token generation, which can lead to more efficient data utilization [12][21].
- In terms of parameter efficiency, diffusion models can achieve similar performance with fewer parameters than autoregressive models under the same computational constraints (a back-of-the-envelope cost model follows this summary) [15][23].
- The unique characteristics of diffusion models allow for continuous training, unlike autoregressive models, which plateau after several epochs [24][26].

Group 3: Future Directions and Community Engagement
- The article emphasizes the need for further exploration of scaling laws specific to diffusion language models, which differ from those of autoregressive models [56].
- The community is encouraged to participate in the development and optimization of diffusion models, as the ecosystem is still in its infancy [56].
- Upcoming collaborations and API releases are planned to improve the accessibility and integration of diffusion models into various applications [51].
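The efficiency claim above can be made tangible with a back-of-the-envelope count of forward passes: if a diffusion model commits several tokens per step, the number of passes falls proportionally. The tokens-per-step values below are assumptions for illustration, not measured LLaDA figures.

```python
import math

seq_len = 1024
ar_passes = seq_len  # autoregressive decoding: one token per forward pass
for tokens_per_step in (4, 16, 64):
    # Parallel unmasking: several tokens committed per forward pass.
    diff_passes = math.ceil(seq_len / tokens_per_step)
    print(f"{tokens_per_step:>3} tokens/step -> {diff_passes:>4} passes "
          f"({ar_passes / diff_passes:.0f}x fewer than AR)")
```

Actual wall-clock speedup also depends on how much each parallel forward pass costs relative to an AR step, which this count deliberately ignores.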
Huawei's new open-source release! A diffusion language model breaks through 32K context and unlocks "slow thinking"
机器之心· 2025-12-02 06:47
Core Insights
- The article discusses the significant paradigm shift in text generation from autoregressive models to diffusion language models, highlighting the limitations of long-sequence training and the recent advances made by Huawei with the openPangu-R-7B-Diffusion model [1][14].

Model Performance
- The openPangu-R-7B-Diffusion model set new state-of-the-art (SOTA) records across several benchmarks, demonstrating superior performance in general capabilities, mathematical reasoning, and code generation compared to peer models [2][3].
- On the MMLU benchmark, openPangu-R-7B-Diffusion scored 81.66, surpassing LLaDA 2.0-mini-preview by 9.17 points [2].
- Its mathematical reasoning score (MATH) reached 84.26, a significant lead over comparable models [3].

Architectural Innovations
- The model incorporates an innovative causal attention mask architecture, enabling seamless migration from autoregressive to BlockDiffusion models and addressing the architectural adaptation challenge (a mask-construction sketch follows this summary) [5][7].
- By retaining causal attention characteristics, the model reduces adaptation costs and maximizes compatibility with knowledge pre-trained in autoregressive models [8][10].

Training and Inference Efficiency
- The training strategy of openPangu-R-7B-Diffusion optimizes the BlockDiffusion approach, improving the model's efficiency [10].
- The model offers dual-mode decoding, letting users trade generation quality against speed through different sampling settings [15].

Conclusion
- The release of openPangu-R-7B-Diffusion marks a significant advance in diffusion models' ability to handle complex long texts, showing that they can deliver both speed and depth [14].
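To make the causal-attention-mask migration concrete, here is a sketch of a block-causal mask of the kind BlockDiffusion-style models use: bidirectional attention within a block, causal attention across blocks. Sizes are illustrative, and openPangu's actual mask construction may differ.

```python
import torch

def block_causal_mask(seq_len: int, block: int) -> torch.Tensor:
    """True = position may attend. Full attention within a block,
    causal (earlier blocks only) across blocks."""
    blk = torch.arange(seq_len) // block        # block index of each position
    return blk.unsqueeze(1) >= blk.unsqueeze(0)  # query block >= key block

mask = block_causal_mask(seq_len=8, block=4)
print(mask.int())
# Rows are queries: positions 0-3 see block 0 only (bidirectionally);
# positions 4-7 see both blocks, preserving AR-style causality between
# blocks -- which is what makes migration from an AR checkpoint cheap.
```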
Lumina-DiMOO: a multimodal diffusion language model reshaping image generation and understanding
机器之心· 2025-11-16 04:01
Core Viewpoint
- Lumina-DiMOO is an innovative multimodal generative language model that uses discrete diffusion modeling to bridge diverse multimodal tasks, enabling seamless integration of text-to-image, image-to-image, and image-to-text capabilities [2][11].

Group 1: Historical Context
- Traditional autoregressive models, such as Chameleon and Janus-Pro, face significant limitations, including slow generation, constrained quality in high-resolution image generation, and a lack of seamless task integration [7].

Group 2: Current Innovations
- Lumina-DiMOO employs a pure discrete diffusion framework, addressing the limitations of earlier models by improving generation speed and quality through parallelized bidirectional attention and flexible sampling strategies [9][11].

Group 3: Key Features
- **Discrete Diffusion Architecture**: image generation and understanding tasks run efficiently within a single framework, breaking down the traditional boundary between generation and understanding [12].
- **Efficient Generation**: by processing multiple tokens simultaneously, Lumina-DiMOO accelerates inference and improves quality, ensuring effective collaboration between tasks [15].
- **Bidirectional Attention Mechanism**: improves the model's grasp of contextual relationships in text and structural details in images, ensuring high consistency across multimodal tasks [17].
- **Joint Optimization**: a global optimization strategy during training improves performance across tasks and ensures seamless transitions between them [18].
- **Max-Logit Caching Technology**: significantly boosts generation efficiency by caching stable tokens, cutting unnecessary computation while preserving output quality, especially in high-resolution tasks (a toy sketch of the caching criterion follows this summary) [20].

Group 4: Advanced Learning Framework
- **Self-GRPO Framework**: a new self-reinforcement framework that places image generation and multimodal understanding on a single reinforcement-learning trajectory, letting the model learn from its own outputs and improve iteratively [22][23].

Group 5: Performance and Recognition
- Lumina-DiMOO has achieved top rankings in several authoritative evaluations, demonstrating advantages in semantic consistency, layout understanding, and reasoning over leading models such as GPT-4o and Janus-Pro [29].
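A hedged sketch of the max-logit caching idea as summarized above: if a position's top logit barely moves between denoising steps, its token is treated as stable and frozen so later steps can skip it. The toy model, tolerance, and sizes are assumptions; Lumina-DiMOO's real stability criterion may differ.

```python
import torch

VOCAB, SEQ_LEN = 1000, 12

def toy_model(step: int) -> torch.Tensor:
    torch.manual_seed(0)
    base = torch.randn(SEQ_LEN, VOCAB)   # fixed "converged" logits
    torch.manual_seed(step)
    return base + 0.01 * torch.randn(SEQ_LEN, VOCAB)  # small per-step drift

tokens = torch.zeros(SEQ_LEN, dtype=torch.long)
cached_max = torch.full((SEQ_LEN,), float("-inf"))
stable = torch.zeros(SEQ_LEN, dtype=torch.bool)

for step in range(6):
    logits = toy_model(step)          # a real system would only recompute
    new_max, pred = logits.max(-1)    # the still-unstable positions
    tokens[~stable] = pred[~stable]   # refresh only unstable predictions
    # Freeze positions whose max logit changed by less than the tolerance.
    stable |= (new_max - cached_max).abs() < 0.05
    cached_max = new_max
    print(f"step {step}: {int(stable.sum())}/{SEQ_LEN} positions cached")
```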
"Taming" masked diffusion language models with more consistent trajectories and fewer decoding steps: large gains in reasoning performance and efficiency
机器之心· 2025-11-05 04:15
Core Insights
- The article discusses the rapid advances in diffusion large language models (dLLMs), highlighting their potential as strong competitors to traditional autoregressive LLMs [2][7].
- A recent paper from a collaborative research team proposes an efficient decoding strategy combined with reinforcement learning for masked diffusion large language models (MDLMs), significantly improving their reasoning performance and efficiency [2][21].

Group 1: Problem Identification
- Masked diffusion language models such as LLaDA show capabilities comparable to autoregressive models but struggle with full diffusion-style decoding, which is less effective than block-wise decoding [7][9].
- MDLM decoding often generates <EOS> tokens too early, degrading output quality and creating a decoding trap [14][15].

Group 2: Proposed Solutions
- The research team introduces an early rejection mechanism for <EOS> tokens, suppressing their confidence during early decoding steps to prevent premature termination of generation [15].
- A power-increasing decoding-step scheduler reduces the number of inference steps from O(L) to O(log L), accelerating reasoning (an illustrative sketch of both mechanisms follows this summary) [15][16].

Group 3: Consistency Trajectory Optimization
- A consistency trajectory grouping strategy (CJ-GRPO) addresses inconsistencies between rollout and optimization trajectories, improving training stability and effectiveness [16].
- Combining the early rejection mechanism, the increasing step scheduler, and CJ-GRPO lets the model match baseline performance while significantly reducing decoding steps [16][24].

Group 4: Experimental Results
- Extensive experiments show the proposed methods outperform baseline models on mathematical reasoning and planning tasks, with improvements of up to 2-4x on certain benchmarks [23][24].
- CJ-GRPO combined with EOSER and ASS remains competitive in low-step inference scenarios, balancing speed and quality [24].

Group 5: Future Directions
- The article suggests exploring hybrid reasoning modes that combine the strengths of diffusion and autoregressive models to meet diverse task requirements [26].
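The two mechanisms can be illustrated with assumed functional forms: an <EOS> confidence penalty that decays over steps, and a power-increasing unmasking schedule that finishes in O(log L) steps. The exact schedules in the paper may differ from this sketch.

```python
import math

def eos_penalty(step: int, total_steps: int, scale: float = 5.0) -> float:
    """Penalty subtracted from <EOS> confidence; strong early, ~0 by the end."""
    return scale * (1.0 - step / total_steps) ** 2

def tokens_to_unmask(step: int, seq_len: int) -> int:
    """Power-increasing schedule: 1, 2, 4, 8, ... tokens per step."""
    return min(2 ** step, seq_len)

L = 256
steps = math.ceil(math.log2(L)) + 1   # O(log L) steps instead of O(L)
revealed = 0
for s in range(steps):
    k = min(tokens_to_unmask(s, L), L - revealed)
    revealed += k
    print(f"step {s}: unmask {k:>3} tokens, "
          f"EOS penalty {eos_penalty(s, steps):.2f}")
assert revealed == L  # the whole sequence is committed in ~log2(L) steps
```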
From masked generation to "remask" training: RemeDi teaches diffusion language models to self-correct and reflect
机器之心· 2025-10-16 02:20
Core Insights
- The article introduces RemeDi, a diffusion language model from the MAPLE lab at Westlake University that incorporates a "remask" mechanism for self-reflection and optimization during text generation [2][26].
- RemeDi surpasses existing diffusion language models by identifying and correcting errors in generated text through confidence-score prediction [8][27].

Group 1: Model Features
- RemeDi's "remask" capability lets it flag incorrect tokens and correct them using context from subsequent generation steps [5][25].
- The model supports variable-length generation, breaking the fixed-length limitation of traditional diffusion models and making text generation more flexible [9][27].
- RemeDi uses a dual-stream architecture: the Token Prediction Stream (TPS) predicts token distributions, while the Unmasking Policy Stream (UPS) outputs a confidence score for each token (a toy sketch of this loop follows this summary) [10][8].

Group 2: Training Methodology
- Training proceeds in two phases: supervised fine-tuning (Remask SFT) and reinforcement learning (Remask RL) [12][17].
- During Remask SFT, the model learns to recover masked tokens while also identifying incorrect tokens that need to be remasked [13][12].
- The Remask RL phase optimizes the model's generation trajectory against final outcomes, raising the probability of producing correct final answers [17][20].

Group 3: Experimental Results
- RemeDi shows significant performance gains on mathematical reasoning, code generation, and general knowledge question answering compared with other diffusion language models [22][27].
- Combining Remask SFT with Remask RL further improves performance, yielding superior results across multiple benchmarks [22][24].
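A minimal sketch of the dual-stream remask loop as summarized: the Token Prediction Stream proposes tokens, the Unmasking Policy Stream scores them, and low-confidence positions return to the mask for revision. Both streams and the threshold below are toy stand-ins, not the released RemeDi model.

```python
import torch

VOCAB, MASK_ID, SEQ_LEN, THRESH = 1000, 0, 10, 0.35

def token_prediction_stream(step: int) -> torch.Tensor:
    torch.manual_seed(10 + step)
    return torch.randn(SEQ_LEN, VOCAB)   # TPS: per-position token logits

def unmasking_policy_stream(step: int) -> torch.Tensor:
    torch.manual_seed(20 + step)
    return torch.rand(SEQ_LEN)           # UPS: per-position confidence score

tokens = torch.full((SEQ_LEN,), MASK_ID)
for step in range(4):
    pred = token_prediction_stream(step).argmax(-1)
    conf = unmasking_policy_stream(step)
    # Low-confidence positions are returned to MASK_ID -- the "remask" move
    # that lets later steps revisit and correct earlier outputs.
    tokens = torch.where(conf >= THRESH, pred, torch.full_like(pred, MASK_ID))
    print(f"step {step}: {int((tokens == MASK_ID).sum())} positions remasked")
```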
10x inference speedup: Ant Group open-sources dInfer, the industry's first high-performance inference framework for diffusion language models
机器之心· 2025-10-13 09:24
Core Insights
- Ant Group has launched dInfer, the industry's first high-performance inference framework for diffusion large language models (dLLMs), achieving more than 10x the inference speed of Fast-dLLM [2][29].
- dInfer set a new performance milestone, reaching 1011 tokens per second in single-batch inference, surpassing highly optimized autoregressive (AR) models [29].

Group 1: dInfer Framework
- dInfer supports multiple dLLM architectures, including LLaDA, LLaDA-MoE, and LLaDA-MoE-TD, with an emphasis on modularity and scalability [9][20].
- The framework integrates four core modules: Model, KV Cache Manager, Iteration Manager, and Decoder, letting developers customize and combine optimization strategies [11][13].
- dInfer addresses three core challenges in dLLM inference: high computational cost, KV-cache invalidation, and the complexity of parallel decoding [12][19].

Group 2: Performance Enhancements
- dInfer employs a "Vicinity KV-Cache Refresh" strategy that selectively recomputes KV caches, reducing computational cost while maintaining generation quality (a recompute/reuse sketch follows this summary) [15][17].
- System-level optimizations bring the forward-pass speed of dLLMs up to par with AR models [18].
- Hierarchical and credit decoding algorithms maximize the number of tokens decoded in parallel without additional training [19][20].

Group 3: Performance Metrics
- On 8 NVIDIA H800 GPUs, dInfer reached an average inference speed of 681 tokens per second, 10.7x faster than Fast-dLLM [29].
- Combined with trajectory distillation, dInfer's average inference speed rose to 847 tokens per second, more than 3x the AR baseline [24][29].
- dInfer's code-generation performance set a record, showing clear speed advantages in latency-sensitive scenarios [29].

Group 4: Open Source and Community Engagement
- The release of dInfer marks a significant step toward practical efficiency for diffusion language models, and Ant Group invites global developers and researchers to help build a more efficient and open AI ecosystem [28][25].
- The complete code, technical report, and experimental configurations for dInfer v0.1 are open source [27][28].
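The "Vicinity KV-Cache Refresh" idea can be pictured as a per-position recompute/reuse plan: only cache entries within a window around the active block are recomputed, and distant entries are reused even though bidirectional attention would, strictly speaking, invalidate them. The window size and plan format below are assumptions for illustration, not dInfer's actual data structures.

```python
def refresh_plan(seq_len: int, block_start: int, block_end: int, window: int):
    """Per-position 'recompute' / 'reuse' decisions for the KV cache:
    refresh only the vicinity of the block currently being decoded."""
    lo = max(0, block_start - window)
    hi = min(seq_len, block_end + window)
    return ["recompute" if lo <= i < hi else "reuse" for i in range(seq_len)]

plan = refresh_plan(seq_len=16, block_start=8, block_end=12, window=2)
print(" ".join("R" if p == "recompute" else "." for p in plan))
# Only positions 6-13 are recomputed; the rest of the cache is kept,
# trading a little staleness for a large cut in attention recomputation.
```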
10x inference performance: Ant Group open-sources dInfer, a high-performance inference framework for diffusion language models
Huan Qiu Wang· 2025-10-13 09:03
Core Insights
- Ant Group has officially announced the open-source release of dInfer, the industry's first high-performance inference framework for diffusion language models [1][5].
- dInfer delivers a 10.7x improvement in inference speed over NVIDIA's Fast-dLLM framework, reaching 1011 tokens per second on the HumanEval code-generation task [1][4].
- The framework addresses key challenges in diffusion-language-model inference, including high computational cost, KV-cache invalidation, and parallel decoding [1][2].

Performance Metrics
- dInfer achieves an average inference speed of 681 tokens per second, versus 63.6 for Fast-dLLM, a 10.7x improvement [4].
- Compared with the AR model Qwen2.5-3B, dInfer's average inference speed is 2.5x faster, at 681 tokens per second versus 277 [5].

Technical Architecture
- dInfer has a modular architecture with four core components: Model, KV-Cache Manager, Iteration Manager, and Decoder, allowing developers to customize and optimize their configurations (a hypothetical interface sketch follows this summary) [2].
- Each module integrates targeted solutions to the three main challenges facing diffusion language models [2].

Industry Impact
- The launch of dInfer marks a critical step in moving diffusion language models from theoretical feasibility to practical efficiency, connecting cutting-edge research with industrial applications [5].
- Ant Group invites global developers and researchers to explore the potential of diffusion language models and help build a more efficient and open AI ecosystem [5].
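To visualize how four pluggable modules might compose, here is a hypothetical interface sketch; the article does not show dInfer's actual APIs, so every name and signature below is an assumption about how such a modular pipeline could fit together.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Protocol

# All four protocols are hypothetical stand-ins for dInfer's real modules.
class Model(Protocol):
    def forward(self, tokens: list[int]) -> list[list[float]]: ...

class KVCacheManager(Protocol):
    def refresh(self, positions: list[int]) -> None: ...

class IterationManager(Protocol):
    def next_positions(self, step: int) -> list[int]: ...

class Decoder(Protocol):
    def commit(self, logits: list[list[float]]) -> list[int]: ...

@dataclass
class Pipeline:
    model: Model
    cache: KVCacheManager
    schedule: IterationManager
    decoder: Decoder

    def step(self, tokens: list[int], step: int) -> list[int]:
        positions = self.schedule.next_positions(step)  # which slots to update
        self.cache.refresh(positions)                   # refresh nearby KV entries
        logits = self.model.forward(tokens)             # one parallel forward pass
        return self.decoder.commit(logits)              # commit tokens in parallel
```

Because each field is an interface, swapping a different cache policy or decoding algorithm does not touch the rest of the pipeline, which matches the customization the summary describes.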
Surpassing autoregressive models for the first time! Ant Group open-sources dInfer, the industry's first high-performance inference framework for diffusion language models
Xin Lang Ke Ji· 2025-10-13 09:00
Core Insights
- Ant Group has officially open-sourced the industry's first high-performance diffusion-language-model inference framework, dInfer, which significantly improves the efficiency of diffusion language models [1][2].

Performance Metrics
- dInfer achieves a 10.7x improvement in inference speed over NVIDIA's Fast-dLLM framework, with average tokens per second (TPS) rising from 63.6 to 681 [1].
- On the HumanEval code-generation task, dInfer reaches 1011 tokens per second in single-batch inference, surpassing autoregressive models for the first time in the open-source community [1].
- Compared with the vLLM framework running the Qwen2.5-3B model, dInfer's average inference speed is 2.5x faster, at 681 TPS versus 277 TPS [1].

Industry Impact
- The launch of dInfer marks a critical step in moving diffusion language models from theoretical feasibility to practical efficiency, connecting cutting-edge research with industrial application [2].
- Ant Group invites global developers and researchers to explore the vast potential of diffusion language models, aiming to build a more efficient and open AI ecosystem [2].