Diffusion Language Models
The diffusion language model proposed by Renmin University may rewrite history...
自动驾驶之心· 2025-12-12 03:02
Core Viewpoint
- The article discusses the development and future of diffusion language models, highlighting two main phases: foundational research (2022-2024) and scaling (2024-2025) [3][14].

Phase 1: Foundational Research (2022-2024)
- Diffusion language models were initially a niche topic, with research split between continuous and discrete formulations [4][5].
- Continuous diffusion models have been applied to discrete data, with notable works including those by Percy Liang and Alex Graves [6].
- A method proposed at ICML 2024 unifies Bayesian flow networks and diffusion models without requiring the data to be mapped into a continuous space [7].
- Discrete diffusion models have evolved since their introduction in 2015, with modern iterations like D3PM and SEDD improving the optimization loss functions [8].
- The relationship between MDM (Masked Diffusion Model) and BERT is explored, emphasizing their technical distinctions and the generative nature of diffusion models [11][12].

Phase 2: Scaling (2024-2025)
- The research group focuses on MDM projects, ensuring each member makes a significant contribution [15].
- The first scaling law for MDM is set to be presented at ICLR 2025, demonstrating that MDM can match autoregressive models in performance [16].
- The LLaDA model, capable of multi-turn dialogue, shows promising scalability and instruction-following ability comparable to LLaMA 3 [16].
- The industrial response to LLaDA includes rapid developments like Mercury Coder and Gemini Diffusion, although these products are not directly influenced by the academic work [19].
- LLaDA is recognized as a significant contribution to the field, enhancing understanding of generative models despite criticisms regarding its novelty [21].
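The MDM-versus-BERT distinction above comes down to the corruption process: BERT masks a fixed fraction of tokens once, while a masked diffusion model samples a fresh noise level t in (0, 1) for every training example and masks each token independently with probability t, which turns denoising into a proper generative model. A minimal sketch in Python (the function name and toy tokens are illustrative, not from any of the papers):

```python
import random

MASK = "<MASK>"

def mdm_corrupt(tokens, t, rng):
    """Masked-diffusion forward process: each token is independently
    replaced by <MASK> with probability t (the sampled noise level).
    Training predicts the masked positions, averaged over all t --
    unlike BERT's single fixed masking ratio."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
t = rng.random()                  # noise level ~ Uniform(0, 1)
noisy = mdm_corrupt(tokens, t, rng)
```

At t close to 0 almost nothing is masked; at t close to 1 the sequence is almost fully masked, covering the whole spectrum of denoising difficulty in one objective.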
Skip "word-by-word generation"! Ant Group's Zhao Junbo: diffusion models let us modify tokens directly | MEET2026
量子位· 2025-12-12 03:00
Core Viewpoint
- The article discusses the shift from autoregressive models to diffusion architectures in language models, highlighting the potential of diffusion models for faster generation and lower computational cost [2][8].

Group 1: Diffusion Architecture Insights
- Diffusion architectures allow tokens to be modified and controlled directly during inference, unlike autoregressive models, which must regenerate entire segments [2][15].
- The recent release of LLaDA 2.0 marks a significant milestone: a diffusion language model at the 100-billion-parameter scale [4][44].
- Diffusion model development is still in its early stages, but it has attracted attention from major companies like Google and ByteDance, as well as several startups [5][41].

Group 2: Technical Aspects and Comparisons
- Diffusion models operate on a "fill-in-the-blank" mechanism rather than sequential token generation, which can lead to more efficient data utilization [12][21].
- In terms of parameter efficiency, diffusion models can match autoregressive models with fewer parameters under the same computational constraints [15][23].
- The characteristics of diffusion models allow for continued training gains, unlike autoregressive models, whose performance plateaus after several epochs [24][26].

Group 3: Future Directions and Community Engagement
- The article emphasizes the need to explore scaling laws specific to diffusion language models, which differ from those of autoregressive models [56].
- The community is encouraged to participate in developing and optimizing diffusion models, as the ecosystem is still in its infancy [56].
- Upcoming collaborations and API releases are planned to make diffusion models easier to access and integrate into applications [51].
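The "fill-in-the-blank" decoding described above can be pictured as a loop that starts from an all-masked sequence and, at each step, commits the most confident predictions in parallel. An illustrative sketch (the `score_fn` oracle and the fixed tokens-per-step budget are assumptions, not LLaDA's actual sampler):

```python
def diffusion_decode(length, score_fn, tokens_per_step=2):
    """Iterative unmasking: every position starts masked (None); each
    step scores all masked positions and fills in the most confident
    ones in parallel, instead of emitting tokens left to right."""
    seq = [None] * length
    while any(tok is None for tok in seq):
        masked = [i for i, tok in enumerate(seq) if tok is None]
        # score_fn(seq, i) -> (proposed_token, confidence) for slot i
        proposals = {i: score_fn(seq, i) for i in masked}
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in best[:tokens_per_step]:
            seq[i] = proposals[i][0]
    return seq

# Toy oracle: proposes the position index as a string, more confident
# for earlier positions.
toy = lambda seq, i: (str(i), 1.0 / (i + 1))
print(diffusion_decode(5, toy))   # → ['0', '1', '2', '3', '4']
```

Because already-committed positions remain addressable, a decoder of this shape can also revisit and overwrite a token mid-generation, which is the "direct token modification" property the talk emphasizes.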
Newly open-sourced by Huawei! A diffusion language model breaks through 32K context and unlocks "slow thinking"
机器之心· 2025-12-02 06:47
Core Insights
- The article discusses the paradigm shift in text generation from autoregressive models to diffusion language models, highlighting the limitations of long-sequence training and the recent advances made by Huawei with the openPangu-R-7B-Diffusion model [1][14].

Model Performance
- openPangu-R-7B-Diffusion set new state-of-the-art (SOTA) records across benchmarks, outperforming comparable models in general capabilities, mathematical reasoning, and code generation [2][3].
- On the MMLU benchmark, openPangu-R-7B-Diffusion scored 81.66, surpassing LLaDA 2.0-mini-preview by 9.17 points [2].
- On mathematical reasoning (MATH), the model reached 84.26, a significant lead over similar models [3].

Architectural Innovations
- The model incorporates an innovative causal attention mask that allows seamless migration from autoregressive to BlockDiffusion models, addressing the architectural adaptation challenge [5][7].
- By retaining causal attention, the model reduces adaptation costs and maximizes compatibility with knowledge pre-trained in autoregressive models [8][10].

Training and Inference Efficiency
- The training strategy of openPangu-R-7B-Diffusion optimizes the BlockDiffusion approach, improving training efficiency [10].
- The model offers dual-mode decoding, letting users trade off generation quality against speed through different sampling settings [15].

Conclusion
- The release of openPangu-R-7B-Diffusion marks a significant advance in diffusion models' ability to handle complex long texts, showing that they can deliver both speed and depth [14].
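The causal-to-BlockDiffusion migration described above hinges on the attention mask: within a block, tokens attend bidirectionally so they can be denoised in parallel, while attention across blocks stays causal so weights pre-trained autoregressively still transfer. A small illustrative sketch of such a mask (the real model builds this on GPU tensors, not Python lists):

```python
def block_causal_mask(seq_len, block_size):
    """mask[q][k] is True when query position q may attend to key k:
    full bidirectional attention inside a block, causal attention
    across blocks (no query sees a later block)."""
    return [[k // block_size <= q // block_size for k in range(seq_len)]
            for q in range(seq_len)]

m = block_causal_mask(4, 2)
# position 0 sees its block-mate 1, but not the future block {2, 3}
```

With block_size = 1 this degenerates to the standard causal mask, which is why the adaptation cost from an autoregressive checkpoint is low.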
Lumina-DiMOO: a multimodal diffusion language model reshaping image generation and understanding
机器之心· 2025-11-16 04:01
Core Viewpoint
- Lumina-DiMOO is an innovative multimodal generative language model that uses discrete diffusion modeling to bridge multimodal tasks, seamlessly integrating text-to-image, image-to-image, and image-to-text capabilities [2][11].

Group 1: Historical Context
- Traditional autoregressive models such as Chameleon and Janus-Pro face significant limitations, including slow generation, constrained quality in high-resolution image generation, and a lack of seamless task integration [7].

Group 2: Current Innovations
- Lumina-DiMOO employs a pure discrete diffusion framework, addressing the limitations of previous models by improving generation speed and quality through parallel bidirectional attention and flexible sampling strategies [9][11].

Group 3: Key Features
- **Discrete Diffusion Architecture**: Runs image generation and understanding tasks efficiently within a single framework, breaking down the traditional boundary between generation and understanding [12].
- **Efficient Generation**: By processing multiple tokens simultaneously, Lumina-DiMOO accelerates inference while improving quality, ensuring effective collaboration between tasks [15].
- **Bidirectional Attention Mechanism**: Improves the model's grasp of contextual relationships in text and structural details in images, ensuring high consistency across multimodal tasks [17].
- **Joint Optimization**: A global optimization strategy during training improves performance across tasks and enables seamless transitions between them [18].
- **Max-Logit Caching**: Caches stable tokens to skip unnecessary computation, significantly boosting generation efficiency while maintaining output quality, especially in high-resolution tasks [20].

Group 4: Advanced Learning Framework
- **Self-GRPO Framework**: A self-reinforcement framework that merges image generation and multimodal understanding into a single reinforcement learning trajectory, letting the model learn from its own outputs and improve iteratively [22][23].

Group 5: Performance and Recognition
- Lumina-DiMOO has achieved top rankings in several authoritative evaluations, demonstrating superior semantic consistency, layout understanding, and reasoning compared to leading models like GPT-4o and Janus-Pro [29].
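One plausible reading of the max-logit caching feature above: between consecutive denoising steps, positions whose top prediction has not changed are treated as stable and served from cache rather than recomputed. A toy sketch under that assumption (names and mechanism are illustrative, not Lumina-DiMOO's actual implementation):

```python
def stable_positions(prev_top, curr_top):
    """Indices whose argmax token is unchanged between two successive
    denoising steps; a cache can reuse these and recompute the rest,
    which matters most at high resolution where tokens number in the
    thousands."""
    return [i for i, (p, c) in enumerate(zip(prev_top, curr_top)) if p == c]

prev = ["sky", "tree", "cat", "grass"]
curr = ["sky", "tree", "dog", "grass"]
# only index 2 flipped, so indices 0, 1, 3 can be served from cache
```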
"Taming" masked diffusion language models with more consistent trajectories and fewer decoding steps: major gains in reasoning performance and efficiency for diffusion language models
机器之心· 2025-11-05 04:15
Core Insights
- The article discusses the rapid advance of diffusion large language models (LLMs), highlighting their potential as strong competitors to traditional LLMs [2][7].
- A recent paper from a collaborative research team proposes an efficient decoding strategy combined with reinforcement learning for masked diffusion large language models (MDLMs), significantly improving their reasoning performance and efficiency [2][21].

Group 1: Problem Identification
- Masked diffusion LLMs like LLaDA show capabilities comparable to autoregressive models but struggle with full diffusion-style decoding, which is less effective than block-wise decoding [7][9].
- MDLM decoding often generates <EOS> tokens too early, terminating generation prematurely and degrading performance, a decoding trap [14][15].

Group 2: Proposed Solutions
- The research team introduces an early-rejection mechanism for <EOS> tokens that suppresses their confidence during early decoding steps, preventing premature termination of generation [15].
- A power-increasing decoding-step scheduler optimizes the decoding process, reducing inference steps from O(L) to O(log L) and thereby accelerating reasoning [15][16].

Group 3: Consistency Trajectory Optimization
- The team proposes a consistency trajectory grouping strategy (CJ-GRPO) to address inconsistencies between rollout and optimization trajectories, improving training stability and effectiveness [16].
- Combining the early-rejection mechanism, the increasing-step scheduler, and CJ-GRPO lets the model match baseline performance while significantly reducing decoding steps [16][24].

Group 4: Experimental Results
- Extensive experiments show the proposed methods outperform baseline models on mathematical reasoning and planning tasks, with improvements of up to 2-4x on certain benchmarks [23][24].
- Combining CJ-GRPO with EOSER and ASS maintains competitive performance in low-step inference, balancing speed and quality [24].

Group 5: Future Directions
- The article suggests exploring hybrid reasoning modes that combine the strengths of diffusion and autoregressive models to meet diverse task requirements [26].
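The two decoding fixes above are simple to sketch: suppress the <EOS> confidence for the first few steps, and grow the per-step token budget geometrically so a length-L sequence finishes in O(log L) steps. The function names, the doubling base, and the warmup length are illustrative assumptions, not the paper's exact hyperparameters:

```python
def suppress_eos(confidence, token, step, warmup_steps=4):
    """Early rejection of <EOS>: zero its confidence during the first
    warmup_steps so generation is not terminated prematurely."""
    if token == "<EOS>" and step < warmup_steps:
        return 0.0
    return confidence

def power_schedule(length, base=2):
    """Power-increasing step scheduler: commit 1, base, base^2, ...
    tokens per step, so total steps grow as O(log length) rather than
    the O(length) of one-token-at-a-time decoding."""
    steps, remaining, budget = [], length, 1
    while remaining > 0:
        take = min(budget, remaining)
        steps.append(take)
        remaining -= take
        budget *= base
    return steps

print(power_schedule(100))   # → [1, 2, 4, 8, 16, 32, 37]
```

Committing few tokens early (when most of the context is still masked) and many late (when the sequence is nearly resolved) is what lets the scheduler cut steps without the quality loss of a flat large budget.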
From masked generation to "remask" training: RemeDi teaches diffusion language models self-correction and reflection
机器之心· 2025-10-16 02:20
Core Insights
- The article introduces RemeDi, a diffusion language model developed by the MAPLE lab at Westlake University that incorporates a "remask" mechanism for self-reflection and optimization during text generation [2][26].
- RemeDi surpasses existing diffusion language models by identifying and correcting errors in generated text through a confidence-score prediction system [8][27].

Group 1: Model Features
- RemeDi's "remask" capability lets it flag incorrect tokens and correct them using context from subsequent generation steps [5][25].
- The model supports variable-length generation, breaking the fixed-length output limitation of traditional diffusion models and making text generation more flexible [9][27].
- RemeDi uses a dual-stream architecture: a Token Prediction Stream (TPS) that predicts token distributions and an Unmasking Policy Stream (UPS) that outputs a confidence score for each token [10][8].

Group 2: Training Methodology
- Training consists of two phases: supervised fine-tuning (Remask SFT) and reinforcement learning (Remask RL) [12][17].
- During Remask SFT, the model learns to recover masked tokens while also identifying incorrect tokens that should be remasked [13][12].
- The Remask RL phase optimizes the model's generation trajectory based on outcomes, raising the probability of generating correct final answers [17][20].

Group 3: Experimental Results
- RemeDi shows significant gains on mathematical reasoning, code generation, and general knowledge question answering compared to other diffusion language models [22][27].
- Combining Remask SFT with Remask RL improves performance further, yielding superior results across benchmarks [22][24].
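The remask mechanism above amounts to one extra move in the decoding loop: after the UPS scores each filled-in token, anything below a confidence threshold is returned to the masked state so a later step can regenerate it with more context. An illustrative sketch (the threshold and names are assumptions, not RemeDi's implementation):

```python
MASK = "<MASK>"

def remask_step(tokens, confidences, threshold=0.5):
    """Re-mask low-confidence tokens so a later denoising step can
    regenerate them with fuller surrounding context -- the model's
    way of revising its own earlier mistakes."""
    return [MASK if conf < threshold else tok
            for tok, conf in zip(tokens, confidences)]

draft = ["2", "+", "2", "=", "5"]
scores = [0.99, 0.98, 0.97, 0.96, 0.10]   # the model doubts the "5"
print(remask_step(draft, scores))  # → ['2', '+', '2', '=', '<MASK>']
```

Ordinary masked-diffusion decoding only ever moves tokens from masked to unmasked; allowing the reverse transition is what gives the model its self-correction loop.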
A 10x boost in inference speed: Ant Group open-sources dInfer, the industry's first high-performance inference framework for diffusion language models
机器之心· 2025-10-13 09:24
Core Insights
- Ant Group has launched dInfer, the industry's first high-performance inference framework for diffusion large language models (dLLMs), achieving over 10x the inference speed of Fast-dLLM [2][29].
- dInfer set a new performance milestone, reaching 1011 tokens per second in single-batch inference and surpassing highly optimized autoregressive (AR) models [29].

Group 1: dInfer Framework
- dInfer supports multiple dLLM architectures, including LLaDA, LLaDA-MoE, and LLaDA-MoE-TD, with an emphasis on modularity and scalability [9][20].
- The framework integrates four core modules: Model, KV Cache Manager, Iteration Manager, and Decoder, allowing developers to customize and combine optimization strategies [11][13].
- dInfer addresses three core challenges of dLLM inference: high computational cost, KV-cache invalidation, and the complexity of parallel decoding [12][19].

Group 2: Performance Enhancements
- dInfer employs a "vicinity KV-cache refresh" strategy that selectively recomputes KV caches, cutting computational cost while preserving generation quality [15][17].
- The framework brings dLLM forward-pass speed up to par with AR models through a range of system-level optimizations [18].
- It introduces hierarchical and credit decoding algorithms that maximize the number of tokens decoded in parallel without additional training [19][20].

Group 3: Performance Metrics
- On 8 NVIDIA H800 GPUs, dInfer achieved an average inference speed of 681 tokens per second, 10.7x faster than Fast-dLLM [29].
- Combined with trajectory distillation, dInfer's average inference speed rose to 847 tokens per second, more than 3x that of AR models [24][29].
- dInfer set a record in code generation tasks, demonstrating a clear speed advantage in latency-sensitive scenarios [29].

Group 4: Open Source and Community Engagement
- The release of dInfer marks a significant step toward making diffusion language models practically efficient, and invites global developers and researchers to help build a more efficient and open AI ecosystem [28][25].
- The complete code, technical report, and experimental configurations for dInfer v0.1 are open source [27][28].
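A minimal sketch of the vicinity KV-cache refresh idea described above: after a diffusion step changes some positions, only cache entries within a small radius of those positions are recomputed instead of invalidating the whole cache. The radius and function name are illustrative assumptions, not dInfer's API:

```python
def vicinity_refresh(changed_positions, seq_len, radius=2):
    """Return the cache indices to recompute: everything within
    `radius` of a position that changed this step; all other KV
    entries are reused as-is, avoiding a full-cache rebuild."""
    refresh = set()
    for p in changed_positions:
        lo = max(0, p - radius)
        hi = min(seq_len, p + radius + 1)
        refresh.update(range(lo, hi))
    return sorted(refresh)

# a 16-token cache where positions 5 and 12 were just re-decoded:
print(vicinity_refresh([5, 12], 16))  # → [3, 4, 5, 6, 7, 10, 11, 12, 13, 14]
```

The trade-off is the one the article names: a larger radius tracks the bidirectional attention updates more faithfully (better quality), a smaller one recomputes less (more speed).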
Inference performance up 10x: Ant Group open-sources dInfer, a high-performance inference framework for diffusion language models
Huan Qiu Wang· 2025-10-13 09:03
Core Insights
- Ant Group has officially announced the open-source release of dInfer, the industry's first high-performance inference framework for diffusion language models [1][5].
- dInfer delivers a 10.7x improvement in inference speed over NVIDIA's Fast-dLLM framework, reaching 1011 tokens per second on the HumanEval code generation task [1][4].
- The framework addresses key challenges in diffusion language model inference: high computational cost, KV-cache invalidation, and parallel decoding [1][2].

Summary by Sections
- **Performance Metrics**
  - dInfer achieves an average inference speed of 681 tokens per second versus 63.6 for Fast-dLLM, a 10.7x improvement [4].
  - Compared with the AR model Qwen2.5-3B, dInfer's average inference speed is 2.5x faster: 681 tokens per second versus 277 [5].
- **Technical Architecture**
  - dInfer's modular architecture comprises four core components: Model, KV-Cache Manager, Iteration Manager, and Decoder, allowing developers to customize and optimize their configurations [2].
  - Each module integrates targeted solutions to the three main challenges facing diffusion language models [2].
- **Industry Impact**
  - The launch of dInfer is a critical step in moving diffusion language models from theoretical feasibility to practical efficiency, connecting cutting-edge research with industrial application [5].
  - Ant Group invites global developers and researchers to explore the potential of diffusion language models and build a more efficient and open AI ecosystem [5].
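The quoted speedups follow directly from the throughput figures in the article: 681 versus 63.6 tokens per second against Fast-dLLM, and 681 versus 277 against Qwen2.5-3B:

```python
fast_dllm_tps = 63.6   # Fast-dLLM average tokens/second (from the article)
dinfer_tps = 681       # dInfer average tokens/second
qwen_tps = 277         # Qwen2.5-3B on vLLM, tokens/second

print(round(dinfer_tps / fast_dllm_tps, 1))  # → 10.7
print(round(dinfer_tps / qwen_tps, 1))       # → 2.5
```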
Surpassing autoregressive models for the first time! Ant Group open-sources dInfer, the industry's first high-performance inference framework for diffusion language models
Xin Lang Ke Ji· 2025-10-13 09:00
Core Insights
- Ant Group has officially open-sourced dInfer, the industry's first high-performance inference framework for diffusion language models, significantly improving their inference efficiency [1][2].

Performance Metrics
- dInfer achieves a 10.7x improvement in inference speed over NVIDIA's Fast-dLLM framework, with average tokens per second (TPS) rising from 63.6 to 681 [1].
- On the HumanEval code generation task, dInfer reaches 1011 tokens per second in single-batch inference, the first time in the open-source community that a diffusion model has surpassed autoregressive models [1].
- Compared with the vLLM framework running Qwen2.5-3B, dInfer's average inference speed is 2.5x faster: 681 TPS versus 277 TPS [1].

Industry Impact
- The launch of dInfer marks a critical step in moving diffusion language models from theoretical feasibility to practical efficiency, connecting cutting-edge research with industrial application [2].
- Ant Group invites global developers and researchers to explore the vast potential of diffusion language models and build a more efficient and open AI ecosystem [2].
Diffusion language models get an MoE version! Ant Group and Renmin University train LLaDA-MoE from scratch, soon to be fully open-sourced
机器之心· 2025-09-12 11:31
Core Viewpoint
- The article discusses LLaDA-MoE, the first native MoE-architecture diffusion language model trained from scratch, which demonstrates significant performance and efficiency advantages over traditional autoregressive models [2][15][18].

Group 1: Model Development and Performance
- LLaDA-MoE was trained on 20 terabytes of data and activates 1.4 billion parameters, achieving performance comparable to denser autoregressive models like Qwen2.5-3B while maintaining faster inference [15][17][29].
- The LLaDA series has evolved rapidly; LLaDA-MoE is a notable milestone, surpassing previous models such as LLaDA 1.0/1.5 and Dream-7B across benchmark tests [13][18][29].
- The architecture leaves significant room to scale, with plans to explore higher sparsity ratios and larger MoE diffusion language models [29][40].

Group 2: Technical Innovations and Advantages
- The diffusion approach enables parallel decoding, bidirectional modeling, and iterative correction, addressing autoregressive limitations such as the serial decoding bottleneck and the lack of error correction [38][40].
- Evidence suggests diffusion language models can learn more from limited data than autoregressive models, with data utilization efficiency exceeding three times that of autoregressive models [40][41].
- Ant Group's training framework and infrastructure, including the ATorch framework, support efficient training of large-scale MoE models [25][26].

Group 3: Strategic Vision and Future Directions
- The development of LLaDA-MoE reflects a strategic choice to explore high-potential areas of AI, moving beyond established paths to push the limits of intelligence [44][47].
- Ant Group's commitment to innovation is evident in its previous projects and ongoing research into dynamic MoE architectures and hybrid linear architectures, all aimed at achieving general artificial intelligence (AGI) [45][46][47].
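The "active parameters" figure above is the usual MoE accounting: a router sends each token to only the top-k experts, so only those experts' weights participate in that token's forward pass while the rest stay idle. A toy sketch of top-k routing (names and the k=2 choice are illustrative, not LLaDA-MoE's configuration):

```python
def topk_experts(gate_logits, k=2):
    """Pick the k experts with the highest gate scores for one token;
    only these experts' parameters are 'active' for that token, which
    is why a sparse MoE can match a denser model at lower compute."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    return sorted(ranked[:k])

gates = [0.1, 2.3, -0.5, 1.7]        # router scores over 4 experts
print(topk_experts(gates))           # → [1, 3]
```

This sparsity is orthogonal to the diffusion objective, which is why the same MoE recipe used for autoregressive models carries over to a masked-diffusion backbone.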