Newly open-sourced by Huawei: a diffusion language model breaks through 32K context and unlocks "slow thinking"
机器之心·2025-12-02 06:47

Core Insights
- The article discusses the paradigm shift in text generation from auto-regressive models to diffusion language models, highlighting the limitations of long-sequence training and the recent advances made by Huawei with the openPangu-R-7B-Diffusion model [1][14].

Model Performance
- openPangu-R-7B-Diffusion set new state-of-the-art (SOTA) records across multiple benchmarks, outperforming comparable models in general capabilities, mathematical reasoning, and code generation [2][3].
- On the MMLU benchmark, openPangu-R-7B-Diffusion scored 81.66, surpassing LLaDA 2.0-mini-preview by 9.17 points [2].
- On mathematical reasoning (MATH), the model reached 84.26, a clear lead over models of similar scale [3].

Architectural Innovations
- The model adopts an innovative causal attention mask architecture that enables a smooth migration from auto-regressive to BlockDiffusion modeling, addressing the architectural adaptation challenge (an illustrative mask sketch follows the Conclusion) [5][7].
- By retaining causal attention, the design lowers adaptation costs and maximizes reuse of knowledge pre-trained in auto-regressive models [8][10].

Training and Inference Efficiency
- The training strategy of openPangu-R-7B-Diffusion optimizes the BlockDiffusion recipe, improving training efficiency [10].
- The model offers dual-mode decoding, letting users trade generation quality against speed through different sampling settings (a hedged configuration sketch also follows the Conclusion) [15].

Conclusion
- The release of openPangu-R-7B-Diffusion marks a significant step forward in the ability of diffusion models to handle complex long texts, showing that they can deliver both speed and depth [14].
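
The block-wise attention pattern mentioned under "Architectural Innovations" can be illustrated with a minimal sketch. The code below is not the openPangu-R-7B-Diffusion implementation; it is a generic example under the common BlockDiffusion assumption that tokens attend causally to earlier blocks while attending bidirectionally within their own block, which is how such a mask sits between a standard auto-regressive mask and a fully bidirectional diffusion mask.

```python
import torch

def block_diffusion_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Illustrative attention mask for block-wise diffusion (not the openPangu code).

    Each token may attend to every token in earlier blocks (block-causal)
    and to every token inside its own block (bidirectional within the block).
    Returns a boolean mask of shape (seq_len, seq_len); True = may attend.
    """
    # Block index of every position, e.g. block_size=4 -> [0,0,0,0,1,1,1,1,...]
    block_id = torch.arange(seq_len) // block_size
    # Position i may attend to position j iff j's block does not come after i's block.
    return block_id.unsqueeze(1) >= block_id.unsqueeze(0)

# With block_size=1 this reduces to the usual causal (auto-regressive) mask;
# with block_size=seq_len it becomes fully bidirectional, as in plain diffusion.
print(block_diffusion_mask(8, 4).int())
```

Because the mask stays causal across blocks, weights pre-trained with ordinary causal attention remain directly usable, which is the compatibility point the article attributes to the openPangu design.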
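
For the dual-mode decoding trade-off described under "Training and Inference Efficiency", the sketch below shows the kind of sampling knobs such a decoder typically exposes. All names here are hypothetical placeholders rather than the actual openPangu API; they only illustrate how fewer denoising iterations and more tokens committed per step trade quality for speed.

```python
from dataclasses import dataclass

@dataclass
class DecodeConfig:
    """Hypothetical sampling settings for a block-diffusion decoder (illustrative only)."""
    block_size: int = 32        # tokens denoised jointly in one block
    steps_per_block: int = 8    # denoising iterations before a block is committed
    tokens_per_step: int = 4    # positions finalized per iteration (higher = faster)

# "Quality" preset: more iterations, one token committed at a time.
quality = DecodeConfig(steps_per_block=32, tokens_per_step=1)

# "Speed" preset: fewer iterations, several tokens committed in parallel.
speed = DecodeConfig(steps_per_block=4, tokens_per_step=8)
```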