Shanghai AI Lab Releases Hybrid Diffusion Language Model SDAR: The First Open-Source Diffusion Language Model to Surpass 6600 tgs
机器之心·2025-11-01 04:22

Core Insights
- The article introduces SDAR (Synergistic Diffusion-AutoRegression), a new paradigm that addresses the slow inference and high serving costs of large-model applications, which stem primarily from the serial, token-by-token nature of autoregressive (AR) decoding [2][3][4].

Group 1: SDAR Paradigm
- SDAR decouples training from inference, combining the strong performance of AR models with the parallel-decoding advantage of diffusion models, so that any AR model can be converted into a parallel decoding model at low cost (a toy sketch of such parallel decoding follows Group 4) [4][11].
- Experimental results show that SDAR not only matches but often surpasses the original AR models across multiple benchmarks, with an advantage of up to 12.3 percentage points on complex scientific reasoning tasks [6][28].

Group 2: Performance and Efficiency
- SDAR preserves AR-level performance while significantly improving inference speed and reducing cost, and the experiments indicate that larger models benefit more from parallelization without sacrificing accuracy [17][19].
- The research indicates that SDAR can be adapted to any mainstream AR model at low cost, achieving comparable or superior performance on downstream tasks [19][29].

Group 3: Experimental Validation
- Rigorous, controlled experiments compare SDAR with its AR counterparts and confirm substantial real-world speedups; SDAR-8B-chat, for example, achieves a 2.3× speedup over the corresponding AR model [23][20].
- The results also show that SDAR's generation mechanism does not compromise complex reasoning: it retains long-chain reasoning ability and excels at tasks that require understanding structured information [28][29].

Group 4: Future Implications
- SDAR represents a significant advance for large models, offering a powerful and flexible tool that lowers application barriers and opens new avenues for higher performance and efficiency in AI reasoning paradigms [29][31].
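To make the parallel-decoding idea concrete, below is a minimal, runnable sketch of one plausible realization: block-wise hybrid decoding, in which blocks are appended left to right like an AR model while tokens inside each block are filled in a few parallel denoising rounds with confidence-based unmasking. The toy_logits stand-in (random logits), MASK_ID, block length, step count, and unmasking rule are all illustrative assumptions for this sketch, not the released SDAR implementation or its API.

```python
import torch

# Toy vocabulary; id 0 is reserved as the [MASK] placeholder that
# diffusion-style decoding fills in. (Assumption for this sketch.)
VOCAB_SIZE, MASK_ID = 100, 0

def toy_logits(tokens: torch.Tensor) -> torch.Tensor:
    # Stand-in for the adapted AR model: per-position logits over the vocabulary.
    # Random here so the sketch runs anywhere; a real system would run the
    # converted Transformer instead.
    return torch.randn(tokens.numel(), VOCAB_SIZE)

def decode_block(prefix: torch.Tensor, block_len: int = 8, steps: int = 4) -> torch.Tensor:
    """Fill one block of `block_len` masked tokens in `steps` parallel rounds.

    Each round predicts every still-masked position at once and commits only
    the most confident predictions (confidence-based unmasking)."""
    block = torch.full((block_len,), MASK_ID, dtype=torch.long)
    per_step = max(1, block_len // steps)
    for _ in range(steps):
        masked = (block == MASK_ID).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        logits = toy_logits(torch.cat([prefix, block]))[-block_len:]
        logits[:, MASK_ID] = float("-inf")          # never predict the mask token
        conf, pred = logits.softmax(-1)[masked].max(-1)
        top = conf.topk(min(per_step, masked.numel())).indices
        block[masked[top]] = pred[top]              # commit the most confident tokens
    # Commit any leftover masked positions greedily so the block is always complete.
    left = (block == MASK_ID).nonzero(as_tuple=True)[0]
    if left.numel():
        tail = toy_logits(torch.cat([prefix, block]))[-block_len:]
        tail[:, MASK_ID] = float("-inf")
        block[left] = tail[left].argmax(-1)
    return block

def generate(prompt: torch.Tensor, n_blocks: int = 4) -> torch.Tensor:
    # Blocks are produced left to right (autoregressively); tokens inside each
    # block are produced in parallel -- the hybrid behaviour the summary describes.
    seq = prompt
    for _ in range(n_blocks):
        seq = torch.cat([seq, decode_block(seq)])
    return seq

if __name__ == "__main__":
    print(generate(torch.tensor([5, 7, 9])).tolist())
```

In this toy setting, a block of 8 tokens is committed in 4 model calls instead of 8 sequential steps, which is where the parallel speedup comes from; the actual block size, number of denoising steps, and resulting speedup in SDAR are reported in the original article, not here.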