Core Viewpoint
- ByteDance's Seed Diffusion Preview introduces a diffusion language model focused on code generation, using discrete-state diffusion to raise inference speed and add flexibility in code-editing tasks [1][5].

Technical Innovations
- The model reaches a code inference speed of 2146 tokens/s on an NVIDIA H20 GPU, outperforming comparable models such as Mercury and Gemini Diffusion and running 5.4 times faster than same-size autoregressive models [3][25].
- Seed Diffusion Preview uses a two-stage training strategy to address the limitations of autoregressive models, targeting both local context completion and global coherence in generated code [8][10].

Two-Stage Training
- The first stage is masked diffusion training: tokens in the original sequence are replaced with a [MASK] token, and the model learns to recover the original tokens from the partially masked sequence; this stage accounts for 80% of the training steps [11][12] (see the masked-diffusion sketch below).
- The second stage is edit-based diffusion training, which strengthens the model's grasp of global logic by introducing insertion, deletion, and replacement operations; it yields a 4.8% improvement on code-repair tasks over autoregressive baselines [14][15] (see the edit-corruption sketch below).

Structured Code Generation
- To avoid logical confusion during generation, the model incorporates structured priors so that its output respects inherent coding rules, such as declaring a variable before using it [17][19] (a toy ordering filter is sketched below).
- Through large-scale pre-training, the model learns correct code-generation orders, allowing it to produce code in a structured manner [18][19].

Efficiency Optimization
- The model uses on-policy learning: it generates code with its current policy and updates its parameters on that freshly sampled data, improving training efficiency [21] (see the on-policy sketch below).
- Block-level parallel diffusion sampling balances computational cost against generation latency by denoising whole blocks of code in parallel rather than generating token by token [23] (see the block-wise sampling sketch below).

Performance Validation
- Experiments show large gains in inference speed with competitive generation quality, confirming the effectiveness of the key techniques: the model sustains 2146 tokens per second while maintaining high-quality code generation [25][26].
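The sketches below illustrate the techniques named above; none is Seed Diffusion's actual implementation.

First, a minimal sketch of the stage-1 masked diffusion objective: corrupt a random fraction of tokens with [MASK], then train the model to recover the originals at the masked positions. The mask token id, vocabulary size, and uniform corruption schedule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0    # hypothetical [MASK] token id
VOCAB = 32000  # hypothetical vocabulary size

def masked_diffusion_step(model, tokens, optimizer):
    """One training step: mask a random fraction of tokens, then train the
    model to predict the originals at the masked positions."""
    # Sample a corruption level t ~ U(0, 1) per sequence, as in discrete
    # (absorbing-state) diffusion.
    t = torch.rand(tokens.size(0), 1, device=tokens.device)
    mask = torch.rand_like(tokens, dtype=torch.float) < t  # positions to corrupt
    corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)

    logits = model(corrupted)  # (batch, seq, VOCAB)
    # Cross-entropy only on masked positions: the model learns to recover
    # x_0 from the partially masked x_t.
    loss = F.cross_entropy(logits[mask], tokens[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```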
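Next, a sketch of the stage-2 edit corruption: perturb a sequence with random insertions, deletions, and replacements, so the model must judge and revise every token rather than only fill [MASK] slots. The mix of operations is an assumption.

```python
import random

def edit_corrupt(tokens, n_edits, vocab_size):
    """Apply n_edits random insert/delete/replace operations to a token list."""
    tokens = list(tokens)
    for _ in range(n_edits):
        op = random.choice(("insert", "delete", "replace"))
        if op == "insert":
            pos = random.randrange(len(tokens) + 1)
            tokens.insert(pos, random.randrange(vocab_size))
        elif op == "delete" and len(tokens) > 1:
            tokens.pop(random.randrange(len(tokens)))
        else:  # replace (also the fallback when the list is too short to delete)
            pos = random.randrange(len(tokens))
            tokens[pos] = random.randrange(vocab_size)
    return tokens
```

Training on pairs of (corrupted, original) sequences built this way forces the model to reason about the whole program, which is consistent with the reported gain on code-repair tasks.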
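For the structured prior on generation order, here is a toy illustration: only commit a token once its prerequisites (e.g. the declaration of a variable it uses) are already committed. The explicit dependency map is a stand-in for whatever syntax-aware signal the real model acquires during pre-training.

```python
def structured_order(positions, deps, committed):
    """Return the candidate positions whose dependencies are all committed.

    positions: candidate positions proposed by the sampler this step
    deps:      {position: set of positions that must be committed first}
    committed: set of positions already generated
    """
    return [p for p in positions if deps.get(p, set()) <= committed]

# e.g. position 7 uses a variable declared at position 2, which is not yet
# committed, so only position 5 may be generated this step:
# structured_order([7, 5], {7: {2}}, committed={0, 1}) -> [5]
```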
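For on-policy learning, a hedged sketch of the loop: sample a denoising trajectory with the current model, score it, and update the parameters on that freshly generated data. The reward shape (a quality check minus a step-count penalty) and the `sample_trajectory`/`verifier` helpers are hypothetical, not Seed Diffusion's actual objective.

```python
def on_policy_step(model, prompt, optimizer, step_penalty=0.01):
    """One on-policy update: generate with the current policy, then learn from it."""
    # Hypothetical helper: returns the sampled code, the (differentiable) sum of
    # log-probabilities of the sampled tokens, and the number of denoising steps.
    traj, logp, n_steps = sample_trajectory(model, prompt)
    # Hypothetical verifier scoring output quality; fewer steps is rewarded.
    reward = verifier(traj) - step_penalty * n_steps
    loss = -(reward * logp)  # simple REINFORCE-style surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```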
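Finally, a minimal sketch of block-level parallel sampling: the sequence is generated block by block (blocks in order, so earlier blocks serve as context), while all masked tokens inside the current block are denoised in parallel over a few refinement passes. Block size and the number of passes are illustrative.

```python
import torch

@torch.no_grad()
def sample_blockwise(model, prompt, total_len, block=64, steps=4, mask_id=0):
    """Generate a sequence block by block, denoising each block in parallel."""
    seq = torch.full((1, total_len), mask_id, dtype=torch.long)
    seq[0, :prompt.numel()] = prompt  # prompt is a 1-D LongTensor
    for start in range(prompt.numel(), total_len, block):
        end = min(start + block, total_len)
        for _ in range(steps):          # a few parallel refinement passes
            logits = model(seq)         # (1, total_len, vocab)
            pred = logits.argmax(-1)
            seq[0, start:end] = pred[0, start:end]  # commit the whole block at once
    return seq
```

Committing a block per forward pass instead of one token per pass is what trades a small amount of extra computation for the large drop in generation latency described above.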
ByteDance Seed releases a diffusion language model: inference speed reaches 2146 tokens/s, 5.4x faster than same-size autoregressive models
量子位 · 2025-08-01 04:23