From Masked Generation to "Remask" Training: RemeDi Teaches Diffusion Language Models to Self-Correct and Reflect
机器之心· 2025-10-16 02:20
Core Insights
- The article introduces RemeDi, a diffusion language model developed by the MAPLE lab at Westlake University, which incorporates a "remask" mechanism for self-reflection and self-correction during text generation [2][26].
- RemeDi surpasses existing diffusion language models by identifying and correcting errors in its own generated text via a learned per-token confidence score [8][27].

Group 1: Model Features
- RemeDi's "remask" capability lets it flag likely-incorrect tokens and re-predict them once later generation steps supply richer context [5][25].
- The model supports variable-length generation, removing the fixed-length output constraint of traditional diffusion models and making text generation more flexible [9][27].
- RemeDi employs a dual-stream architecture: a Token Prediction Stream (TPS) predicts the token distribution at each position, while an Unmasking Policy Stream (UPS) outputs a confidence score for each token [10][8] (see the decoding sketch after this section).

Group 2: Training Methodology
- The training process consists of two phases: supervised fine-tuning (Remask SFT) followed by reinforcement learning (Remask RL) [12][17].
- During Remask SFT, the model learns both to recover masked tokens and to identify incorrect tokens that need to be remasked [13][12] (a loss sketch follows below).
- The Remask RL phase then optimizes whole generation trajectories against outcome rewards, raising the probability of producing a correct final answer [17][20] (a policy-gradient sketch follows below).

Group 3: Experimental Results
- RemeDi shows significant performance gains over other diffusion language models on mathematical reasoning, code generation, and general knowledge question answering [22][27].
- Combining Remask SFT with Remask RL further improves performance, yielding superior results across the benchmarks [22][24].
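To make the dual-stream idea concrete, here is a minimal sketch of what one remask-style denoising step could look like. It assumes a `model` that returns TPS token logits and UPS confidence scores; the names `MASK_ID`, `CONF_THRESHOLD`, and the threshold-based remask rule are illustrative assumptions, not RemeDi's actual API or decision rule.

```python
import torch

MASK_ID = 0          # hypothetical id of the [MASK] token
CONF_THRESHOLD = 0.5 # hypothetical remask threshold

def remask_decode_step(model, tokens):
    """One illustrative denoising step with remasking.

    `model` is assumed to return two streams:
      - logits:     (seq_len, vocab) token distributions (TPS)
      - confidence: (seq_len,) per-token confidence scores (UPS)
    """
    logits, confidence = model(tokens)

    # Fill currently masked positions with the TPS prediction.
    predicted = logits.argmax(dim=-1)
    is_masked = tokens == MASK_ID
    tokens = torch.where(is_masked, predicted, tokens)

    # Remask already-filled tokens whose UPS confidence is low,
    # so they can be re-predicted with richer context in later steps.
    low_conf = (~is_masked) & (confidence < CONF_THRESHOLD)
    tokens = torch.where(low_conf, torch.full_like(tokens, MASK_ID), tokens)
    return tokens
```

Iterating this step lets earlier mistakes be revised as the rest of the sequence fills in, which is the self-correction behavior the article describes.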
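The Remask SFT phase trains the model on two signals at once: recovering masked tokens and scoring which unmasked tokens are wrong. Below is a plausible two-term objective under those assumptions; the exact loss, weighting, and label construction in RemeDi may differ.

```python
import torch
import torch.nn.functional as F

def remask_sft_loss(logits, confidence, targets, was_masked, is_correct):
    """Illustrative two-part Remask SFT objective (not RemeDi's exact loss).

    logits:     (seq_len, vocab) TPS output
    confidence: (seq_len,) UPS output in [0, 1]
    targets:    (seq_len,) ground-truth token ids
    was_masked: (seq_len,) bool, positions masked in the noised input
    is_correct: (seq_len,) float, 1.0 where the input token matched the target
    """
    # Denoising term: recover the tokens that were masked out.
    ce = F.cross_entropy(logits[was_masked], targets[was_masked])

    # Remask term: the confidence head should score corrupted
    # (incorrect) unmasked tokens low so they get remasked.
    bce = F.binary_cross_entropy(confidence[~was_masked], is_correct[~was_masked])
    return ce + bce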
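The article says only that Remask RL optimizes generation trajectories based on final results, so the following is a generic REINFORCE-style sketch of that idea, with a hypothetical outcome reward (1.0 for a correct final answer) and an optional baseline; it is not RemeDi's published training algorithm.

```python
import torch

def remask_rl_loss(step_log_probs, reward, baseline=0.0):
    """Illustrative policy-gradient update over one generation trajectory.

    step_log_probs: list of log-probabilities of the actions taken at each
                    denoising step (token choices and remask decisions)
    reward:         scalar, e.g. 1.0 if the final answer is correct else 0.0
    """
    advantage = reward - baseline
    # Increase the likelihood of trajectories that end in correct answers.
    return -(advantage * torch.stack(step_log_probs).sum())
```

Averaging this loss over sampled trajectories shifts probability mass toward denoising paths whose final outputs are judged correct, matching the reported effect of the Remask RL phase.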