Core Insights
- The article discusses the development of a diffusion language model (DLM) that enhances the capabilities of the traditional BERT model, demonstrating that a lightweight instruction fine-tuning approach can significantly improve BERT's generative abilities without extensive pre-training [2][18].

Group 1: DLM Framework and Implementation
- The dLLM framework was developed to support BERT Chat, emphasizing ease of use and reproducibility, making it suitable for beginners to understand the key steps in diffusion language modeling [6][3].
- The team has open-sourced the entire training, inference, and evaluation code, providing a "Hello World" example for easy replication and understanding of the diffusion language model [3][6].

Group 2: Model Selection and Training
- ModernBERT was chosen as the base model due to its extended context length of 8,192 tokens and superior performance on non-generative benchmarks, which was confirmed through experiments [8][12].
- The experiments revealed that additional generative pre-training on ModernBERT did not significantly improve performance, indicating that the original masked language model (MLM) pre-training had already encoded sufficient language knowledge [10][11].

Group 3: Performance Evaluation
- The ModernBERT-base-chat-v0 (0.1B) and ModernBERT-large-chat-v0 (0.4B) models demonstrated stable performance across various evaluation tasks, with the larger model approaching the performance of Qwen1.5-0.5B [12][14].
- The results showed that even at a smaller model size, the diffusion training approach remains competitive, highlighting BERT's potential for generating coherent dialogue [12][14].

Group 4: Educational Focus
- The BERT Chat series is positioned as a teaching and research experiment rather than a commercial system, aimed at helping researchers understand the mechanisms of diffusion language models [16][18].
- The team emphasizes transparency in the research process by sharing complete training scripts, training curves, and experimental details, fostering a comprehensive understanding of the diffusion language model research path [16][18].
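The core mechanism behind a BERT-style diffusion chat model is iterative unmasking: the response starts as a fully masked sequence, and at each denoising step the masked LM's most confident predictions are committed. The sketch below is a toy illustration of that loop only; `toy_denoiser`, the target sentence, and the unmasking schedule are hypothetical stand-ins for pedagogy, not the dLLM framework's actual API or ModernBERT itself.

```python
import random

MASK = "[MASK]"

def toy_denoiser(tokens):
    """Hypothetical stand-in for a masked LM such as ModernBERT:
    for each [MASK] position, return a (token, confidence) guess.
    Here it simply copies from a fixed target sentence so the
    sampler's control flow can be demonstrated end to end."""
    target = ["hello", "world", "from", "a", "diffusion", "decoder"]
    return {
        i: (target[i], random.random())
        for i, tok in enumerate(tokens) if tok == MASK
    }

def diffusion_decode(denoiser, length, steps):
    """Iterative unmasking: start fully masked, and at each step
    commit the highest-confidence fraction of predictions."""
    tokens = [MASK] * length
    for step in range(steps):
        preds = denoiser(tokens)
        if not preds:
            break
        # Unmask roughly an equal share of the remaining masked
        # positions per step, most confident predictions first.
        k = max(1, len(preds) // (steps - step))
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _conf) in best:
            tokens[i] = tok
    return tokens

print(diffusion_decode(toy_denoiser, length=6, steps=4))
```

With a real checkpoint, `toy_denoiser` would be replaced by a forward pass that returns argmax tokens and their probabilities at masked positions; the confidence-ranked schedule is what distinguishes this decoder from left-to-right autoregressive sampling.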
A general-purpose dLLM development framework that teaches BERT diffusion-style dialogue
机器之心·2025-11-23 04:06