Ant Group and Renmin University of China Release the Industry's First Native MoE Diffusion Language Model
第一财经·2025-09-12 03:08
Core Viewpoint
- Ant Group and Renmin University of China have jointly developed LLaDA-MoE, the industry's first diffusion language model (dLLM) built natively on a MoE architecture, demonstrating the scalability and stability of industrial-grade large-scale training on approximately 20 trillion tokens of data [1]

Group 1
- The LLaDA-MoE model was trained from scratch on the MoE architecture [1]
- The model will be fully open-sourced in the near future [1]