Core Insights

- The article discusses the importance of dialects in preserving cultural diversity and highlights the challenges facing dialect text-to-speech (TTS) technology, which remains a "gray area" in the industry [2][4]
- DiaMoE-TTS is introduced as an open-source solution that provides a comprehensive framework for dialect TTS, so that researchers and developers can use and improve upon it [4][30]
- The framework is designed to be low-cost and low-barrier, allowing various dialects to be synthesized without extensive data [31]

Summary by Sections

Introduction to DiaMoE-TTS
- DiaMoE-TTS is a collaborative project between Giant Network AI Lab and Tsinghua University's SATLab, aimed at creating a dialect TTS model that rivals industrial-grade systems [2][4]
- The framework uses a unified IPA (International Phonetic Alphabet) representation to address inconsistencies in dialect modeling [13][27]

Technical Features
- The system incorporates a dialect-aware Mixture-of-Experts (MoE) architecture, in which different expert networks focus on specific dialect features, improving the preservation of dialect characteristics [15][16]
- A parameter-efficient fine-tuning (PEFT) strategy adapts the model to low-resource dialects while requiring minimal adjustments to existing parameters [19][22]

Training Methodology
- Training is divided into multiple stages, including IPA transfer initialization and joint training across multiple dialects, which improves model performance and adaptability [21][23]
- Data augmentation techniques, such as pitch and speed perturbation, help the model generate natural-sounding dialect speech even with limited data [20][22]

Performance Results
- DiaMoE-TTS demonstrates competitive performance, achieving close to industrial-level results in Cantonese with a Word Error Rate (WER) of 76.59% and Mean Opinion Score (MOS) improvements across various dialects [25][26][27]
- The framework's ability to support a wide range of dialects, including those with minimal data, presents new opportunities for dialect preservation and cultural transmission [25][30]

Future Prospects
- The research team plans to expand the dialect and minority-language datasets, refine the IPA alignment and data preprocessing pipelines, and explore more efficient low-resource modeling methods [33]
- The goal is to facilitate global participation in dialect and minority-language research, ensuring that these languages are not forgotten in the digital age [33]
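
The dialect-aware Mixture-of-Experts routing mentioned under Technical Features can be illustrated with a minimal sketch. The summary does not say how DiaMoE-TTS conditions its gate on the dialect, so everything here is an assumption made for illustration: the class name `DialectMoE`, the per-dialect gate bias, the top-k routing, and the plain linear experts are hypothetical and are not taken from the released code.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DialectMoE:
    """Toy dialect-aware MoE layer (hypothetical, for illustration only)."""

    def __init__(self, dim, num_experts, num_dialects, top_k=2, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.top_k = top_k
        # Each expert is a single linear map; real experts would be deeper.
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]
        # The gate scores experts from the input features...
        self.gate_weights = rand_matrix(num_experts, dim)
        # ...plus an assumed per-dialect bias that steers routing by dialect.
        self.dialect_bias = rand_matrix(num_dialects, num_experts)

    def gate(self, x, dialect_id):
        """Return [(expert_index, weight), ...] for the top-k experts."""
        logits = [
            score + bias
            for score, bias in zip(matvec(self.gate_weights, x),
                                   self.dialect_bias[dialect_id])
        ]
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        # Renormalise over the selected experts so their weights sum to 1.
        weights = softmax([logits[i] for i in chosen])
        return list(zip(chosen, weights))

    def forward(self, x, dialect_id):
        """Weighted sum of the selected experts' outputs."""
        output = [0.0] * len(x)
        for expert_index, weight in self.gate(x, dialect_id):
            expert_out = matvec(self.experts[expert_index], x)
            output = [o + weight * y for o, y in zip(output, expert_out)]
        return output


if __name__ == "__main__":
    layer = DialectMoE(dim=4, num_experts=3, num_dialects=2)
    features = [0.5, -1.0, 0.25, 2.0]
    print(layer.gate(features, dialect_id=0))
    print(layer.forward(features, dialect_id=1))
```

In this sketch the same input features can be routed to different experts depending on `dialect_id`, which is one simple way to let experts specialize by dialect; the actual system may condition its gate quite differently.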
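Tsinghua and Giant Network pioneer an MoE multi-dialect TTS framework, with data, code, and methods fully open-sourced
机器之心 (Jiqizhixin) · 2025-10-15 04:08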