Dialect Speech Synthesis
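Krypton Evening News | WeChat PR director: Moments will not get a visitor-log feature and is unlikely to get a re-edit feature; Baidu Search releases the industry's first open real-time interactive digital human agent; officials report the findings of the investigation into the "Cai Guo-Qiang fireworks show" incident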
36Kr · 2025-10-15 11:29
Group 1
- The Minister of Industry and Information Technology of China, Li Lecheng, met with Apple CEO Tim Cook to discuss Apple's business development in China and cooperation in the electronic information sector [1]
- Li emphasized China's vast market potential and commitment to high-level opening-up, aiming to create a favorable business environment for foreign companies like Apple [1]
- Apple is encouraged to deepen its engagement in the Chinese market and participate in China's new industrialization process [1]

Group 2
- Rakuten is reportedly considering an IPO for its credit card business in the United States, with discussions still in early stages [3]
- The collaboration between SenseTime and Cambricon aims to optimize hardware and software jointly, focusing on building a domestic AI infrastructure and expanding vertical business [4]
- Bharti Airtel announced a strategic partnership with Google to establish India's first large-scale AI hub, with Google planning to invest approximately $15 billion from 2026 to 2030 [5]

Group 3
- Baidu Search upgraded its AI capabilities, launching an open real-time interactive digital human agent and supporting various AI content creation modalities [5]
- The new electric vehicle from JD.com, developed in collaboration with GAC Aion, is expected to be priced between 100,000 and 120,000 yuan and will be launched during the Double Eleven shopping festival [6]
- Kailera Therapeutics completed a $600 million Series B financing round led by Bain Capital to advance obesity treatment drug development [6]

Group 4
- The People's Bank of China reported an increase of 22.71 trillion yuan in RMB deposits in the first three quarters, with household deposits rising by 12.73 trillion yuan [7]
- The National Development and Reform Commission plans to establish 28 million charging facilities by the end of 2027, aiming to double the charging service capacity [8]
- The Philippines will resume its electronic visa program for Chinese citizens starting in November, facilitating travel and trade cooperation [9]
Tsinghua and Giant Network pioneer an MoE multi-dialect TTS framework, with data, code, and methods fully open-sourced
Jiqizhixin (机器之心) · 2025-10-15 04:08
Core Insights
- The article discusses the importance of dialects in preserving cultural diversity and highlights the challenges faced in dialect text-to-speech (TTS) technology, which remains a "gray area" in the industry [2][4]
- DiaMoe-TTS is introduced as an open-source solution that aims to provide a comprehensive framework for dialect TTS, enabling researchers and developers to utilize and improve upon it [4][30]
- The framework is designed to be low-cost and low-barrier, allowing for the synthesis of various dialects without the need for extensive data [31]

Summary by Sections

Introduction to DiaMoe-TTS
- DiaMoe-TTS is a collaborative project between Giant Network AI Lab and Tsinghua University's SATLab, aimed at creating a dialect TTS model that rivals industrial-grade systems [2][4]
- The framework utilizes a unified IPA (International Phonetic Alphabet) representation to address inconsistencies in dialect modeling [13][27]

Technical Features
- The system incorporates a dialect-aware Mixture-of-Experts (MoE) architecture, which allows different expert networks to focus on specific dialect features, enhancing the preservation of dialect characteristics [15][16]
- A parameter-efficient fine-tuning (PEFT) strategy is implemented to adapt the model to low-resource dialects, requiring minimal adjustments to existing parameters [19][22]

Training Methodology
- The training process is divided into multiple stages, including IPA transfer initialization and joint training across multiple dialects, which improves model performance and adaptability [21][23]
- The use of data augmentation techniques, such as pitch and speed perturbation, ensures the model can generate natural-sounding dialect speech even with limited data [20][22]

Performance Results
- DiaMoe-TTS demonstrates competitive performance metrics, achieving close to industrial-level results in Cantonese with a Word Error Rate (WER) of 76.59% and Mean Opinion Score (MOS) improvements across various dialects [25][26][27]
- The framework's ability to support a wide range of dialects, including those with minimal data, presents new opportunities for dialect preservation and cultural transmission [25][30]

Future Prospects
- The research team plans to expand the dialect and minority language datasets, refine the IPA alignment and data preprocessing processes, and explore more efficient low-resource modeling methods [33]
- The goal is to facilitate global participation in dialect and minority language research, ensuring that these languages are not forgotten in the digital age [33]
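
The Mixture-of-Experts idea mentioned under Technical Features can be illustrated with a minimal, generic sketch. This is not the DiaMoe-TTS implementation: the class name `ToyMoELayer`, the dimensions, the random weights, and the top-k routing rule are all assumptions made for illustration, and the gate here sees only a hidden vector, since the summary does not describe how the real system conditions routing on dialect.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class ToyMoELayer:
    """Hypothetical MoE layer: a gate scores experts, the top-k are mixed."""

    def __init__(self, dim, num_experts, top_k=2, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.top_k = top_k
        self.gate = rand_matrix(num_experts, dim)  # one score row per expert
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]

    def route(self, hidden):
        """Return {expert_index: weight} for the top-k experts, weights sum to 1."""
        probs = softmax(linear(self.gate, hidden))
        top = sorted(range(len(probs)), key=lambda i: probs[i],
                     reverse=True)[:self.top_k]
        norm = sum(probs[i] for i in top)
        return {i: probs[i] / norm for i in top}

    def forward(self, hidden):
        """Weighted sum of the selected experts' outputs."""
        weights = self.route(hidden)
        out = [0.0] * len(hidden)
        for idx, w in weights.items():
            expert_out = linear(self.experts[idx], hidden)
            out = [o + w * e for o, e in zip(out, expert_out)]
        return out, weights


# Toy input standing in for one frame's hidden representation (assumed values).
layer = ToyMoELayer(dim=4, num_experts=3, top_k=2)
output, routing = layer.forward([0.1, -0.2, 0.3, 0.05])
```

In this sketch only the selected experts contribute to the output, which is what lets different experts specialize. A dialect-aware variant would presumably feed dialect information into the gate, but that detail is not given in the summary.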