Qiu Xipeng's Team Open-Sources MOSS-TTSD! Trained on a Million Hours of Audio, It Breaks Through the AI-Podcast Uncanny Valley
机器之心· 2025-07-05 05:53
Core Viewpoint
- The article discusses the launch of MOSS-TTSD, a text-to-speech model that significantly enhances the quality of dialogue synthesis, overcoming previous limitations in generating natural-sounding conversational audio [3][5].

Group 1: MOSS-TTSD Overview
- MOSS-TTSD is developed through collaboration between Shanghai Chuangzhi Academy, Fudan University, and MoSi Intelligent, marking a significant advancement in AI podcasting technology [3].
- The model is open-source, allows unrestricted commercial applications, and generates high-quality dialogue audio directly from complete multi-speaker scripts [4][5].

Group 2: Technical Innovations
- MOSS-TTSD is based on the Qwen3-1.7B-base model and trained on approximately 1 million hours of single-speaker audio and 400,000 hours of dialogue audio, enabling bilingual (Chinese/English) speech synthesis [13].
- The core innovation is the XY-Tokenizer, which compresses the bitrate to 1 kbps while modeling both semantic and acoustic information [15][16].

Group 3: Data Processing and Quality Assurance
- The team built an efficient data processing pipeline to filter high-quality audio from vast datasets, using an internal speaker diarization model that outperforms existing solutions [24][27].
- The model achieved Diarization Error Rates (DER) of 9.7% and 14.1% on different evaluation datasets, indicating strong performance in speaker-separation tasks [29].

Group 4: Performance Evaluation
- MOSS-TTSD was evaluated on a high-quality test set of approximately 500 bilingual dialogues, demonstrating significant improvements in speaker-switching accuracy and voice similarity compared to baseline models [31][34].
- The model's prosody and naturalness were judged far superior to those of competing models, showcasing its effectiveness in generating realistic dialogue [35].
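For reference, the Diarization Error Rate cited in Group 3 is conventionally computed as the sum of three error durations (false alarm, missed speech, and speaker confusion) divided by the total reference speech duration. A minimal sketch of that formula follows; the function name and the sample durations are illustrative, since the article reports only the final scores:

```python
def diarization_error_rate(false_alarm, missed, confusion, total_ref):
    """DER (%) = (false alarm + missed speech + speaker confusion) / total reference speech.

    All arguments are durations in seconds (or any consistent unit).
    """
    if total_ref <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed + confusion) / total_ref

# Hypothetical example: 20 s of total error over 206 s of reference speech
der = diarization_error_rate(false_alarm=5.0, missed=10.0, confusion=5.0, total_ref=206.0)
print(f"{der:.1f}%")  # -> 9.7%
```

Lower is better; a DER of 9.7% means roughly one-tenth of the reference speech time was misattributed or mis-detected.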
Qiu Xipeng of Fudan University / Shanghai Chuangzhi Academy: Context Scaling, the Next Act on the Road to AGI
机器之心· 2025-06-15 04:40
True intelligence lies in understanding the ambiguity and complexity of tasks; Context Scaling is a key step toward AGI.

At the end of 2024, Ilya Sutskever declared that "the era of pretraining as we know it is coming to an end," plunging the entire AI field into a collective search for "Scaling What." New ideas keep emerging: Test-Time Scaling let OpenAI's o series shine in mathematical reasoning; DeepSeek-R1 achieved a reinforcement-learning breakthrough by replacing PPO with GRPO; reinforcement-learning self-play combined with LLMs has shown astonishing capability in games and code generation; and the agentic path has spawned a new generation of assistants that can operate browsers and call tools... Each of these paths is probing for the next possible leap.

Amid this technical debate, Professor Qiu Xipeng of Fudan University / Shanghai Chuangzhi Academy has proposed an intriguing new direction: Context Scaling. Unlike scaling parameters, data, or inference compute, the core of Context Scaling is not "bigger" but "deeper": how to make AI truly understand and adapt to complex, changing, and ambiguous contexts.

In his latest conversation with 机器之心, Professor Qiu systematically laid out his insights on the development of AI: ...