Context Scaling

Qiu Xipeng's Team Open-Sources MOSS-TTSD! Trained on a Million Hours of Audio, It Breaks Through the Uncanny Valley of AI Podcasts
机器之心· 2025-07-05 05:53
Core Viewpoint
The article covers the launch of MOSS-TTSD, a text-to-speech model for dialogue synthesis that produces markedly more natural-sounding conversational audio than earlier systems could [3][5].

Group 1: MOSS-TTSD Overview
- MOSS-TTSD was developed jointly by Shanghai Chuangzhi Academy, Fudan University, and MoSi Intelligent, and is aimed at AI podcast generation [3].
- The model is open-source with no restrictions on commercial use, and it generates high-quality dialogue audio from a complete multi-speaker script [4][5].

Group 2: Technical Innovations
- MOSS-TTSD is built on the Qwen3-1.7B-base model and trained on approximately 1 million hours of single-speaker audio and 400,000 hours of dialogue audio, which enables bilingual speech synthesis [13].
- The core innovation is the XY-Tokenizer, which compresses speech to a bitrate of 1 kbps while still modeling both semantic and acoustic information [15][16].

Group 3: Data Processing and Quality Assurance
- The team built an efficient data processing pipeline to filter high-quality audio out of very large datasets, using an internal speaker separation (diarization) model that outperforms existing solutions [24][27].
- That diarization model achieved Diarization Error Rates (DER, lower is better) of 9.7 and 14.1 on the datasets evaluated, indicating strong performance on speaker separation tasks [29].

Group 4: Performance Evaluation
- MOSS-TTSD was evaluated on a high-quality test set of approximately 500 bilingual dialogues, where it showed significant improvements in speaker-switching accuracy and voice similarity over baseline models [31][34].
- Its prosody and naturalness were judged far superior to those of competing models, which supports its use for generating realistic dialogue [35].
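Two figures in the summary above, the 1 kbps tokenizer bitrate and the Diarization Error Rate, reduce to short formulas. The sketch below spells out the arithmetic. The tokenizer settings (12.5 frames per second, 8 codebooks, 1024 entries per codebook) are an assumed configuration that happens to yield 1 kbps; the summary does not state the actual settings. The durations passed to the DER function are likewise hypothetical, chosen only to reproduce a value of 9.7.

```python
import math


def bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bitrate of a discrete audio codec: frames/s x codebooks x bits per code."""
    bits_per_code = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_code


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """DER in percent: (missed + false alarm + speaker confusion) / reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    # Assumed tokenizer configuration (not given in the summary).
    print(bitrate_bps(frame_rate_hz=12.5, num_codebooks=8, codebook_size=1024))

    # Hypothetical error durations in seconds over 1000 s of reference speech.
    print(diarization_error_rate(missed=40.0, false_alarm=25.0,
                                 confusion=32.0, total_speech=1000.0))
```

Under these assumptions, each frame carries 8 codes of 10 bits each, so 12.5 frames per second gives 1000 bits per second. The DER formula shows why lower is better: every term in the numerator is a duration of incorrectly attributed audio.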
Qiu Xipeng of Fudan University / Shanghai Chuangzhi Academy: Context Scaling, the Next Act on the Road to AGI
机器之心· 2025-06-15 04:40
Core Viewpoint
The article discusses the concept of Context Scaling as a crucial step towards achieving Artificial General Intelligence (AGI), emphasizing the need for AI to understand and adapt to complex and ambiguous contexts rather than merely increasing model size or data volume [2][21].

Summary by Sections

Evolution of Large Models
The evolution of large models is summarized in three acts:
1. The first act is the success of model scaling, where data and parameters are stacked to compress knowledge, leading to the emergence of models like ChatGPT and MOSS [6].
2. The second act is post-training optimization, which enhances decision-making capabilities through methods like reinforcement learning and multi-modal approaches, exemplified by models such as GPT o1/o3 and DeepSeek-R1 [6][7].
3. The third act, Context Scaling, addresses the challenge of defining context in order to improve model capabilities, particularly in complex and nuanced situations [8][21].

Context Scaling
- Context Scaling is defined as the ability of AI to understand and adapt to rich, complex, and dynamic contextual information, which is essential for making reasonable judgments in ambiguous scenarios [8][9].
- The concept of "tacit knowledge" is introduced: the implicit understanding that humans possess but find difficult to articulate, which AI must learn to capture [11][12].

Three Technical Pillars
Context Scaling is supported by three key capabilities:
1. Strong Interactivity: AI must learn from interactions, understanding social cues and cultural nuances [14][15].
2. Embodiment: AI needs a sense of agency to perceive and act within its environment, which can be tested in virtual settings [16].
3. Anthropomorphizing: AI should resonate emotionally with humans, understanding complex social interactions and cultural sensitivities [17].
Challenges and Integration
- The article highlights that Context Scaling does not replace existing scaling methods; it complements them by focusing on the quality and structure of input data [18].
- It also redefines the environment for reinforcement learning, moving beyond simple state-action-reward loops to include rich contextual information [20].

Conclusion
The exploration of Context Scaling aims to unify various technological paths under the core goal of contextual understanding, which is seen as essential for navigating the complexities of the real world and as a potential key to achieving AGI [22].
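The point under Challenges and Integration about moving beyond a plain state-action-reward loop can be illustrated with a toy environment. This is a minimal sketch and is not taken from the article: the class names, the settings ("library", "stadium"), and the reward table are all hypothetical. It shows only the general idea that an observation can carry context alongside state, and that the same action can be judged differently depending on that context.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Observation:
    """What the agent sees: a bare state plus the surrounding context."""
    state: str
    context: Dict[str, str] = field(default_factory=dict)


class ContextualEnv:
    """Toy environment where the reward for an action depends on the context."""

    # Hypothetical reward table keyed by (setting, action).
    REWARDS = {
        ("library", "whisper"): 1.0,
        ("library", "shout"): -1.0,
        ("stadium", "whisper"): -0.5,
        ("stadium", "shout"): 1.0,
    }

    def __init__(self, setting: str) -> None:
        self.setting = setting

    def reset(self) -> Observation:
        return Observation(state="start", context={"setting": self.setting})

    def step(self, action: str) -> Tuple[Observation, float]:
        # Unknown (setting, action) pairs get a neutral reward of 0.0.
        reward = self.REWARDS.get((self.setting, action), 0.0)
        return Observation(state="spoke", context={"setting": self.setting}), reward


if __name__ == "__main__":
    for setting in ("library", "stadium"):
        env = ContextualEnv(setting)
        obs = env.reset()
        _, reward = env.step("shout")
        print(obs.context["setting"], "shout", reward)
```

An agent that saw only `state` could not tell the two settings apart and so could not learn that shouting is appropriate in one and not the other. Exposing `context` in the observation is what makes the distinction learnable.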