Generative Methods
Yann LeCun publicly tears into Meta's internal environment: "LLMs sucked all the air out of the room" — the physical world is the endgame for AGI
AI科技大本营· 2026-03-30 09:12
Core Viewpoint - The article discusses the limitations of current AI models, particularly in the context of generative video technology, and proposes that the missing piece in AI development is a world model that can learn abstract representations and predict outcomes, with JEPA (Joint Embedding Predictive Architecture) being a potential solution [4][7][12].

Summary by Sections

AI's Missing Component
- Current AI lacks a significant component: a world model capable of learning abstract representations and supporting planning [8][9].
- The evolution of AI has seen two major revolutions, deep learning and large language models (LLMs), with the latter focused on next-token prediction [9][10].

Limitations of Generative Models
- The limitations of LLMs stem from their reliance on next-token prediction, which is ill-suited to the unpredictable nature of the real world [7][14].
- Predicting every detail of real-world data, such as video, is fundamentally flawed; the focus should instead be on learning abstract representations that can support prediction [12][13].

JEPA as a Solution
- JEPA aims to find representations that retain input information while being predictive, in contrast to traditional methods that attempt to reconstruct all details [12][13].
- The approach holds that effective modeling requires ignoring many details in order to retain sufficient structure for prediction [12][13].

Experience and Evidence
- Historical experiments indicate that joint embedding methods consistently outperform reconstruction methods in learning representations [16][17].
- The best way to learn representations of natural signals is not reconstruction, but methods that do not attempt to reconstruct every detail [17].
Transition to AMI Labs
- The shift in focus at Meta toward short-term goals and LLMs led to the decision to leave and pursue JEPA at AMI Labs, where these ideas can be applied in areas such as industrial process control and robotics [21][22].

Future Directions
- A hierarchical JEPA model is discussed, which would allow predictions across different time and spatial scales, drawing parallels with concepts in physics [23].
- Understanding complex systems, such as economic models, may benefit from a data-driven approach similar to JEPA that focuses on higher-level abstractions [26][27].
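The contrast the article draws — predicting in an abstract representation space rather than reconstructing every raw detail — can be sketched in a few lines. Everything below (the tiny linear encoders, the identity predictor, the toy data) is an illustrative stand-in, not LeCun's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Map an observation to an abstract representation (toy: one linear layer)."""
    return np.tanh(W @ x)

# Toy data: y is a noisy "future frame" of x; the noise stands in for the
# unpredictable detail a JEPA-style model is free to discard.
x = rng.normal(size=8)
y = x + 0.1 * rng.normal(size=8)

W_x = rng.normal(size=(4, 8)) * 0.5   # context encoder
W_y = rng.normal(size=(4, 8)) * 0.5   # target encoder
P = np.eye(4)                         # predictor acting in representation space

s_x, s_y = encoder(x, W_x), encoder(y, W_y)

# JEPA-style objective: prediction error measured between embeddings,
# never between raw signals.
jepa_loss = np.mean((P @ s_x - s_y) ** 2)

# A reconstruction objective, for contrast, must account for every raw sample.
recon_loss = np.mean((x - y) ** 2)
```

A real joint-embedding system also needs a mechanism to prevent both encoders from collapsing to a constant output (e.g. an exponential-moving-average target encoder or a variance regularizer); that machinery is omitted here.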
What technical solutions did the AI talent who took home over 2 million yuan in prize money actually deliver?
机器之心· 2025-12-23 04:15
Core Viewpoint - The article emphasizes the significant opportunities for young individuals proficient in AI technology in China, particularly highlighted by the recent Tencent Advertising Algorithm Competition, which showcased innovative solutions to complex advertising challenges [2][5].

Group 1: Competition Overview
- All top 10 teams received job offers from Tencent, with the champion team awarded a prize of 2 million yuan [2].
- The competition focused on a real-world advertising problem that lacks a definitive solution, pushing participants to explore practical and innovative approaches [4][5].

Group 2: Advertising Challenges
- Advertising is often viewed negatively, but it is essential to the sustainability of many services and content, leading platforms to seek smarter, less intrusive advertising methods [7].
- The competition addressed how to make advertising more targeted and relevant, reducing unnecessary exposure to users [7][16].

Group 3: Methodologies in Advertising
- Two primary methodologies in advertising recommendation systems were discussed: traditional discriminative methods and emerging generative methods [8].
- Discriminative methods match user profiles with ads based on predefined features, while generative methods analyze user behavior over time to predict future interactions [9][14].

Group 4: Competition Challenges
- Participants faced data at the scale of millions of ads and users while having limited computational resources [21].
- The complexity of the data structure, including multimodal historical behavior data, added to the difficulty of modeling user interactions effectively [21][22].

Group 5: Champion Team Solutions
- The champion team, Echoch, introduced a three-tier session system, periodic encoding, and time-difference bucketing to enhance the model's understanding of user behavior over time [28][29].
- They developed a unified model capable of switching strategies between predicting clicks and conversions, addressing the differing objectives of these actions [34][36].
- The team also incorporated randomness in ad encoding to improve exposure for less popular ads, significantly increasing their training focus [37].

Group 6: Runner-Up Team Solutions
- The runner-up team, leejt, handled large-scale data by compressing the vocabulary size and using shared embeddings for low-frequency ads [42].
- They implemented session segmentation and heterogeneous temporal graphs to manage the complexity of user behavior data effectively [44].
- The team optimized engineering processes to maximize GPU utilization, achieving significant performance improvements in model training [48].

Group 7: Industry Implications
- The competition highlighted the transition from discriminative to generative models in advertising; Tencent has already deployed generative models in its internal systems, with positive results reflected in financial data [51].
- Tencent plans to open-source the competition data to foster community development and explore real-time personalized advertising generation in the future [52].
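The article names "time-difference bucketing" but does not give the champion team's formula. A common way such bucketing is implemented — log-scaled buckets, fine-grained for short gaps and coarse for long ones — can be sketched as follows; the bucket scheme and parameter names here are an assumption, not Echoch's actual design:

```python
import math

def time_gap_bucket(delta_seconds: int, num_buckets: int = 16) -> int:
    """Map the gap between two user actions to a discrete bucket id.

    Log-scaled buckets give fine resolution for short gaps (seconds, minutes)
    and coarse resolution for long gaps (days), so a sequence model can treat
    "clicked 10 seconds later" and "returned three days later" as distinct
    categorical signals rather than one raw number.
    """
    if delta_seconds <= 0:
        return 0
    bucket = int(math.log2(delta_seconds + 1)) + 1
    return min(bucket, num_buckets - 1)

# Gaps within a browsing session vs. across days land in different buckets,
# and the bucket ids can be fed to an embedding table like any token.
same_session = time_gap_bucket(30)       # seconds apart
next_day = time_gap_bucket(86_400)       # one day apart
```

The bucket id would then typically index an embedding table alongside the ad and action embeddings in the behavior sequence.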
The most comprehensive survey of speech separation yet! Teams from Tsinghua and elsewhere analyze 200+ papers in depth, systematically dissecting the "cocktail party problem"
机器之心· 2025-09-03 04:33
Core Viewpoint - The article discusses the revolutionary advancements in the field of speech separation, particularly in addressing the "cocktail party problem" through the development of deep neural networks (DNNs) [2].

Group 1: Overview of Speech Separation
- Speech separation has become crucial for enhancing speech clarity in complex acoustic environments and serves as a preprocessing step for other speech processing tasks [2].
- Researchers from multiple institutions surveyed over 200 representative papers, analyzing the latest methods along several dimensions, including deep learning methods, model architectures, evaluation metrics, datasets, and future challenges [2].

Group 2: Problem Definition
- The authors divide speech separation tasks into known- and unknown-speaker separation according to whether the number of speakers is fixed or variable, highlighting the challenges of each scenario [6].
- In unknown-speaker scenarios, dynamically determining the number of output channels and balancing separation quality against termination timing are significant challenges [6].

Group 3: Learning Paradigms
- The article compares supervised and unsupervised learning methods, detailing the advantages and limitations of each for speech separation [10].
- Supervised learning is currently the most mature paradigm, training on paired mixed audio and clean source audio, while unsupervised methods explore training models directly on unlabelled mixed audio [12].

Group 4: Model Architectures
- The core components and evolution of speech separation models are summarized: encoder, separation network, and decoder [14].
- Architectures based on RNNs, CNNs, and Transformers are discussed, showcasing their strengths in capturing long-term dependencies and extracting local features [17][18].
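The encoder / separation-network / decoder pipeline shared by many of these models (the TasNet family being a well-known instance) can be sketched with random stand-ins for the trained components — everything below is a toy illustration of the data flow, not any surveyed model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture of two "speakers": S sources of F features over T frames.
F, T, S = 16, 10, 2
sources = rng.normal(size=(S, F, T))
mixture = sources.sum(axis=0)

# Encoder: a learned basis projecting waveform frames into a latent space
# (a random linear map stands in for a trained 1-D conv encoder).
E = rng.normal(size=(32, F)) * 0.1
latent = E @ mixture                      # shape (32, T)

# Separation network: emits one mask per speaker over the latent features.
# Softmax across speakers makes the masks sum to 1 at each latent-time bin.
logits = rng.normal(size=(S, 32, T))
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Decoder: a pseudo-inverse of the encoder maps masked latents back to frames.
D = np.linalg.pinv(E)
estimates = np.stack([D @ (m * latent) for m in masks])   # shape (S, F, T)

# Because the masks partition the latent mixture, the per-speaker estimates
# sum back to (approximately) the input mixture.
assert np.allclose(estimates.sum(axis=0), mixture)
```

In a trained system the encoder, mask network, and decoder are all learned jointly so that each masked-and-decoded stream matches one clean source.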
Group 5: Evaluation Metrics
- A comprehensive evaluation metric system is necessary for assessing model performance, including both subjective and objective metrics [19].
- The article compares various metrics, highlighting the trade-off between subjective evaluations, which reflect human experience, and objective metrics, which are efficient but may emphasize different aspects [20].

Group 6: Datasets
- Publicly available datasets for speech separation research are summarized, categorized into single-channel and multi-channel formats [22].
- Understanding the coverage and difficulty of these datasets helps researchers select appropriate benchmarks for algorithm evaluation and identify gaps in current research [22].

Group 7: Performance Comparison
- The authors compare different models' performance on standard datasets, illustrating the progress in speech separation technology over recent years [24].
- Notable improvements in performance metrics such as SDR are highlighted, with advanced architectures achieving SDR levels around 20 dB [24][25].

Group 8: Tools and Platforms
- Various open-source tools and platforms that facilitate the development and application of speech separation are introduced, with comparisons of their functionality and limitations [28].
- These tools provide convenient interfaces for researchers to replicate results and build prototype systems, accelerating the transition from research to application [28].

Group 9: Challenges and Future Directions
- Current challenges include long-duration audio processing, mobile and embedded applications, real-time speech separation, and the rise of generative methods [32][33].
- The integration of pre-training techniques and a focus on target-speaker extraction are identified as key areas for future exploration [33].
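The SDR figure cited above is a log power ratio between the reference signal and the residual distortion, and is straightforward to compute; the sketch below shows the plain (non-scale-invariant) definition, with synthetic sinusoids standing in for real audio:

```python
import numpy as np

def sdr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    This is the plain SDR; many toolkits instead report the scale-invariant
    variant (SI-SDR), which first projects the estimate onto the reference.
    """
    distortion = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(distortion**2))

# Synthetic example: a 440 Hz "source" with a small 50 Hz residual left over
# after an imperfect separation.
t = np.linspace(0.0, 1.0, 8000)
clean = np.sin(2 * np.pi * 440 * t)
estimate = clean + 0.01 * np.sin(2 * np.pi * 50 * t)

score = sdr_db(clean, estimate)   # residual ~100x smaller -> roughly 40 dB
```

An SDR around 20 dB, as reported for recent architectures, means the residual distortion power is about 1% of the source power.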