Generative Methods
Yann LeCun publicly tears into Meta's internal environment: "LLMs sucked all the air out of the room" — the physical world is the endgame for AGI
AI科技大本营· 2026-03-30 09:12
Core Viewpoint - The article discusses the limitations of current AI models, particularly in the context of generative video technology, and proposes that the missing piece in AI development is a world model that can learn abstract representations and predict outcomes, with JEPA (Joint Embedding Predictive Architecture) being a potential solution [4][7][12].

Summary by Sections

AI's Missing Component
- Current AI lacks a significant component: a world model capable of learning abstract representations and supporting planning [8][9].
- The evolution of AI has seen two major revolutions, deep learning and large language models (LLMs), with the latter focused on next-token prediction [9][10].

Limitations of Generative Models
- The limitations of LLMs stem from their reliance on next-token prediction, which is ill-suited to the unpredictable nature of the real world [7][14].
- Predicting every detail of real-world data, such as video, is fundamentally flawed; the focus should instead be on learning abstract representations that can support prediction [12][13].

JEPA as a Solution
- JEPA aims to find representations that retain input information while being predictive, in contrast to traditional methods that attempt to reconstruct all details [12][13].
- The approach holds that effective modeling requires ignoring many details in order to retain sufficient structure for prediction [12][13].

Experience and Evidence
- Historical experiments indicate that joint embedding methods consistently outperform reconstruction methods in learning representations [16][17].
- The best way to learn representations of natural signals is not reconstruction, but methods that do not attempt to reconstruct every detail [17].
Transition to AMI Labs
- The shift in focus at Meta toward short-term goals and LLMs led to the decision to leave and pursue JEPA at AMI Labs, where these ideas can be applied in areas such as industrial process control and robotics [21][22].

Future Directions
- A hierarchical JEPA model is discussed, which would allow predictions across different time and spatial scales, drawing parallels with concepts in physics [23].
- Understanding complex systems, such as economic models, may benefit from a data-driven approach similar to JEPA that focuses on higher-level abstractions [26][27].
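The contrast the article draws — predicting in an abstract representation space rather than reconstructing every raw detail — can be sketched in a few lines. Everything below (the tiny linear encoders, the identity predictor, the toy data) is an illustrative stand-in, not LeCun's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Map an observation to an abstract representation (toy: one linear layer)."""
    return np.tanh(W @ x)

# Toy data: y is a noisy "future frame" of x; the noise stands in for the
# unpredictable detail a JEPA-style model is free to discard.
x = rng.normal(size=8)
y = x + 0.1 * rng.normal(size=8)

W_x = rng.normal(size=(4, 8)) * 0.5   # context encoder
W_y = rng.normal(size=(4, 8)) * 0.5   # target encoder
P = np.eye(4)                         # predictor acting in representation space

s_x, s_y = encoder(x, W_x), encoder(y, W_y)

# JEPA-style objective: prediction error measured between embeddings,
# never between raw signals.
jepa_loss = np.mean((P @ s_x - s_y) ** 2)

# A reconstruction objective, for contrast, must account for every raw sample.
recon_loss = np.mean((x - y) ** 2)
```

A real joint-embedding system also needs a mechanism to prevent both encoders from collapsing to a constant output (e.g. an exponential-moving-average target encoder or a variance regularizer); that machinery is omitted here.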
What technical solutions did the AI talent who took home over 2 million yuan in prize money actually deliver?
机器之心· 2025-12-23 04:15
Core Viewpoint - The article emphasizes the significant opportunities for young individuals proficient in AI technology in China, particularly highlighted by the recent Tencent Advertising Algorithm Competition, which showcased innovative solutions to complex advertising challenges [2][5].

Group 1: Competition Overview
- All top 10 teams received job offers from Tencent, with the champion team awarded a prize of 2 million yuan [2].
- The competition focused on a real-world advertising problem that lacks a definitive solution, pushing participants to explore practical and innovative approaches [4][5].

Group 2: Advertising Challenges
- Advertising is often viewed negatively, but it is essential to the sustainability of many services and content, leading platforms to seek smarter, less intrusive advertising methods [7].
- The competition addressed how to make advertising more targeted and relevant, reducing unnecessary exposure to users [7][16].

Group 3: Methodologies in Advertising
- Two primary methodologies in advertising recommendation systems were discussed: traditional discriminative methods and emerging generative methods [8].
- Discriminative methods match user profiles with ads based on predefined features, while generative methods analyze user behavior over time to predict future interactions [9][14].

Group 4: Competition Challenges
- Participants faced data at the scale of millions of ads and users while having limited computational resources [21].
- The complexity of the data structure, including multimodal historical behavior data, added to the difficulty of modeling user interactions effectively [21][22].

Group 5: Champion Team Solutions
- The champion team, Echoch, introduced a three-tier session system, periodic encoding, and time-difference bucketing to enhance the model's understanding of user behavior over time [28][29].
- They developed a unified model capable of switching strategies between predicting clicks and conversions, addressing the differing objectives of these actions [34][36].
- The team also incorporated randomness in ad encoding to improve exposure for less popular ads, significantly increasing their training focus [37].

Group 6: Runner-Up Team Solutions
- The runner-up team, leejt, handled large-scale data by compressing the vocabulary size and using shared embeddings for low-frequency ads [42].
- They implemented session segmentation and heterogeneous temporal graphs to manage the complexity of user behavior data effectively [44].
- The team optimized engineering processes to maximize GPU utilization, achieving significant performance improvements in model training [48].

Group 7: Industry Implications
- The competition highlighted the transition from discriminative to generative models in advertising; Tencent has already deployed generative models in its internal systems, with positive results reflected in financial data [51].
- Tencent plans to open-source the competition data to foster community development and explore real-time personalized advertising generation in the future [52].
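The article names "time-difference bucketing" but does not give the champion team's formula. A common way such bucketing is implemented — log-scaled buckets, fine-grained for short gaps and coarse for long ones — can be sketched as follows; the bucket scheme and parameter names here are an assumption, not Echoch's actual design:

```python
import math

def time_gap_bucket(delta_seconds: int, num_buckets: int = 16) -> int:
    """Map the gap between two user actions to a discrete bucket id.

    Log-scaled buckets give fine resolution for short gaps (seconds, minutes)
    and coarse resolution for long gaps (days), so a sequence model can treat
    "clicked 10 seconds later" and "returned three days later" as distinct
    categorical signals rather than one raw number.
    """
    if delta_seconds <= 0:
        return 0
    bucket = int(math.log2(delta_seconds + 1)) + 1
    return min(bucket, num_buckets - 1)

# Gaps within a browsing session vs. across days land in different buckets,
# and the bucket ids can be fed to an embedding table like any token.
same_session = time_gap_bucket(30)       # seconds apart
next_day = time_gap_bucket(86_400)       # one day apart
```

The bucket id would then typically index an embedding table alongside the ad and action embeddings in the behavior sequence.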
The most comprehensive survey of speech separation yet! Teams from Tsinghua and elsewhere analyze 200+ papers in depth, systematically dissecting the "cocktail party problem"
机器之心· 2025-09-03 04:33
Core Viewpoint - The article discusses the revolutionary advancements in the field of speech separation, particularly in addressing the "cocktail party problem" through the development of deep neural networks (DNNs) [2].

Group 1: Overview of Speech Separation
- Speech separation has become crucial for enhancing speech clarity in complex acoustic environments and serves as a preprocessing step for other speech processing tasks [2].
- Researchers from multiple institutions surveyed over 200 representative papers, analyzing the latest methods along several dimensions, including deep learning methods, model architectures, evaluation metrics, datasets, and future challenges [2].

Group 2: Problem Definition
- The authors divide speech separation tasks into known- and unknown-speaker separation according to whether the number of speakers is fixed or variable, highlighting the challenges of each scenario [6].
- In unknown-speaker scenarios, dynamically determining the number of output channels and balancing separation quality against termination timing are significant challenges [6].

Group 3: Learning Paradigms
- The article compares supervised and unsupervised learning methods, detailing the advantages and limitations of each for speech separation [10].
- Supervised learning is currently the most mature paradigm, training on paired mixed audio and clean source audio, while unsupervised methods explore training models directly on unlabelled mixed audio [12].

Group 4: Model Architectures
- The core components and evolution of speech separation models are summarized: encoder, separation network, and decoder [14].
- Architectures based on RNNs, CNNs, and Transformers are discussed, showcasing their strengths in capturing long-term dependencies and extracting local features [17][18].
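The encoder / separation-network / decoder pipeline shared by many of these models (the TasNet family being a well-known instance) can be sketched with random stand-ins for the trained components — everything below is a toy illustration of the data flow, not any surveyed model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture of two "speakers": S sources of F features over T frames.
F, T, S = 16, 10, 2
sources = rng.normal(size=(S, F, T))
mixture = sources.sum(axis=0)

# Encoder: a learned basis projecting waveform frames into a latent space
# (a random linear map stands in for a trained 1-D conv encoder).
E = rng.normal(size=(32, F)) * 0.1
latent = E @ mixture                      # shape (32, T)

# Separation network: emits one mask per speaker over the latent features.
# Softmax across speakers makes the masks sum to 1 at each latent-time bin.
logits = rng.normal(size=(S, 32, T))
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Decoder: a pseudo-inverse of the encoder maps masked latents back to frames.
D = np.linalg.pinv(E)
estimates = np.stack([D @ (m * latent) for m in masks])   # shape (S, F, T)

# Because the masks partition the latent mixture, the per-speaker estimates
# sum back to (approximately) the input mixture.
assert np.allclose(estimates.sum(axis=0), mixture)
```

In a trained system the encoder, mask network, and decoder are all learned jointly so that each masked-and-decoded stream matches one clean source.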
Group 5: Evaluation Metrics
- A comprehensive evaluation metric system is necessary for assessing model performance, including both subjective and objective metrics [19].
- The article compares various metrics, highlighting the trade-off between subjective evaluations, which reflect human experience, and objective metrics, which are efficient but may emphasize different aspects [20].

Group 6: Datasets
- Publicly available datasets for speech separation research are summarized, categorized into single-channel and multi-channel formats [22].
- Understanding the coverage and difficulty of these datasets helps researchers select appropriate benchmarks for algorithm evaluation and identify gaps in current research [22].

Group 7: Performance Comparison
- The authors compare different models' performance on standard datasets, illustrating the progress in speech separation technology over recent years [24].
- Notable improvements in performance metrics such as SDR are highlighted, with advanced architectures achieving SDR levels around 20 dB [24][25].

Group 8: Tools and Platforms
- Various open-source tools and platforms that facilitate the development and application of speech separation are introduced, with comparisons of their functionality and limitations [28].
- These tools provide convenient interfaces for researchers to replicate results and build prototype systems, accelerating the transition from research to application [28].

Group 9: Challenges and Future Directions
- Current challenges include long-duration audio processing, mobile and embedded applications, real-time speech separation, and the rise of generative methods [32][33].
- The integration of pre-training techniques and a focus on target-speaker extraction are identified as key areas for future exploration [33].
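The SDR figure cited above is a log power ratio between the reference signal and the residual distortion, and is straightforward to compute; the sketch below shows the plain (non-scale-invariant) definition, with synthetic sinusoids standing in for real audio:

```python
import numpy as np

def sdr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    This is the plain SDR; many toolkits instead report the scale-invariant
    variant (SI-SDR), which first projects the estimate onto the reference.
    """
    distortion = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(distortion**2))

# Synthetic example: a 440 Hz "source" with a small 50 Hz residual left over
# after an imperfect separation.
t = np.linspace(0.0, 1.0, 8000)
clean = np.sin(2 * np.pi * 440 * t)
estimate = clean + 0.01 * np.sin(2 * np.pi * 50 * t)

score = sdr_db(clean, estimate)   # residual ~100x smaller -> roughly 40 dB
```

An SDR around 20 dB, as reported for recent architectures, means the residual distortion power is about 1% of the source power.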