Large Model Pre-training
Breaking the "data brute-force" pre-training habit: Alibaba Qwen, Shanghai Jiao Tong University, and others propose OPUS, a dynamic data-selection paradigm for pre-training
机器之心· 2026-03-16 08:34
Core Viewpoint
- The article challenges the conventional belief that higher-quality data is essential for training large models, presenting evidence that dynamically selecting from medium- and low-quality data can outperform traditional high-quality-data approaches [2][3][5].

Group 1: Data Selection and Optimization
- The study introduces OPUS (Optimizer-induced Projected Utility Selection), which aligns data selection with the actual update direction determined by modern optimizers such as AdamW and Muon, rather than relying on older gradient-based selection methods [8][9][11].
- OPUS defines sample utility in the effective update space induced by the optimizer, giving a more principled data-selection criterion that maximizes the utility of each update step [9][14].
- The method addresses the misalignment gap that arises when original gradients are used for data selection, emphasizing that data selection must be optimizer-dependent [5][10].

Group 2: Methodology and Implementation
- OPUS follows a three-step process: aligning targets with a Bench-Proxy pool, efficiently estimating candidate-sample utility, and stabilizing selection through redundancy penalties and Boltzmann sampling [11][17][20].
- The computational overhead of OPUS is approximately 4.7%, keeping it feasible and efficient for large-scale pre-training [20][21].
- The method integrates with existing data-engineering pipelines: static filtering first removes low-value samples, after which OPUS dynamically selects from the remaining candidates [35][36].

Group 3: Experimental Results
- In experiments, OPUS delivered an average accuracy improvement of 2.2% in FineWeb pre-training and an 8× efficiency gain in GPT-XL settings [22][23].
- OPUS outperformed models trained on higher-quality data, achieving a 3.18% accuracy increase while using only medium-quality data [26].
- The method consistently achieved the lowest average perplexity across various domains, indicating its effectiveness at improving general language-modeling capability [29][30].

Group 4: Implications and Future Directions
- OPUS shifts the focus of pre-training from merely accumulating data to maximizing the efficiency of each update, suggesting a new paradigm in model training [34][37].
- The approach highlights the importance of selecting the right samples at the right time, potentially yielding better model performance with fewer resources [26][35].
- As the industry approaches a "data wall," OPUS offers a clear path to maximizing the marginal gain from each token, emphasizing precision in data utilization [5][36].
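The selection pipeline described above (score candidates by alignment with the optimizer's update direction, then sample with a redundancy penalty) can be illustrated with a toy sketch. This is not the paper's implementation: `projected_utility` is a hypothetical cosine-style reading of "optimizer-induced projected utility," and the flat `redundancy_penalty` stands in for whatever similarity-based penalty OPUS actually uses.

```python
import math
import random

def projected_utility(sample_grad, update_dir, eps=1e-12):
    """Score a sample by how well its gradient aligns with the optimizer's
    effective update direction (hypothetical stand-in for OPUS's utility)."""
    dot = sum(g * u for g, u in zip(sample_grad, update_dir))
    norm = math.sqrt(sum(u * u for u in update_dir)) + eps
    return dot / norm

def boltzmann_select(utilities, k, temperature=1.0, redundancy_penalty=0.1, seed=0):
    """Pick k indices without replacement via softmax (Boltzmann) sampling,
    applying a flat redundancy penalty to the rest after each pick."""
    rng = random.Random(seed)
    utilities = list(utilities)
    remaining = list(range(len(utilities)))
    selected = []
    for _ in range(min(k, len(remaining))):
        m = max(utilities[i] for i in remaining)  # stabilize the softmax
        weights = [math.exp((utilities[i] - m) / temperature) for i in remaining]
        r = rng.random() * sum(weights)
        acc, pick = 0.0, remaining[-1]
        for idx, w in zip(remaining, weights):
            acc += w
            if acc >= r:
                pick = idx
                break
        selected.append(pick)
        remaining.remove(pick)
        for i in remaining:  # discourage picking near-duplicates next round
            utilities[i] -= redundancy_penalty
    return selected

# Toy usage: score four candidate gradients against one update direction,
# then Boltzmann-sample two of them.
update = [1.0, 0.0, 1.0]
grads = [[0.9, 0.1, 0.8], [-0.5, 0.2, -0.4], [0.7, -0.3, 0.6], [0.1, 0.9, 0.0]]
scores = [projected_utility(g, update) for g in grads]
print(boltzmann_select(scores, k=2, temperature=0.5))
```

Low temperature makes selection nearly greedy on utility; higher temperature spreads picks across the pool, which is the stability/diversity trade-off the summary's "Boltzmann sampling" step refers to.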
MIT prodigy PhD snapped up by former OpenAI CTO right after graduation, with annual salary possibly starting at 3 million RMB
36Kr· 2026-01-09 08:12
Core Insights
- Guangxuan Xiao, a PhD graduate from MIT, has officially joined Thinking Machines to focus on pre-training large models [1][6][10].
- His academic background includes dual degrees from Tsinghua University in Computer Science and Finance, along with numerous awards and research experiences [6][8][10].

Group 1: Academic and Professional Background
- Guangxuan Xiao graduated from Tsinghua University with dual degrees in Computer Science and Finance, receiving multiple prestigious awards during his studies [6][8].
- He completed his PhD at MIT under the supervision of Professor Song Han, focusing on efficient algorithms and systems for large language models [10][18].
- Xiao has interned at major tech companies, including Meta and NVIDIA, where he contributed to research on efficient attention mechanisms and large language model optimization [10][12][18].

Group 2: Research Contributions
- Xiao's doctoral thesis addresses significant challenges in large language models, proposing solutions for issues like memory overflow and slow inference [18][19].
- His research introduced SmoothQuant, achieving lossless quantization for billion-parameter models without retraining, and enabling constant-memory streaming inference for long sequences [19][20].
- The thesis also includes innovative approaches like DuoAttention and XAttention, which enhance performance while reducing memory usage [19][20].

Group 3: Company Insights
- Thinking Machines offers competitive salaries, with average base salaries reaching $500,000, significantly higher than those at established companies like OpenAI and Anthropic [21][25].
- The company is positioned to attract top talent in the AI field, reflecting the ongoing talent war in Silicon Valley [21][28].
Former Alibaba and ByteDance large-model lead Yang Hongxia launches a startup: large-model pre-training is not a compute race for a few top players | 36Kr exclusive
36氪· 2025-10-30 13:37
Core Viewpoint
- The article discusses the emergence of a new AI paradigm led by Yang Hongxia, who aims to decentralize model training, contrasting with the centralized approaches of major companies like Alibaba and ByteDance [4][12][27].

Group 1: Yang Hongxia's Background and Vision
- Yang Hongxia has over seven years of experience in large-model research at Alibaba and ByteDance, where she contributed to the development of significant models like M6 and Tongyi Qianwen [5][6].
- After leaving ByteDance in July 2024, she founded InfiX.ai, focusing on model-related technologies and aiming to challenge existing centralized models [7][10].
- Yang's vision includes a decentralized model-training framework that allows small and medium enterprises, research institutions, and individuals to participate in model training [13][16].

Group 2: Technical Innovations and Frameworks
- InfiX.ai has recently open-sourced the world's first FP8 training framework, which enhances training speed and reduces memory consumption compared to the commonly used FP16/BF16 [17][18].
- The company has developed a model-fusion technology that allows different domain-specific models to be combined, avoiding the resource waste of redundant training [20][21].
- The InfiMed framework enables the training of small-scale models with strong reasoning capabilities across various medical tasks, particularly cancer detection [22][26].

Group 3: Market Position and Future Outlook
- Yang believes the future of AI will involve a collaborative approach in which every company and institution can have its own expert model, leading to globalized foundational models for various fields [30][31].
- The article highlights the growing acceptance of decentralized model training in the U.S., with significant funding being raised for companies pursuing this approach [28][29].
- InfiX.ai's focus on challenging fields like healthcare, particularly cancer, is seen as a strategic move to demonstrate the model's capabilities and differentiate it from competitors [72][73].
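The FP8 framework itself is not detailed in this summary, but the memory argument behind FP8 training is easy to illustrate: an FP8 value occupies 1 byte versus 2 for FP16/BF16. Below is a toy rounding model of the E4M3 format commonly used in FP8 training (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits); it is an illustrative sketch of the number format, not InfiX.ai's framework, and it ignores NaN encoding and exact tie-breaking rules.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to a nearby FP8 E4M3-representable value (toy model).

    E4M3 tops out at 448 (max normal) and its smallest normal binade
    starts at 2**-6; 3 mantissa bits give 8 steps per binade.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)                    # saturate at the E4M3 max
    e = max(math.floor(math.log2(a)), -6)     # clamp into (sub)normal range
    step = 2.0 ** (e - 3)                     # spacing within this binade
    return sign * round(a / step) * step

# Halving bytes per value is where FP8's memory savings come from:
# 1 byte per weight/activation instead of 2 for FP16/BF16.
print(quantize_e4m3(0.3))   # -> 0.3125, the nearest E4M3 value
```

Real FP8 training frameworks pair this low-precision storage with per-tensor scaling factors and higher-precision accumulation to keep training stable; the sketch shows only the rounding grid itself.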