SFT

Post-Training's "Divide" and "Unite": Is a Grand Unification of SFT & RL the Right Answer?
机器之心· 2025-09-14 01:30
Group 1
- The article discusses the limitations of the traditional "SFT followed by RL" paradigm in post-training for AI models, suggesting a unified approach that combines both methods [7][9][10]
- It highlights the importance of post-training in aligning the model's capabilities with human values and preferences, addressing the challenges of "catastrophic forgetting" and overfitting associated with SFT [8][11][12]
- The emerging trend in the industry is to explore a unified framework for post-training that leverages the strengths of both SFT and RL, rather than treating them as separate processes [10][15][17] (a toy sketch of one such combined objective appears after Group 3 below)

Group 2
- The article evaluates the competitive landscape of AI hardware among major players like Meta, OpenAI, Apple, and Google, questioning whether AI hardware will become a new essential or merely a passing trend [2]
- It raises questions about the user experience with AI hardware, such as whether it will truly replace traditional devices or simply serve as an additional feature [2][3]
- The potential for innovative AI hardware forms to integrate seamlessly into daily life is explored, along with the implications for user interaction and technology adoption [2][3]

Group 3
- The article examines the role of generative AI in search, debating whether it will serve as a replacement for traditional search engines or act as a growth engine for expanding user queries and intentions [3]
- It discusses how multimodal interactions and conversational AI are redefining task completion for users, potentially enhancing the value of advertising and commercial opportunities [3]
- Google's strategy of gradually integrating AI capabilities into its products, rather than waiting for full technological maturity, reflects a proactive approach to product development and market positioning [3]
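To make Group 1's "unified SFT & RL" idea concrete, here is a minimal toy sketch, assuming a PyTorch setting, of an objective that interpolates a supervised cross-entropy term with a REINFORCE-style policy-gradient term. The function name, tensor shapes, reward handling, and the weight `alpha` are illustrative assumptions, not the specific method the article describes.

```python
import torch
import torch.nn.functional as F

def unified_loss(logits, sft_labels, sampled_ids, rewards, alpha=0.5):
    """Hypothetical combined SFT + RL objective (illustrative only).

    logits:      (B, T, V) policy outputs over a toy vocabulary
    sft_labels:  (B, T)    reference tokens for the supervised term
    sampled_ids: (B, T)    tokens sampled from the policy itself
    rewards:     (B,)      scalar reward per sampled sequence
    alpha:       weight on the SFT term (1.0 recovers pure SFT)
    """
    B, T, V = logits.shape

    # SFT term: standard next-token cross-entropy against reference answers.
    sft = F.cross_entropy(logits.reshape(-1, V), sft_labels.reshape(-1))

    # RL term: REINFORCE -- raise the log-prob of sampled tokens in
    # proportion to their (baseline-subtracted) reward.
    logp = F.log_softmax(logits, dim=-1)
    logp_sampled = logp.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    advantage = rewards - rewards.mean()  # simple mean baseline
    rl = -(advantage.unsqueeze(-1) * logp_sampled).mean()

    return alpha * sft + (1 - alpha) * rl

# Toy usage with random tensors, just to check that the shapes line up.
B, T, V = 4, 8, 32
logits = torch.randn(B, T, V, requires_grad=True)
loss = unified_loss(
    logits,
    sft_labels=torch.randint(V, (B, T)),
    sampled_ids=torch.randint(V, (B, T)),
    rewards=torch.randn(B),
)
loss.backward()
```

Setting `alpha=1.0` recovers plain SFT and `alpha=0.0` a pure policy-gradient update; actual unified methods differ mainly in how this trade-off is derived or scheduled, rather than in the basic structure above.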
Is there real technical depth to fine-tuning large models, and if so, how much?
自动驾驶之心· 2025-08-10 23:32
Core Viewpoint
- The article emphasizes the importance of individual approaches and methodologies in the field of large language models (LLMs), particularly in the context of fine-tuning and data quality, suggesting that the technical depth of work in this area depends heavily on personal engagement and practices [5][16]

Data Work
- Method 1 involves inheriting training data from colleagues without checking data quality, which may lead to suboptimal results [7]
- Method 2 suggests downloading open-source data to create a "system + query + answer" dataset [8]
- Method 3 focuses on generating data using GPT-4, emphasizing prompt diversity and the importance of data quality checks [8] (a toy filtering sketch in this spirit appears after this summary)
- Method 4 advocates using user interaction logs to drive data construction, analyzing user feedback to improve answer quality [9]
- Method 5 recommends breaking down complex tasks at the data level to enhance model performance [9]

Training Code
- Method 1 involves inheriting training code and making minimal modifications [11]
- Method 2 encourages a thorough understanding of training code parameters and their implications [11]
- Method 3 promotes questioning and improving training code, such as optimizing speed and framework choices [12]

Experimental Analysis
- Method 1 suggests running prepared evaluation sets and addressing data quality issues based on the results [14]
- Method 2 involves analyzing bad cases from models to identify underlying issues and designing experiments to validate the findings [14]
- Method 3 emphasizes the relationship between model results, data quality, and training methods, advocating a comprehensive analysis of training logs and evaluation results [15]

Community and Collaboration
- The article highlights the establishment of a large community focused on various aspects of autonomous driving technology, including large models and multi-sensor fusion, with nearly 4,000 members and over 300 companies and research institutions involved [18]
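As a companion to the Data Work methods above, here is a minimal sketch, assuming JSONL records with `system`/`query`/`answer` fields, of the kind of cheap quality filtering and deduplication the article argues should precede any fine-tuning run. The file name, thresholds, and specific filters are illustrative assumptions, not a prescribed pipeline.

```python
import json

def load_records(path):
    """Read one JSON object per line ("system + query + answer" records)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def passes_quality_checks(rec, min_answer_chars=20):
    """Cheap structural filters; thresholds are arbitrary placeholders."""
    # Reject structurally broken records.
    if not all(k in rec for k in ("system", "query", "answer")):
        return False
    # Reject trivially short answers.
    if len(rec["answer"].strip()) < min_answer_chars:
        return False
    # Reject answers that merely echo the query.
    if rec["answer"].strip() == rec["query"].strip():
        return False
    return True

def dedupe_by_query(records):
    """Keep the first record per whitespace-normalized, lowercased query."""
    seen, kept = set(), []
    for rec in records:
        key = " ".join(rec["query"].lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

if __name__ == "__main__":
    records = load_records("sft_data.jsonl")  # hypothetical input file
    records = [r for r in records if passes_quality_checks(r)]
    records = dedupe_by_query(records)
    print(f"{len(records)} records survive filtering")
```

Filters like these only catch mechanical defects; the article's deeper point, that bad cases and user feedback should drive data construction, still requires human inspection on top.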
OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs
AI Engineer· 2025-07-19 21:10
I'm Ryan, a founding engineer at Bespoke Labs, and today I'm going to talk to you about OpenThoughts, which is our project to create the best open-source reasoning datasets. I'll be switching tack a little bit from our earlier discussions on reasoning and RL and focusing on the reasoning part, and you'll see why. So, just so we're on the same page: we've talked a lot about reasoning, but what's actually going on here? I like this graph from Jason, which shows this incredible performance that's ...