OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs
AI Engineer· 2025-07-19 21:10
Open Thoughts Project Overview
- Bespoke Labs released Open Thoughts 3, aiming to create the best open-source reasoning datasets [1][9]
- The Open Thoughts project focuses on reasoning data recipes, addressing the key missing piece in building strong reasoning models [6][9]
- Open Thoughts 3 outperforms the DeepSeek-R1-Distill-Qwen-7B model across science, code, and math [13]

Dataset Creation and Optimization
- The dataset pipeline includes question sourcing, mixing, filtering, answer generation, and answer filtering [17]
- Experiments created over 5,000 datasets and nearly 3,000 models to rigorously evaluate the different choices at each pipeline step [18]
- Sampling multiple reasoning traces per question works remarkably well: with a fixed number of questions, performance does not degrade, allowing a 16x scale-up of the data [19][20]
- Synthetic questions are scalable and can further improve accuracy [22]
- Question filtering selects high-quality questions by having a language model assess question difficulty and answer length [23]

Key Learnings and Findings
- A small number of high-quality data sources beats a large number of diverse sources [25]
- For SFT and knowledge distillation, filtering or verifying answers does not appear to help [26]
- A model that scores higher on evaluation benchmarks is not necessarily a better teacher; for example, Qwen 32B is a better teacher than DeepSeek R1 [21]
- Through knowledge distillation, models can surpass their teacher in certain domains, such as legal reasoning [35][36][37]

Practical Recommendations
- Tune the data recipe to your specific domain, starting from the Open Thoughts recipes and iterating [29]
- For different domains such as code, science, and math, study each pipeline step separately [29][30]
- If domain-specific data is scarce, convert existing data into questions and use in-context examples to generate more [32]
- Evaluation is critical; use open-source libraries such as Evalchemy to confirm that models are actually improving [33][34]
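The pipeline stages the summary lists (question sourcing, mixing, filtering, answer generation, answer filtering) can be sketched in a few lines. This is a minimal illustrative sketch, not the project's actual implementation: the `StubLLM` class, the difficulty threshold, and the trace count are all made-up stand-ins.

```python
from dataclasses import dataclass
import random


@dataclass
class StubLLM:
    """Stand-in for a real teacher model; scores and answers are random."""
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def difficulty(self, question: str) -> int:
        # A real pipeline would prompt an LLM to rate difficulty.
        return self.rng.randint(1, 10)

    def answer(self, question: str) -> str:
        # A real pipeline would sample a full reasoning trace.
        return f"<think>step-by-step for: {question}</think> answer"


def run_pipeline(corpora, llm, min_difficulty=5, traces_per_question=4):
    # 1) Question sourcing: pool candidates from every corpus.
    questions = [q for corpus in corpora for q in corpus]
    # 2) Mixing: deduplicate across sources, preserving order.
    questions = list(dict.fromkeys(questions))
    # 3) Question filtering: an LLM judges difficulty; easy items are dropped.
    questions = [q for q in questions if llm.difficulty(q) >= min_difficulty]
    # 4) Answer generation: sample several reasoning traces per question,
    #    which the talk reports scales data up without hurting quality.
    # 5) Answer filtering is omitted: the talk reports it rarely helps for SFT.
    return [(q, llm.answer(q)) for q in questions
            for _ in range(traces_per_question)]


corpora = [["What is 2+2?", "Prove sqrt(2) is irrational."],
           ["What is 2+2?", "Integrate x^2 from 0 to 1."]]
dataset = run_pipeline(corpora, StubLLM())
print(len(dataset))  # a multiple of traces_per_question
```

Each surviving question appears `traces_per_question` times, which is the mechanism behind the 16x data scale-up mentioned above.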
Kimi K2 is INSANE... (Open-Source is BACK!)
Matthew Berman· 2025-07-14 17:43
Model Overview
- Kimi K2 is a state-of-the-art mixture-of-experts language model with 32 billion activated parameters and 1 trillion total parameters [3]
- The model was pre-trained on 15.5 trillion tokens with zero training instability [4]
- Kimi K2 supports up to 2 million tokens in the context window [5]

Performance Benchmarks
- Kimi K2 Instruct beats DeepSeek, Qwen, and GPT-4.1 on SWE-bench Verified, coming in right behind Claude 4 Opus [7]
- On LiveCodeBench, Kimi K2 beats Claude 4 Opus [7]
- Kimi K2 tops the list on AIME 2025 for math and on GPQA Diamond [8]

Optimization and Training
- The model is trained with the Muon optimizer [4]
- Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks [4]
- The training process was open source [8]

Availability and Cost
- Inference is available through Kimi directly at $0.15 per million input tokens with a cache hit, $0.60 without, and $2.50 per million output tokens [10]
- Kimi K2 is available on OpenRouter [13]

Industry Reception
- Industry experts compare Kimi K2 to DeepSeek V3 [11]
- Kimi K2 is recognized as a potentially new leader in open LLMs [14]
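The quoted per-million-token prices make it easy to estimate what a workload would cost. A small arithmetic sketch, assuming the prices above; the workload figures in the example are made up for illustration.

```python
# Kimi K2 API prices as quoted above, in $ per 1M tokens.
PRICE_INPUT_CACHED = 0.15    # input tokens served from cache
PRICE_INPUT_UNCACHED = 0.60  # input tokens without a cache hit
PRICE_OUTPUT = 2.50          # output tokens


def cost_usd(input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Estimated bill for a workload, given the fraction of cached input."""
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    return (cached * PRICE_INPUT_CACHED
            + uncached * PRICE_INPUT_UNCACHED
            + output_tokens * PRICE_OUTPUT) / 1_000_000


# Example: 10M input tokens (half served from cache) plus 2M output tokens.
print(round(cost_usd(10_000_000, 2_000_000, cache_hit_ratio=0.5), 2))  # 8.75
```

Output tokens dominate the bill at these rates, so long reasoning traces are the main cost driver.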
Sips of VC | Sequoia US in Conversation with OpenAI's Former Head of Research: Pre-training Has Hit Diminishing Returns; Its Real Leverage Now Lies in Architectural Improvements
Z Potentials· 2025-07-04 03:56
Core Insights
- The article discusses the evolution of AI, particularly the "trinity" of pre-training, post-training, and reasoning, and how these components are essential for achieving Artificial General Intelligence (AGI) [3][4][5]
- Bob McGrew emphasizes that reasoning will be a significant focus in 2025, with many opportunities for optimization in compute usage, data utilization, and algorithm efficiency [4][5][6]
- The article highlights the diminishing returns of pre-training, suggesting that while it remains important, its role is shifting toward architectural improvements rather than sheer computational power [6][8][9]

Pre-training, Post-training, and Reasoning
- Pre-training has reached a stage of diminishing returns, requiring exponentially more compute for marginal gains in intelligence [7][8]
- Post-training focuses on enhancing the model's personality and intelligence, which can yield broad applicability across many fields [9][10]
- Reasoning is seen as the "missing piece" that lets models perform complex tasks through step-by-step thinking, a capability earlier models like GPT-3 lacked [14][15]

Agent Economics
- The cost of AI agents is expected to approach the opportunity cost of the compute they use, making it hard for startups to sustain high pricing as competition increases [17][18][19]
- While AI can automate simple tasks, complex services that require human understanding will retain their value and scarcity [19][20]

Market Opportunities in Robotics
- There is growing interest in robotics, with the belief that the field is nearing commercialization thanks to advances in language interfaces and visual encoding [22][25]
- Companies like Skild and Physical Intelligence are highlighted as potential leaders in the robotics space, capitalizing on existing technology and research [22][25]

Proprietary Data and Its Value
- Proprietary data is becoming less valuable relative to the capabilities of advanced AI models, which can replicate insights without extensive human labor [29][30]
- Specific customer data that improves decision-making remains valuable, which puts a premium on trust in how that data is used [31]

Programming and AI Integration
- The integration of AI into programming is evolving toward a hybrid model in which users write traditional code while AI assists in the background [32][33]
- While AI can handle repetitive tasks, complex programming still requires human oversight and understanding [33][34]

Future of AI and Human Interaction
- The article explores how different generations interact with AI, suggesting that AI should empower individuals to become experts in their interests while relieving them of mundane tasks [39][42]
- It emphasizes fostering curiosity and problem-solving skills in the next generation, rather than merely teaching specific skills that may soon be automated [43][44]