Post-training
RL Environments and RL for Science: Data Foundries and Multi-Agent Architectures
2026-01-07 03:05
Summary of Key Points from the Conference Call

Industry Overview
- The conference call focuses on the scaling of Reinforcement Learning (RL) and its applications across domains, including AI capabilities, coding environments, and data foundries [2][3][51].

Core Insights and Arguments
1. **Scaling RL as a Critical Path**: Scaling RL is identified as essential for unlocking further AI capabilities, with significant performance gains attributed to increased RL compute [2][4].
2. **OpenAI's Model Performance**: OpenAI has demonstrated that its model improvements over the past 18 months were driven primarily by post-training and scaled-up RL compute, with the same base model underlying several flagship models [4][6].
3. **Challenges in Scaling RL**: RL needs a continuous stream of tasks for models to learn from. Producing those tasks is labor-intensive, whereas pre-training can draw on vast amounts of internet data [7].
4. **Task Aggregation**: Companies like Windsurf and Cursor have built competitive models by aggregating tasks and data, even without lab-level resources [9].
5. **Utility and Capability Evaluation**: OpenAI's GDPval evaluation measures model improvements across 1,000+ tasks in 44 occupations, indicating a shift from measuring abstract intelligence to measuring real-world utility [10][14].
6. **Autonomous AI Development**: OpenAI and Anthropic are targeting autonomous AI researchers by 2028 and 2027, respectively, indicating a trend toward models that can operate independently for longer periods [16].

Additional Important Content
1. **Outsourcing Data Tasks**: The need for large-scale data and task curation has led to outsourcing. Scale AI was historically a major contractor but has now been absorbed by Meta [19][21].
2. **Emergence of New Companies**: Over 35 companies have emerged to provide RL environments, covering domains that range from website cloning to more sophisticated software environments [24][29].
3. **Demand for Coding Environments**: Demand for coding environments is high, and companies are acquiring defunct startups for their GitHub repositories in order to build these environments [37][38].
4. **Expert Contractors**: Firms like Surge and Mercor are used to hire domain-specific experts for task creation. Surge is a significant player, with an estimated annual recurring revenue of around $1 billion [55].
5. **Chinese Market Dynamics**: Chinese VC firms are trying to establish local data foundry competitors that serve the ecosystem at lower cost, while most Chinese labs are still in the early stages of scaling RL [58][59].

This summary encapsulates the key points discussed in the conference call, highlighting the advancements, challenges, and market dynamics within the RL and AI landscape.
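To make the task-supply problem in the summary above concrete, here is a minimal sketch of what an "RL environment" can look like: a set of tasks, each paired with an automatic checker that turns a model's answer into a reward. The names `Task`, `ToyEnvironment`, and `build_arithmetic_tasks` are hypothetical and chosen for illustration; the environments sold by the companies mentioned in the call are far more elaborate and are not described in the source.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """One unit of RL training data: a prompt plus an automatic checker."""
    prompt: str
    verify: Callable[[str], bool]  # True when the answer is acceptable


class ToyEnvironment:
    """Hypothetical environment that holds tasks and scores attempts."""

    def __init__(self, tasks: List[Task]) -> None:
        self.tasks = tasks

    def score(self, task_index: int, answer: str) -> float:
        """Return a binary reward: 1.0 if the verifier accepts, else 0.0."""
        task = self.tasks[task_index]
        return 1.0 if task.verify(answer) else 0.0


def build_arithmetic_tasks(n: int) -> List[Task]:
    """Generate n addition tasks (an illustrative stand-in for real tasks)."""
    tasks = []
    for i in range(n):
        a, b = i, i + 1
        tasks.append(Task(
            prompt=f"What is {a} + {b}?",
            # The default argument binds the expected value for this task.
            verify=lambda ans, expected=a + b: ans.strip() == str(expected),
        ))
    return tasks


env = ToyEnvironment(build_arithmetic_tasks(3))
rewards = [env.score(0, "1"), env.score(1, " 3 "), env.score(2, "4")]
print(rewards)
```

Arithmetic tasks can be generated programmatically, which is exactly what most real-world tasks cannot. When the checker has to be written by a domain expert, each task becomes costly, which is consistent with the outsourcing and contractor demand the summary describes.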
Runway’s New Video Model Challenges Rivals Google, OpenAI
Bloomberg Technology· 2025-12-01 22:22
Model Performance & Innovation
- Runway's latest video model, Runway Gen-4.5, tops the performance charts, ahead of all other models [2]
- The company leads the leaderboards ahead of large research labs, with consistent, realistic, and creative results [2][4]
- Focus, research, and efficiency enable the company to compete with larger research labs [4]
- The company has been developing models for almost seven years, building intuition and momentum for improvement [5]
- Algorithmic improvements, data captioning, structuring, and model testing are key to pre-training [7]

Business Model & Monetization
- The company raised $300 million in April at a $3.3 billion valuation [6]
- The company charges for model usage through subscriptions and credits [10]
- The model is being released to gaming companies, studios, brands, production companies, and creatives worldwide, with tens of millions of users [10]
- The company makes money every time the model is used, with good margins [11][12]
- The model is cost-effective compared to other models while maintaining top performance [13]
X @Demis Hassabis
Demis Hassabis· 2025-11-22 20:32
Actually if you want to know what the real ‘secret’ is 😀 it’s world-class research AND world-class engineering AND world-class infra all working closely together with relentless focus and intensity…

Oriol Vinyals (@OriolVinyalsML): The secret behind Gemini 3? Simple: Improving pre-training & post-training 🤯

Pre-training: Contra the popular belief that scaling is over—which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix—the team delivered a drastic jump. The delta between 2.5 and 3.0 is https:/ ...
A Taxonomy for Next-gen Reasoning — Nathan Lambert, Allen Institute (AI2) & Interconnects.ai
AI Engineer· 2025-07-19 21:15
Model Reasoning and Applications
- Reasoning unlocks new language model applications, exemplified by improved information retrieval [1]
- Reasoning models are enhancing applications like website analysis and code assistance, making them more steerable and user-friendly [1]
- Reasoning models are pushing the limits of task completion, and it takes ongoing effort to determine what models need in order to keep progressing [1]

Planning and Training
- Planning is a new frontier for language models, requiring a shift in training approaches beyond reasoning skills alone [1][2]
- The industry needs research plans for training reasoning models that can work autonomously and plan meaningfully [1]
- Calibration is crucial for products: models tend to overthink, so the number of output tokens needs to be better matched to problem difficulty [1]
- Strategy and abstraction are key subsets of planning, enabling models to choose how to break down problems and use tools effectively [1]

Reinforcement Learning and Compute
- Reinforcement learning with verifiable rewards is a core technique, in which language models generate completions and receive feedback that is used to update their weights [2]
- Parallel compute improves model robustness and exploration but does not solve every problem, so balanced approaches are needed [3]
- The industry is starting to treat post-training as a significant portion of compute, potentially reaching parity with pre-training in GPU hours [3]
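The generate, verify, update loop behind reinforcement learning with verifiable rewards can be sketched on a toy problem. The sketch below is an illustration under strong assumptions, not the method from the talk: the "policy" is just a table of logits over three fixed candidate completions, the verifier is an exact-match check, and the update is plain REINFORCE. The function names and hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward from an exact-match check against a known answer."""
    return 1.0 if completion == reference else 0.0


def train(candidates, reference, steps=200, lr=0.5, seed=0):
    """REINFORCE on a categorical toy policy over fixed candidate completions."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)  # the toy policy's only weights
    for _ in range(steps):
        probs = softmax(logits)
        # Generate: sample one completion from the current policy.
        idx = rng.choices(range(len(candidates)), weights=probs)[0]
        # Verify: score the sample with the automatic checker.
        reward = verifiable_reward(candidates[idx], reference)
        # Update: d log pi(idx) / d logit_j = 1[j == idx] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * reward * grad
    return softmax(logits)


candidates = ["4", "5", "22"]
final_probs = train(candidates, reference="4")
print(final_probs)
```

Because the reward is zero for wrong samples, only verified completions move the weights, so probability mass shifts toward the answer the checker accepts. A real system replaces the logit table with a language model and usually adds a baseline or other variance reduction, none of which is shown here.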
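喝点VC | Sequoia Capital US in Conversation with OpenAI's Former Head of Research: Pre-training Has Entered the Stage of Diminishing Marginal Returns, and Its Real Leverage Lies in Architectural Improvements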
Z Potentials· 2025-07-04 03:56
Core Insights
- The article discusses the evolution of AI, focusing on the "trinity" of pre-training, post-training, and reasoning, and on how these components are essential for achieving Artificial General Intelligence (AGI) [3][4][5]
- Bob McGrew emphasizes that reasoning will be a significant focus in 2025, with many opportunities to optimize compute usage, data utilization, and algorithm efficiency [4][5][6]
- The article highlights the diminishing returns of pre-training: it remains important, but its leverage is shifting toward architectural improvements rather than sheer computational power [6][8][9]

Pre-training, Post-training, and Reasoning
- Pre-training has reached a stage of diminishing returns, requiring exponentially more compute for marginal gains in intelligence [7][8]
- Post-training focuses on enhancing the model's personality and intelligence, which can yield broad applicability across fields [9][10]
- Reasoning is seen as the "missing piece" that lets models perform complex tasks through step-by-step thinking, which earlier models like GPT-3 lacked [14][15]

Agent Economics
- The cost of AI agents is expected to approach the opportunity cost of the compute they use, making it hard for startups to maintain high prices as competition increases [17][18][19]
- While AI can automate simple tasks, complex services that require human understanding will retain their value and scarcity [19][20]

Market Opportunities in Robotics
- Interest in robotics is growing, with the belief that the field is nearing commercialization thanks to advances in language interfaces and visual encoding [22][25]
- Companies like Skild and Physical Intelligence are highlighted as potential leaders in robotics, capitalizing on existing technology and research [22][25]

Proprietary Data and Its Value
- Proprietary data is becoming less valuable relative to the capabilities of advanced AI models, which can replicate insights without extensive human labor [29][30]
- Specific customer data that improves decision-making remains important, which makes trust in how data is used essential [31]

Programming and AI Integration
- The integration of AI into programming is evolving toward a hybrid model in which users write code in the traditional way while AI assists in the background [32][33]
- AI can handle repetitive tasks, but complex programming still requires human oversight and understanding [33][34]

Future of AI and Human Interaction
- The article explores how different generations interact with AI, suggesting that AI should empower individuals to become experts in their interests while relieving them of mundane tasks [39][42]
- It emphasizes fostering curiosity and problem-solving skills in the next generation, rather than merely teaching specific skills that may soon be automated [43][44]
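The claim above that pre-training needs exponentially more compute for marginal gains can be illustrated with a small worked example. It assumes, purely for illustration, that a benchmark score grows linearly in the logarithm of compute, score = a + b * log10(compute). Neither this functional form nor the value of 5 points per tenfold increase comes from the article; `compute_multiplier` is a hypothetical helper.

```python
def compute_multiplier(gain: float, points_per_decade: float) -> float:
    """Factor by which compute must grow to add `gain` points.

    Assumes score = a + b * log10(compute), where b is `points_per_decade`
    (points gained per 10x compute). Solving for the compute ratio gives
    10 ** (gain / b).
    """
    return 10 ** (gain / points_per_decade)


# Hypothetical setting: every 10x in compute buys 5 benchmark points.
POINTS_PER_DECADE = 5.0
multipliers = {gain: compute_multiplier(gain, POINTS_PER_DECADE)
               for gain in (5, 10, 15)}
for gain, factor in multipliers.items():
    print(f"+{gain} points needs about {factor:,.0f}x the compute")
```

Under this assumption, each additional 5 points costs ten times more compute than the previous 5, which is the sense in which linear gains require exponential spending.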