Post-training - filings, earnings calls, financial reports, news

RL Environments and RL for Science: Data Foundries and Multi-Agent Architectures

2026-01-07 03:05

Summary of Key Points from the Conference Call

Industry Overview
- The conference call focuses on scaling Reinforcement Learning (RL) and its applications across domains, including AI capabilities, coding environments, and data foundries [2][3][51].

Core Insights and Arguments
1. **Scaling RL as a Critical Path**: Scaling RL is identified as essential for unlocking further AI capabilities, with significant performance gains attributed to increased RL compute [2][4].
2. **OpenAI's Model Performance**: OpenAI has demonstrated that improvements in model performance over the past 18 months were driven primarily by post-training and scaling up RL compute, with the same base model used across various flagship models [4][6].
3. **Challenges in Scaling RL**: Scaling RL requires a continuous stream of tasks for models to learn from. Creating those tasks is labor-intensive, whereas pre-training can draw on vast amounts of internet data [7].
4. **Task Aggregation**: Companies like Windsurf and Cursor have built competitive models by aggregating tasks and data, even without lab-level resources [9].
5. **Utility and Capability Evaluation**: OpenAI's GDPval evaluation measures model improvements across 1,000+ tasks in 44 occupations, indicating a shift from measuring abstract intelligence to measuring real-world utility [10][14].
6. **Autonomous AI Development**: OpenAI and Anthropic are targeting autonomous AI researchers by 2028 and 2027, respectively, indicating a trend toward models that can operate independently for longer periods [16].

Additional Important Content
1. **Outsourcing Data Tasks**: The need for significant data and task curation has led to outsourcing. Scale AI was historically a major contractor but has now been absorbed by Meta [19][21].
2. **Emergence of New Companies**: More than 35 companies have emerged to provide RL environments across various domains, ranging from website cloning to more sophisticated software environments [24][29].
3. **Demand for Coding Environments**: Demand for coding environments is high, with companies acquiring defunct startups for their GitHub repositories in order to build these environments [37][38].
4. **Expert Contractors**: Firms like Surge and Mercor are used to hire domain-specific experts for task creation. Surge is a significant player, with estimated annual recurring revenue of around $1 billion [55].
5. **Chinese Market Dynamics**: Chinese VC firms are attempting to establish local data foundry competitors to serve the ecosystem at lower cost, while most Chinese labs are still in the early stages of scaling RL [58][59].

This summary encapsulates the key points discussed in the conference call, highlighting the advancements, challenges, and market dynamics within the RL and AI landscape.

Reinforcement Learning

Pre-training

Post-training

Artificial Intelligence

GPT-5

Claude for Life Sciences


Runway’s New Video Model Challenges Rivals Google, OpenAI

Bloomberg Technology· 2025-12-01 22:22

Model Performance & Innovation
- Runway's latest video model, Runway Gen-4.5, tops the performance charts ahead of all other models [2]
- The company leads the leaderboards, surpassing large research labs with consistent, realistic, and creative results [2][4]
- Focus, research, and efficiency enable the company to compete with larger research labs [4]
- The company has been developing models for almost seven years, building intuition and momentum for improvement [5]
- Algorithmic improvements, data captioning, structuring, and model testing are key to pre-training [7]

Business Model & Monetization
- The company raised $300 million in April at a $3.3 billion valuation [6]
- The company charges for model usage through subscriptions and credits [10]
- The model is being released to gaming companies, studios, brands, production companies, and creatives worldwide, with tens of millions of users [10]
- The company makes money every time the model is used, with good margins [11][12]
- The model is cost-effective compared to other models while maintaining top performance [13]

Demis Hassabis· 2025-11-22 20:32

Actually if you want to know what the real ‘secret’ is 😀 it’s world-class research AND world-class engineering AND world-class infra all working closely together with relentless focus and intensity…

Oriol Vinyals (@OriolVinyalsML): The secret behind Gemini 3? Simple: Improving pre-training & post-training 🤯 Pre-training: Contra the popular belief that scaling is over, which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix, the team delivered a drastic jump. The delta between 2.5 and 3.0 is https:/ ...

A Taxonomy for Next-gen Reasoning — Nathan Lambert, Allen Institute (AI2) & Interconnects.ai

AI Engineer· 2025-07-19 21:15

Model Reasoning and Applications
- Reasoning unlocks new language model applications, exemplified by improved information retrieval [1]
- Reasoning models are enhancing applications like website analysis and code assistance, making them more steerable and user-friendly [1]
- Reasoning models are pushing the limits of task completion, and ongoing effort is required to determine what models need in order to keep progressing [1]

Planning and Training
- Planning is a new frontier for language models, requiring a shift in training approaches beyond reasoning skills alone [1][2]
- The industry needs to develop research plans to train reasoning models that can work autonomously and have meaningful planning capabilities [1]
- Calibration is crucial for products: models tend to overthink, so output tokens need to be better matched to problem difficulty [1]
- Strategy and abstraction are key subsets of planning, enabling models to choose how to break down problems and use tools effectively [1]

Reinforcement Learning and Compute
- Reinforcement learning with verifiable rewards is a core technique, in which language models generate completions and receive feedback that is used to update their weights [2]
- Parallel compute enhances model robustness and exploration but does not solve every problem, indicating a need for balanced approaches [3]
- The industry is moving toward treating post-training as a significant portion of compute, potentially reaching parity with pre-training in GPU hours [3]
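The reinforcement learning with verifiable rewards loop mentioned in the talk summary (generate a completion, check it, update weights) can be sketched in miniature. This is a toy illustration, not the speaker's method: the "model" is a softmax over ten candidate answers, the task (adding two numbers), the names `verifier` and `train`, and the learning rate are all assumptions made for the example, and the baseline is computed by enumerating every candidate, which is only possible at toy scale.

```python
import math
import random


def verifier(prompt: tuple, completion: int) -> float:
    """Verifiable reward (hypothetical task): 1.0 if the completion is the true sum."""
    a, b = prompt
    return 1.0 if completion == a + b else 0.0


def softmax(logits: list) -> list:
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 2000, lr: float = 0.5, seed: int = 0):
    """REINFORCE-style loop: sample a completion, score it, nudge the logits."""
    rng = random.Random(seed)
    prompt = (2, 3)
    candidates = list(range(10))       # toy "completions": the digits 0-9
    logits = [0.0] * len(candidates)   # toy "weights": one logit per completion
    for _ in range(steps):
        probs = softmax(logits)
        # 1. Generate: sample one completion from the current policy.
        idx = rng.choices(range(len(candidates)), weights=probs)[0]
        # 2. Verify: the reward comes from a programmatic check, not a learned judge.
        reward = verifier(prompt, candidates[idx])
        # Toy shortcut: exact expected reward as the baseline (enumerates all candidates).
        baseline = sum(p * verifier(prompt, c) for p, c in zip(probs, candidates))
        advantage = reward - baseline
        # 3. Update: d log pi(idx) / d logit_j = 1[j == idx] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return candidates, softmax(logits)


if __name__ == "__main__":
    candidates, probs = train()
    best = max(range(len(probs)), key=lambda i: probs[i])
    print("most likely completion:", candidates[best])
```

In a real system the policy is a language model, the completions are token sequences, and the verifier is something like a unit-test runner or a math answer checker; only the generate, verify, update structure carries over from this sketch.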

Reasoning Models

Planning

Reinforcement Learning

Post-training

Language Model Applications

Calibration


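Drink Some VC | Sequoia Capital (US) in Conversation with OpenAI's Former Head of Research: Pre-training Has Entered a Stage of Diminishing Marginal Returns, and Its Real Leverage Lies in Architectural Improvements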

Z Potentials· 2025-07-04 03:56

Core Insights
- The article discusses the evolution of AI, focusing on the "trinity" of pre-training, post-training, and reasoning, and how these components are essential for achieving Artificial General Intelligence (AGI) [3][4][5]
- Bob McGrew emphasizes that reasoning will be a major focus in 2025, with many opportunities to optimize compute usage, data utilization, and algorithmic efficiency [4][5][6]
- The article highlights the diminishing returns of pre-training, suggesting that while it remains important, its leverage is shifting toward architectural improvements rather than sheer computational power [6][8][9]

Pre-training, Post-training, and Reasoning
- Pre-training has reached a stage of diminishing returns, requiring exponentially more compute for marginal gains in intelligence [7][8]
- Post-training focuses on enhancing the model's personality and intelligence, which can yield broad applicability across various fields [9][10]
- Reasoning is seen as the "missing piece" that allows models to perform complex tasks through step-by-step thinking, which was previously lacking in models like GPT-3 [14][15]

Agent Economics
- The cost of AI agents is expected to approach the opportunity cost of compute usage, making it hard for startups to maintain high prices as competition increases [17][18][19]
- The article suggests that while AI can automate simple tasks, complex services requiring human understanding will retain their value and scarcity [19][20]

Market Opportunities in Robotics
- Interest in robotics is growing, with the belief that the field is nearing commercialization thanks to advances in language interfaces and visual encoding [22][25]
- Companies like Skild and Physical Intelligence are highlighted as potential leaders in the robotics space, capitalizing on existing technology and research [22][25]

Proprietary Data and Its Value
- Proprietary data is becoming less valuable relative to the capabilities of advanced AI models, which can replicate insights without extensive human labor [29][30]
- The article discusses the importance of specific customer data that can enhance decision-making, emphasizing the need for trust in data usage [31]

Programming and AI Integration
- The integration of AI in programming is evolving toward a hybrid model in which users engage in traditional coding while AI assists in the background [32][33]
- The article notes that while AI can handle repetitive tasks, complex programming still requires human oversight and understanding [33][34]

Future of AI and Human Interaction
- The article explores how different generations interact with AI, suggesting that AI should empower individuals to become experts in their interests while alleviating mundane tasks [39][42]
- It emphasizes the importance of fostering curiosity and problem-solving skills in the next generation, rather than merely teaching specific skills that may soon be automated [43][44]
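The claim above that pre-training needs "exponentially more compute for marginal gains" can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that a capability score grows linearly in the logarithm of compute, `score = a + b * log10(compute)`; this functional form and the constants `a` and `b` are assumptions of the example, not figures from the article.

```python
def compute_for_score(score: float, a: float = 0.0, b: float = 10.0) -> float:
    """Invert the assumed relation score = a + b * log10(compute)."""
    return 10 ** ((score - a) / b)


def cost_multiplier(delta: float, b: float = 10.0) -> float:
    """Factor by which compute must grow to gain `delta` points under the assumed relation."""
    return 10 ** (delta / b)


if __name__ == "__main__":
    # Under these assumed constants, every fixed step in score multiplies
    # the required compute by the same factor, so cost grows exponentially.
    for target in (10, 20, 30, 40):
        print(target, compute_for_score(target))
```

Under this assumption the multiplier depends only on the size of the gain, not on the starting point, which is one way to read "diminishing returns": each equal improvement costs the same multiple of everything spent so far.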

Artificial General Intelligence (AGI)

Artificial Intelligence
