Pre-training
X @Avi Chawla
Avi Chawla· 2026-03-12 07:09
How are LLMs trained? 4 stages that turn raw text into ChatGPT:
→ Pre-training
→ Instruction fine-tuning
→ Preference fine-tuning
→ Adding reasoning capabilities
https://t.co/Nd93NHDSI0 ...
RL Environments and RL for Science: Data Foundries and Multi-Agent Architectures
2026-01-07 03:05
Summary of Key Points from the Conference Call

Industry Overview
- The conference call focuses on the scaling of Reinforcement Learning (RL) and its applications across domains including AI capabilities, coding environments, and data foundries [2][3][51]

Core Insights and Arguments
1. **Scaling RL as a Critical Path**: Scaling RL is identified as essential for unlocking further AI capabilities, with significant performance gains attributed to increased RL compute [2][4]
2. **OpenAI's Model Performance**: OpenAI has demonstrated that improvements in model performance over the past 18 months were driven primarily by post-training and scaled-up RL compute, using the same base model across its flagship models [4][6]
3. **Challenges in Scaling RL**: Scaling RL requires a continuous stream of tasks for models to learn from, which is labor-intensive compared to pre-training on vast internet data [7] (see the environment sketch after this summary)
4. **Task Aggregation**: Companies like Windsurf and Cursor have built competitive models by aggregating tasks and data, even without lab-level resources [9]
5. **Utility and Capability Evaluation**: OpenAI's GDPval evaluation measures model improvements across 1,000+ tasks in 44 occupations, marking a shift from measuring abstract intelligence to measuring real-world utility [10][14]
6. **Autonomous AI Development**: OpenAI and Anthropic are targeting autonomous AI researchers by 2028 and 2027, respectively, pointing toward models that can operate independently for longer periods [16]

Additional Important Content
1. **Outsourcing Data Tasks**: The need for large-scale data and task curation has driven outsourcing; Scale AI was historically a major contractor but has now been absorbed by Meta [19][21]
2. **Emergence of New Companies**: Over 35 companies have emerged to provide RL environments across domains ranging from website cloning to more sophisticated software environments [24][29]
3. **Demand for Coding Environments**: Demand for coding environments is high, with companies acquiring defunct startups for their GitHub repositories to build such environments [37][38]
4. **Expert Contractors**: Firms like Surge and Mercor are used to hire domain-specific experts for task creation; Surge is a significant player with an estimated annual recurring revenue of around $1 billion [55]
5. **Chinese Market Dynamics**: Chinese VC firms are attempting to establish local data-foundry competitors to serve the ecosystem at lower cost, with most Chinese labs still in the early stages of scaling RL [58][59]

This summary captures the advancements, challenges, and market dynamics discussed in the call across the RL and AI landscape.
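The call treats "RL environments" mostly as a market category; to make the object concrete, below is a minimal sketch of what a coding-task environment could look like, assuming a Gym-style reset/step interface with a hidden test suite as the reward signal. The class and every name in it (`CodingTaskEnv`, `repo_files`, `test_cmd`) are hypothetical illustrations, not any vendor's or lab's actual API.

```python
# Minimal sketch of a coding-task RL environment (hypothetical API):
# one task = one repo snapshot plus a hidden test suite; the agent's
# action is a patch, and the reward comes from running the tests.
import os
import subprocess
import tempfile


class CodingTaskEnv:
    def __init__(self, repo_files: dict[str, str], test_cmd: list[str]):
        self.repo_files = repo_files          # path -> source text
        self.test_cmd = test_cmd              # e.g. ["pytest", "-q"]

    def reset(self) -> dict[str, str]:
        # Observation: the repo exactly as the model sees it at task start.
        return dict(self.repo_files)

    def step(self, patch: dict[str, str]) -> tuple[dict[str, str], float, bool]:
        # Action: a set of file edits proposed by the model.
        self.repo_files.update(patch)
        reward = self._run_tests()
        done = reward == 1.0                  # episode ends when the suite passes
        return dict(self.repo_files), reward, done

    def _run_tests(self) -> float:
        # Reward signal: pass/fail of the hidden suite (real systems often
        # use fraction-of-tests-passed for a denser signal).
        with tempfile.TemporaryDirectory() as tmp:
            for path, text in self.repo_files.items():
                full = os.path.join(tmp, path)
                os.makedirs(os.path.dirname(full), exist_ok=True)
                with open(full, "w") as f:
                    f.write(text)
            result = subprocess.run(self.test_cmd, cwd=tmp, capture_output=True)
            return 1.0 if result.returncode == 0 else 0.0
```

The labor-intensity point in the summary falls out of this shape: every new task needs a curated repo snapshot and a trustworthy test suite, which is exactly the work the data foundries sell.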
X @Avi Chawla
Avi Chawla· 2025-12-22 06:31
LLM Development & Training
- The report introduces a method to build a modern LLM from scratch using Karpathy's nanochat, emphasizing its clean, minimal, and hackable codebase [1]
- The process involves training a tokenizer, pre-training for next-word prediction, mid-training for conversational ability, and SFT (supervised fine-tuning) on high-quality dialogue datasets [1]
- Evaluation and logging are integral to every step of the LLM development process [1]

Implementation & Accessibility
- The method can be reproduced with a single click in a Lightning AI studio, requiring zero setup [1]
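As a reading aid, the outline below makes the stage ordering concrete. Every function here is a runnable placeholder stub standing in for the corresponding nanochat step; none of them is nanochat's actual API.

```python
# Hypothetical, runnable outline of the four stages described above.
# Each stub stands in for the corresponding nanochat script/step.

def train_tokenizer(corpus: str) -> str:
    print(f"[1/4] training a BPE tokenizer on {corpus}")
    return "tokenizer"

def pretrain(tokenizer: str, corpus: str) -> str:
    print(f"[2/4] pre-training with {tokenizer} for next-token prediction on {corpus}")
    return "base_model"

def midtrain(model: str, data: str) -> str:
    print(f"[3/4] mid-training {model} on {data} for conversational format")
    return "chat_model"

def sft(model: str, data: str) -> str:
    print(f"[4/4] supervised fine-tuning of {model} on {data}")
    return "assistant_model"

if __name__ == "__main__":
    tok = train_tokenizer("raw web text")
    base = pretrain(tok, "pre-training corpus")
    chat = midtrain(base, "multi-turn conversations")
    final = sft(chat, "high-quality dialogue datasets")
    print(f"done: {final} (evaluation and logging run at every stage)")
```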
Sam Altman goes NUCLEAR (CODE RED)
Matthew Berman· 2025-12-03 02:04
In response to Google getting all the love for Gemini 3 and their TPU architecture, Sam Altman has declared code red. And according to The Information, they have a new secret model in the works called Garlic. All right, let me take a step back. I just made a video about how Google is probably the best-positioned company to win artificial intelligence. They have everything. They have a frontier model. They have AI infrastructure. They have custom silicon. They have a ton of revenue. They have a ton of great res ...
Runway’s New Video Model Challenges Rivals Google, OpenAI
Bloomberg Technology· 2025-12-01 22:22
Model Performance & Innovation
- Runway's latest video model, Gen-4.5, tops the performance charts across all other models [2]
- The company is the first outside the large research labs to lead the leaderboards, with consistent, realistic, and creative results [2][4]
- Focus, research, and efficiency enable the company to compete with larger research labs [4]
- The company has been developing models for almost seven years, building intuition and momentum for improvement [5]
- Algorithmic improvements, data captioning, data structuring, and model testing are key to pre-training [7]

Business Model & Monetization
- The company raised $300 million in April at a $3.3 billion valuation [6]
- The company monetizes through subscriptions and credits for model usage [10]
- The model is being released to gaming companies, studios, brands, production companies, and creatives worldwide, with tens of millions of users [10]
- The company makes money every time the model is used, with good margins [11][12]
- The model is cost-effective compared to other models while maintaining top performance [13]
X @Avi Chawla
Avi Chawla· 2025-11-24 06:31
There are primarily 4 stages of building LLMs from scratch:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning

Let's understand each of them!

0️⃣ Randomly initialized LLM
At this point, the model knows nothing. You ask it "What is an LLM?" and get gibberish like "try peter hand and hello 448Sn". It hasn't seen any data yet and possesses just random weights.

1️⃣ Pre-training
This stage teaches the LLM the basics of language by training it on massive corpora to predict the next tok ...
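To make stages 0 and 1 concrete, here is a minimal PyTorch sketch of the next-token-prediction objective: a freshly constructed model has random weights (the "gibberish" stage), and training shifts each sequence by one token so the model learns to predict what comes next. The tiny GRU model, random token IDs, and hyperparameters are illustrative stand-ins, not anyone's actual training setup.

```python
# Minimal next-token-prediction sketch (illustrative, not a real setup):
# the model sees tokens [t0..t_{n-1}] and is trained to predict [t1..t_n].
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 1000, 64, 32, 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a transformer
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                      # logits: (batch, seq_len, vocab_size)

model = TinyLM()                                 # stage 0: random weights -> gibberish
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))  # fake corpus batch
    inputs, targets = tokens[:, :-1], tokens[:, 1:]              # shift by one token
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Real pre-training differs only in scale: a transformer instead of the GRU, a tokenized web corpus instead of random IDs, and vastly more compute, but the shift-by-one cross-entropy objective is the same.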
X @Demis Hassabis
Demis Hassabis· 2025-11-22 20:32
Actually if you want to know what the real 'secret' is 😀 it's world-class research AND world-class engineering AND world-class infra all working closely together with relentless focus and intensity…

Oriol Vinyals (@OriolVinyalsML):
The secret behind Gemini 3? Simple: Improving pre-training & post-training 🤯
Pre-training: Contra the popular belief that scaling is over—which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix—the team delivered a drastic jump. The delta between 2.5 and 3.0 is https:/ ...
Vision AI in 2025 — Peter Robicheaux, Roboflow
AI Engineer· 2025-08-03 17:45
AI Vision Challenges & Opportunities
- Computer vision lags behind human vision and language models in intelligence and in leveraging big pre-training [3][8][11]
- Current vision evaluations like ImageNet and COCO are saturated and primarily measure pattern matching, hindering the development of true visual intelligence [5][22]
- Vision models struggle with tasks requiring visual understanding, such as determining the time on a watch or understanding spatial relationships in images [9][10]
- Vision-language pre-training, exemplified by CLIP, may fail to capture subtle visual details not explicitly included in image captions [14][15]

Roboflow's Solution & Innovation
- Roboflow introduces RF-DETR, a real-time object detection model leveraging the DINOv2 pre-trained backbone to address the underutilization of large pre-training in visual models [20]
- Roboflow created RF100-VL, a new benchmark comprising 100 diverse object detection datasets, to better measure the intelligence and domain adaptability of visual models [24][25]
- RF100-VL includes challenging domains like aerial imagery, microscopy, and X-rays, and incorporates visual-language tasks to assess contextual understanding [25][26][27][28][29]
- Roboflow's benchmark reveals that current vision-language models struggle to generalize in the visual domain compared to the linguistic domain [30]
- Fine-tuning a YOLOv8-nano model from scratch on 10-shot examples outperforms zero-shot Grounding DINO on RF100-VL, highlighting the need for improved visual generalization [30][36][37]

Industry Trends & Future Directions
- Transformers are proving more effective than convolutional models at leveraging large pre-training datasets for vision tasks [18]
- The scale of pre-training in the vision world is significantly smaller than in the language world, indicating room for growth [19]
- Roboflow makes its platform freely available to researchers, encouraging open-source data contributions to the community [33]
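For context on the few-shot baseline mentioned above, here is a hedged sketch of training a YOLOv8-nano detector from scratch with the Ultralytics API. The dataset YAML path (a hypothetical 10-shot split of one RF100-VL domain) and the hyperparameters are placeholders, not the talk's exact recipe.

```python
# Sketch of the few-shot baseline: YOLOv8-nano trained from scratch.
from ultralytics import YOLO

# "yolov8n.yaml" builds the nano architecture with random weights
# (from scratch), vs. "yolov8n.pt", which would load COCO-pretrained ones.
model = YOLO("yolov8n.yaml")

# data.yaml points at a tiny 10-shot split of one RF100-VL domain
# (hypothetical path); modest epochs/imgsz keep the sketch cheap.
model.train(data="rf100vl_aerial_10shot/data.yaml", epochs=50, imgsz=640)

metrics = model.val()        # evaluate on the held-out split
print(metrics.box.map50)     # mAP@0.5, for comparison against zero-shot Grounding DINO
```

The striking part of the result is that even this minimal supervised baseline beats a large zero-shot open-vocabulary detector on out-of-distribution domains, which is the talk's evidence that visual generalization lags linguistic generalization.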
X @Avi Chawla
Avi Chawla· 2025-07-21 20:50
LLM Training Stages
- The four stages of training an LLM from scratch are: pre-training, instruction fine-tuning, preference fine-tuning, and reasoning fine-tuning [1]

Training Process
- The report explains the four stages of training an LLM from scratch, accompanied by a visual walkthrough [1]
X @Avi Chawla
Avi Chawla· 2025-07-21 06:39
LLM Development Stages
- The document outlines four stages for building Large Language Models (LLMs) from scratch for real-world applications [1]
- These stages are pre-training, instruction fine-tuning, preference fine-tuning, and reasoning fine-tuning [1]

Techniques Overview
- The document summarizes these techniques visually [1]