Pre-training

Vision AI in 2025 — Peter Robicheaux, Roboflow
AI Engineer· 2025-08-03 17:45
AI Vision Challenges & Opportunities
- Computer vision lags behind both human vision and language models in intelligence and in leveraging large-scale pre-training [3][8][11]
- Current vision evaluations such as ImageNet and COCO are saturated and primarily measure pattern matching, hindering the development of true visual intelligence [5][22]
- Vision models struggle with tasks requiring genuine visual understanding, such as reading the time on a watch or reasoning about spatial relationships in images [9][10]
- Vision-language pre-training, exemplified by CLIP, may fail to capture subtle visual details that are not explicitly mentioned in image captions [14][15]
Roboflow's Solution & Innovation
- Roboflow introduces RF-DETR, a real-time object detection model built on the DINOv2 pre-trained backbone, to address the underutilization of large-scale pre-training in vision models [20]
- Roboflow created RF100-VL, a new benchmark comprising 100 diverse object detection datasets, to better measure the intelligence and domain adaptability of vision models [24][25]
- RF100-VL includes challenging domains such as aerial imagery, microscopy, and X-rays, and incorporates visual-language tasks to assess contextual understanding [25][26][27][28][29]
- Roboflow's benchmark reveals that current vision-language models generalize far less well in the visual domain than language models do in the linguistic domain [30]
- A YOLOv8-nano model fine-tuned from scratch on just 10-shot examples outperforms zero-shot Grounding DINO on RF100-VL, highlighting the need for better visual generalization [30][36][37]
Industry Trends & Future Directions
- Transformers are proving more effective than convolutional models at leveraging large pre-training datasets for vision tasks [18]
- Pre-training in vision operates at a far smaller scale than in language, indicating substantial room for growth [19]
- Roboflow makes its platform freely available to researchers, encouraging open-source data contributions to the community [33]
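The CLIP limitation noted above, that details absent from captions go unlearned, follows from the shape of the contrastive objective: the loss only rewards matching each image to its own caption within a batch, so a visual detail the caption omits contributes no training signal. A minimal numpy sketch of a CLIP-style symmetric contrastive loss (function and variable names are illustrative assumptions, not Roboflow's or OpenAI's code):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Illustrative sketch: each row of img_emb is paired with the same row
    of txt_emb; all other rows in the batch serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N) similarity matrix
    labels = np.arange(len(img))              # matching pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # cross-entropy in both directions: image->text and text->image
    return (xent(logits) + xent(logits.T)) / 2
```

When image and text embeddings of each pair coincide, the diagonal dominates and the loss is low; shuffle the pairing and the loss rises. Nothing in the objective distinguishes two images whose captions are identical, which is exactly the failure mode described in the summary.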
X @Avi Chawla
Avi Chawla· 2025-07-21 20:50
4 stages of LLM training from scratch:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning
Read the explainer thread below to learn more 👇 https://t.co/5Ut5mp8Fm4 ...
X @Avi Chawla
Avi Chawla· 2025-07-21 06:39
Today, we are covering the 4 stages of building LLMs from scratch to make them applicable for real-world use cases. We'll cover:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning
The visual summarizes these techniques. Let's dive in! https://t.co/SiqzXaiZd0 ...
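Of the four stages, preference fine-tuning is the least self-explanatory. The thread does not name an algorithm; Direct Preference Optimization (DPO) is one common choice for this stage, and its per-example loss can be sketched in a few lines of numpy (function names and the beta value here are illustrative assumptions):

```python
import numpy as np

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style per-example loss from summed sequence log-probabilities.

    policy_* come from the model being tuned; ref_* come from the frozen
    reference model (typically the instruction-tuned checkpoint).
    """
    # how much more the policy prefers chosen over rejected, relative
    # to the reference model's own preference
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))  # -log sigmoid
```

A zero margin gives -log(0.5) ≈ 0.693; the loss falls as the policy raises the chosen response's log-probability relative to the rejected one, measured against the frozen reference, which is what nudges the model toward preferred outputs without a separate reward model.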
喝点VC | Sequoia US in conversation with OpenAI's former head of research: pre-training has entered the stage of diminishing marginal returns, and its real leverage lies in architectural improvements
Z Potentials· 2025-07-04 03:56
Core Insights
- The article discusses the evolution of AI, focusing on the "trinity" of pre-training, post-training, and reasoning, and how these components are essential for achieving Artificial General Intelligence (AGI) [3][4][5]
- Bob McGrew emphasizes that reasoning will be a significant focus in 2025, with many opportunities for optimization in compute usage, data utilization, and algorithm efficiency [4][5][6]
- The article highlights the diminishing returns of pre-training, suggesting that while it remains important, its leverage is shifting toward architectural improvements rather than sheer computational power [6][8][9]
Pre-training, Post-training, and Reasoning
- Pre-training has reached a stage of diminishing returns, requiring exponentially more compute for marginal gains in intelligence [7][8]
- Post-training focuses on enhancing the model's personality and intelligence, which can yield broad applicability across various fields [9][10]
- Reasoning is seen as the "missing piece" that lets models perform complex tasks through step-by-step thinking, a capability lacking in models like GPT-3 [14][15]
Agent Economics
- The cost of AI agents is expected to approach the opportunity cost of the underlying compute, making it hard for startups to sustain high pricing as competition increases [17][18][19]
- While AI can automate simple tasks, complex services requiring human understanding will retain their value and scarcity [19][20]
Market Opportunities in Robotics
- Interest in robotics is growing, with the belief that the field is nearing commercialization thanks to advances in language interfaces and visual encoding [22][25]
- Companies like Skilled and Physical Intelligence are highlighted as potential leaders in robotics, capitalizing on existing technology and research [22][25]
Proprietary Data and Its Value
- Proprietary data is becoming less valuable relative to the capabilities of advanced AI models, which can replicate insights without extensive human labor [29][30]
- Specific customer data that improves decision-making remains important, underscoring the need for trust in how data is used [31]
Programming and AI Integration
- AI's integration into programming is evolving toward a hybrid model in which users write traditional code while AI assists in the background [32][33]
- While AI can handle repetitive tasks, complex programming still requires human oversight and understanding [33][34]
Future of AI and Human Interaction
- Different generations interact with AI differently; AI should empower individuals to become experts in their interests while taking over mundane tasks [39][42]
- It is important to foster curiosity and problem-solving skills in the next generation, rather than merely teaching specific skills that may soon be automated [43][44]