AI vision
OpenAI just dropped GPT-5.2 and...
Matthew Berman · 2025-12-12 00:57
Model Performance Improvements
- GPT-5.2 nearly doubles GPT-5.1's score on GDPval, a benchmark of real-world knowledge-work tasks [1]
- SWE-bench Pro performance increases by 5% [1]
- ARC-AGI-2 improves from 17% to 52%, demonstrating better generalization [2]
- GPT-5.2 shows improved accuracy on AI vision tasks, such as identifying components on a motherboard [4]

Capabilities Showcase
- GPT-5.2 demonstrates improved spreadsheet capabilities [2]
- GPT-5.2 can create sophisticated ocean-wave simulations with adjustable parameters such as wind speed, wave height, and lighting [3]
- GPT-5.2 excels at coding [3]

Pricing and Availability
- GPT-5.2 is rolling out to paid plans [5]
- Input pricing for GPT-5.2 is $1.75 per million tokens, up from $1.25 for GPT-5.1 [5]
- Output pricing for GPT-5.2 is $14 per million tokens, up from $10 for GPT-5.1 [5]
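The pricing bullets reduce to simple per-token arithmetic. Below is a minimal sketch. It assumes the list prices are exactly as summarized and uses a hypothetical workload of 2M input and 0.5M output tokens. The `PRICES` keys are illustrative labels, not API model identifiers. Both rates rise by the same factor (1.75 / 1.25 = 14 / 10 = 1.4), so any mix of input and output tokens should cost 40% more on GPT-5.2.

```python
# USD per 1M tokens, as reported in the summary (keys are illustrative labels,
# not official API model names).
PRICES = {
    "5.1": {"input": 1.25, "output": 10.00},
    "5.2": {"input": 1.75, "output": 14.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost of one workload: tokens times per-million rate, per direction."""
    rate = PRICES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


def relative_increase(old: float, new: float) -> float:
    """Fractional change from old to new (0.4 means +40%)."""
    return (new - old) / old


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
old = cost_usd("5.1", 2_000_000, 500_000)
new = cost_usd("5.2", 2_000_000, 500_000)
print(f"${old:.2f} -> ${new:.2f} (+{relative_increase(old, new):.0%})")
# $7.50 -> $10.50 (+40%)
```

Changing the token mix changes the absolute totals but not the 40% ratio, because both rates scale by the same factor.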
Vision AI in 2025 — Peter Robicheaux, Roboflow
AI Engineer · 2025-08-03 17:45
AI Vision Challenges & Opportunities
- Computer vision lags behind both human vision and language models in intelligence and in its ability to leverage large-scale pre-training [3][8][11]
- Current vision benchmarks such as ImageNet and COCO are saturated and mainly measure pattern matching, which hinders progress toward true visual intelligence [5][22]
- Vision models struggle with tasks that require visual understanding, such as reading the time on a watch or interpreting spatial relationships in an image [9][10]
- Vision-language pre-training, exemplified by CLIP, may fail to capture subtle visual details that are not explicitly mentioned in image captions [14][15]

Roboflow's Solution & Innovation
- Roboflow introduces RF-DETR, a real-time object detection model built on the pre-trained DINOv2 backbone, to address the underuse of large-scale pre-training in vision models [20]
- Roboflow created RF100-VL, a new benchmark comprising 100 diverse object detection datasets, to better measure the intelligence and domain adaptability of vision models [24][25]
- RF100-VL includes challenging domains such as aerial imagery, microscopy, and X-rays, and incorporates vision-language tasks to assess contextual understanding [25][26][27][28][29]
- Roboflow's benchmark reveals that current vision-language models generalize much less well in the visual domain than in the linguistic domain [30]
- A YOLOv8 nano model trained from scratch on 10-shot examples outperforms zero-shot Grounding DINO on RF100-VL, highlighting the need for better visual generalization [30][36][37]

Industry Trends & Future Directions
- Transformers are proving more effective than convolutional models at leveraging large pre-training datasets for vision tasks [18]
- Pre-training in vision operates at a much smaller scale than in language, indicating room for growth [19]
- Roboflow makes its platform freely available to researchers and encourages open-source data contributions to the community [33]
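The 10-shot result involves training a detector on only a handful of labeled examples per class. The sketch below shows one way such a few-shot subset might be drawn from a detection dataset. It assumes annotations arrive as a hypothetical flat list of `(image_id, class_name)` pairs, one per bounding box, and `sample_k_shot` is an illustrative helper. This is a simplified greedy sampler, not RF100-VL's actual split procedure.

```python
import random
from collections import defaultdict


def sample_k_shot(annotations, k=10, seed=0):
    """Hypothetical helper: pick image ids so that every class appears in at
    least k selected images (or in all of its images if it has fewer than k).

    `annotations` is an assumed flat list of (image_id, class_name) pairs.
    """
    rng = random.Random(seed)
    images_by_class = defaultdict(set)
    for image_id, class_name in annotations:
        images_by_class[class_name].add(image_id)

    selected = set()
    # Visit classes in sorted order so a fixed seed gives a reproducible subset.
    for class_name in sorted(images_by_class):
        candidates = sorted(images_by_class[class_name])
        # Images already picked for earlier classes also count toward this one.
        already = sum(1 for i in candidates if i in selected)
        needed = max(0, k - already)
        remaining = [i for i in candidates if i not in selected]
        selected.update(rng.sample(remaining, min(needed, len(remaining))))
    return selected


# Toy example: "artifact" has fewer than k images, so both of them are kept.
toy = [(i, "cell") for i in range(50)] + [(100, "artifact"), (101, "artifact")]
print(len(sample_k_shot(toy, k=10)))  # 12: 10 "cell" images plus both "artifact" images
```

Counting per image rather than per box keeps the sketch short. Real few-shot protocols often count annotated instances instead, which would need a different selection rule.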