X @Avi Chawla - Reportify

Which one is better?

Opus 4.6, Sonnet 4.6, or GPT-5.2-Codex?

The good news: this might not matter soon!

Because the models are commoditizing, and the real differentiator is moving elsewhere.

On general benchmarks like MMLU, frontier models have saturated to the point where there's barely any room to differentiate.

And on the agentic benchmarks that actually reflect production work, like SWE-bench and TerminalBench, what's being measured isn't the model alone.

It's model plus the infrastructure around it. On SWE-b ...