X @Sam Altman
Sam Altman·2025-12-11 18:27

Performance is strong across the board: 55.6% on SWE-Bench Pro, 52.9% on ARC-AGI-2, 40.3% on Frontier Math. ...