X @Sam Altman
Sam Altman·2026-02-14 17:46

We went from AI systems that struggled to do grade school math to AI systems that can solve research-level math problems in just a few years.

I agree with Jakub this is perhaps the most important eval now.

I am also pretty sure the main reaction will be "it's not that hard" :)

Jakub Pachocki (@merettm): Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model wit ...