GPT-5 Humiliated with a Zero Score as Top AI Models Are All Wiped Out; Altman's Myth of PhD-Level AI Is Shattered
36Kr · 2025-09-16 00:39
Group 1
- The FormulaOne benchmark reveals the limitations of top AI models, with GPT-5 achieving only about 4% accuracy on advanced questions and scoring zero on the most difficult problems [1][6][19]
- The benchmark, developed by AAI, aims to measure algorithmic reasoning depth beyond competitive programming, focusing on real-world optimization problems [8][15]
- The test consists of 220 novel graph-based dynamic programming problems categorized into three levels of difficulty: shallow, deeper, and deepest [16][18]

Group 2
- AAI was founded by Amnon Shashua, co-founder of Mobileye, and focuses on AI research and development [10][11]
- The benchmark's problems are designed to be easy to understand but require significant creativity and deep reasoning to solve [19][22]
- The challenges in the deepest level of the benchmark highlight the gap between current AI capabilities and the reasoning required for complex real-world problems [25][30]