Core Insights
- OpenAI's GPT-5.2 shows a significant leap in cognitive ability, particularly in abstract reasoning and generalization: its ARC-AGI-2 score rose from 17.6% to 52.9% [1]
- Its GDPval score, which gauges the model's performance on economically valuable work, rose from 38.8% to 70.9%, indicating a simultaneous breakthrough in both scalability and reasoning capabilities [1]

Performance Comparison
- On SWE-Bench, GPT-5.2 scored 55.6%, ahead of GPT-5.1 at 50.8%, Anthropic's Claude at 52.0%, and Google's Gemini at 43.3% [2]
- On GPQA, GPT-5.2 scored 92.4%, compared with 88.1% for GPT-5.1, 87.0% for Claude, and 91.9% for Gemini [2]
- On the CharXiv reasoning test, GPT-5.2 scored 82.1%, well above GPT-5.1's 67.0% and slightly above Gemini's 81.4% [2]
- On FrontierMath, GPT-5.2 scored 40.3%, GPT-5.1 scored 31.0%, and Gemini scored 37.6% [2]
- In advanced mathematics, GPT-5.2 scored 14.6%, behind Gemini at 18.8% [2]

Abstract Reasoning Metrics
- GPT-5.2 scored 52.9% on ARC-AGI-2, a substantial increase from GPT-5.1's 17.6%; Claude and Gemini scored 37.6% and 31.1% respectively [3]
- GPT-5.2's GDPval score was reported at 70.9%, up from GPT-5.1's 38.8% [3]
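The size of each generational gain is easier to compare when it is expressed in percentage points. The sketch below is illustrative only: it uses just the GPT-5.1 and GPT-5.2 figures quoted above, and the names `REPORTED_SCORES`, `point_gains`, and `rank_by_gain` are hypothetical helpers introduced here, not part of any benchmark tooling. The advanced mathematics result is left out because no GPT-5.1 figure is given for it.

```python
# Scores (%) as quoted in the summary above: (GPT-5.1, GPT-5.2).
# Only benchmarks with both figures reported are included.
REPORTED_SCORES = {
    "ARC-AGI-2": (17.6, 52.9),
    "GDPval": (38.8, 70.9),
    "SWE-Bench": (50.8, 55.6),
    "GPQA": (88.1, 92.4),
    "CharXiv": (67.0, 82.1),
    "FrontierMath": (31.0, 40.3),
}


def point_gains(scores):
    """Return {benchmark: gain in percentage points}, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


def rank_by_gain(scores):
    """Return benchmark names ordered from largest to smallest gain."""
    gains = point_gains(scores)
    return sorted(gains, key=gains.get, reverse=True)


if __name__ == "__main__":
    gains = point_gains(REPORTED_SCORES)
    for name in rank_by_gain(REPORTED_SCORES):
        old, new = REPORTED_SCORES[name]
        print(f"{name:<13} {old:5.1f} -> {new:5.1f}  (+{gains[name]:.1f} points)")
```

Ranked this way, ARC-AGI-2 and GDPval show by far the largest gains, which matches the emphasis of the summary, while SWE-Bench and GPQA improve by only a few points.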
Analyst: GPT-5.2 looks like yet another "qualitative leap"
Ge Long Hui·2025-12-12 03:51