Core Insights
- OpenAI's GPT-5.2 release shows a significant leap in cognitive abilities, particularly abstract reasoning and generalization: its ARC-AGI-2 score rose from 17.6% to 52.9% [1]
- Its GDPval score, which measures performance on economically valuable knowledge work tasks, rose from 38.8% to 70.9%, highlighting a breakthrough in both scaling and reasoning capabilities [1]

Performance Metrics
- SWE-Bench: GPT-5.2 scored 55.6%, ahead of GPT-5.1 at 50.8% and other models such as Claude and Gemini [2]
- GPQA: GPT-5.2 scored 92.4%, ahead of Claude at 88.1% and Gemini at 91.9% [2]
- CharXiv reasoning: GPT-5.2 scored 82.1%, well above Claude's 67.0% [2]
- FrontierMath (advanced mathematics): GPT-5.2 scored 40.3%, compared with 31.0% for Claude and 37.6% for Gemini [2]
- ARC-AGI-1: GPT-5.2 scored 86.2%; on ARC-AGI-2 it scored 52.9%, up from GPT-5.1's 17.6% [2]
- GDPval: GPT-5.2 scored 70.9%, a substantial improvement on knowledge work tasks over GPT-5's 38.8% [2]
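
To put the reported jumps in perspective, the short Python sketch below computes the percentage-point gain and the relative gain for three of the benchmarks listed above. The scores are copied from the summary. The `gains` helper and the `reported` table are illustrative names, not part of any benchmark tooling. The baselines differ as well: GPT-5.1 for ARC-AGI-2 and SWE-Bench, GPT-5 for GDPval.

```python
# Scores (in percent) as reported in the summary above: (baseline, GPT-5.2).
# Baseline is GPT-5.1 for ARC-AGI-2 and SWE-Bench, and GPT-5 for GDPval.
reported = {
    "ARC-AGI-2": (17.6, 52.9),
    "GDPval": (38.8, 70.9),
    "SWE-Bench": (50.8, 55.6),
}


def gains(before, after):
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


for name, (before, after) in reported.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points ({relative}% relative to baseline)")
```

On these figures, the ARC-AGI-2 score roughly triples, while the SWE-Bench gain is under five percentage points, so the size of the "leap" depends heavily on which benchmark is cited.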
Analyst: GPT-5.2 looks like another "qualitative leap"! Key metric score soars from 38.8% to 70.9%
Ge Long Hui·2025-12-12 03:51