GPT-5.2 Beats Google on Some Benchmark Scores, but OpenAI's "Red Alert" Has Not Yet Been Lifted
Di Yi Cai Jing·2025-12-12 04:13

Core Insights
- OpenAI's CEO indicated that the impact of Google's Gemini 3 on the company was less than initially expected, but emphasized the need for focus and rapid response to competitive threats [1][7]
- The launch of GPT-5.2, which includes Instant, Thinking, and Pro modes, is seen as OpenAI's counteraction to Google's challenge, occurring just a month after the update to GPT-5.1 [1][7]

Performance Metrics
- GPT-5.2 shows significant improvements in various benchmark tests compared to GPT-5.1, such as achieving 70.9% in the GDPval test versus 38.8% for GPT-5.1, and 52.9% in the ARC-AGI-2 test compared to 17.6% for GPT-5.1 [3][4]
- Other benchmark scores for GPT-5.2 include 55.6% in SWE-Bench Pro, 92.4% in GPQA Diamond, 88.7% in CharXiv reasoning, and 99.4% in HMMT testing, all of which surpass the scores of GPT-5.1 [3]

Competitive Landscape
- Google's Gemini 3 Pro previously dominated benchmark tests, scoring 31.1% in ARC-AGI-2 and 91.9% in GPQA Diamond, but GPT-5.2 has now surpassed these scores [4]
- OpenAI highlighted that GPT-5.2 is designed for professional knowledge work, outperforming or matching industry experts in tasks such as creating presentations and spreadsheets [4]

Model Capabilities
- GPT-5.2 is noted for its enhanced capabilities in coding tasks, with a lower error rate in generating outputs compared to GPT-5.1, including a 38% reduction in incorrect responses [5]
- The model's long-context capabilities allow it to handle complex documents like reports and contracts more effectively [4][5]

Strategic Response
- OpenAI's "red alert" status remains in effect despite the launch of GPT-5.2, indicating ongoing competitive pressures from Google and others [7]
- The company plans to continue releasing additional products in response to competition, with further announcements expected soon [7]