GPT-5.2 Tops Google on Some Benchmark Scores, but OpenAI's "Red Alert" Has Not Yet Been Lifted

Core Insights
- OpenAI launched GPT-5.2, including Instant, Thinking, and Pro modes, as a response to competition from Google, particularly after the release of Gemini 3 [1][6]
- The release of GPT-5.2 is seen as a significant upgrade, focusing on performance improvements in various benchmark tests compared to its predecessor, GPT-5.1 [1][2]

Benchmark Performance
- In the GDPval test, GPT-5.2 Thinking scored 70.9%, significantly higher than GPT-5.1's 38.8% [2]
- In the ARC-AGI-2 test, GPT-5.2 Thinking achieved a score of 52.9%, compared to GPT-5.1's 17.6% [2]
- Other benchmark scores for GPT-5.2 Thinking include 55.6% in SWE-Bench Pro, 92.4% in GPQA Diamond, 88.7% in CharXiv reasoning, and 99.4% in HMMT, all outperforming GPT-5.1 [2]

Competitive Landscape
- GPT-5.2's performance in key tests allows OpenAI to regain some competitive ground against Google's Gemini 3 Pro, which previously outperformed GPT-5.1 in several benchmarks [3]
- OpenAI emphasized that GPT-5.2 is designed for professional knowledge work, outperforming industry experts in various tasks [2][3]

Model Capabilities
- GPT-5.2 offers enhanced capabilities in creating presentations and spreadsheets, with improved complexity and formatting compared to the previous version [3]
- The model can handle long-context documents and perform coding tasks with greater reliability, reducing the need for human intervention [3][4]

Error Rate Improvements
- GPT-5.2 Thinking has a lower hallucination rate, with a 38% reduction in incorrect answers compared to GPT-5.1 [4]
- The model's error rate in chart reasoning and software interface understanding has decreased by approximately 50% [4]

Strategic Response
- OpenAI's CEO acknowledged the competitive pressure from Google and indicated that the company is in a "red alert" state to prioritize resources effectively [6]
- The company plans to continue releasing new products in response to competition, with additional updates expected soon [6]