GPT-5.2 Posts Off-the-Charts Performance, but the Code Red Has Not Been Lifted
36Kr · 2025-12-12 01:41
Core Insights
- OpenAI has released GPT-5.2, its first product launch since issuing a "Code Red" alert. Despite significant performance gains over its predecessor, GPT-5.1, the company's underlying challenges remain [1]
- The market is growing more critical of OpenAI and is focusing on the cost-effectiveness of compute, which adds pressure on the company to demonstrate that it is superior and irreplaceable [1]

Performance Metrics
- GPT-5.2 scored 100% on AIME 2025, a mathematics competition benchmark, showcasing its stronger mathematical reasoning [2][5]
- Across several benchmarks, GPT-5.2 outperformed competitors:
  - SWE-Bench Pro: 55.6%, compared with 50.8% for GPT-5.1, 52.0% for Claude, and 43.3% for Gemini [3]
  - GPQA: 92.4%, ahead of GPT-5.1 at 88.1% and Claude at 87.0% [3]
  - AIME 2025: 100%, compared with 94.0% for Claude and 92.8% for Gemini [4]
  - ARC-AGI-1: 86.2%, the highest among the models compared [4]

Specialized Applications
- GPT-5.2 showed significant potential on professional tasks: on the GDPval benchmark it matched or beat top industry experts in 70.9% of comparisons, while completing tasks more than 11 times faster and at less than 1% of the cost [5]
- In software engineering, it reached 55.6% on SWE-Bench Pro, indicating strong capability on real-world coding tasks [5]

Document Understanding and Visual Recognition
- The model excelled at long-document comprehension, reaching near-100% accuracy on tasks spanning 256k tokens, which makes it practical to analyze lengthy reports [6]
- In visual understanding, GPT-5.2 halved its error rate on chart-reasoning and software-interface comprehension tasks and showed better recognition of objects' spatial positions [9]

Product Variants and Efficiency
- The release includes three versions: GPT-5.2 Instant for quick tasks, GPT-5.2 Thinking for deep reasoning, and GPT-5.2 Pro for high-difficulty problems. GPT-5.2 Pro achieved a 390-fold efficiency improvement in ARC-AGI-1 testing [11]
- GPT-5.2 costs noticeably more: API pricing is $1.75 per million input tokens and $14 per million output tokens, a 40% increase over GPT-5.1 [20][22]

Competitive Landscape
- OpenAI's pricing strategy contrasts with that of competitors such as Gemini and Claude, which have cut their prices significantly, positioning GPT-5.2 as a "luxury" product [23][24]
- The market dynamics suggest that OpenAI is betting on a segment of users willing to pay a premium for high-quality AI, while risking alienating them if performance falls short of expectations [24][25]
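
To make the quoted API pricing concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the figures in the summary above ($1.75 and $14 per million tokens, a 40% increase). The function names and the sample workload are hypothetical, and the GPT-5.1 prices it derives are merely implied by the quoted 40% figure rather than taken from a price list.

```python
# Prices are the figures quoted in the article (USD per one million tokens).
GPT_5_2_INPUT_PER_M = 1.75
GPT_5_2_OUTPUT_PER_M = 14.00
QUOTED_INCREASE = 0.40  # "40% increase from GPT-5.1", as stated in the article


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = GPT_5_2_INPUT_PER_M,
                 output_per_m: float = GPT_5_2_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def implied_previous_price(current: float,
                           increase: float = QUOTED_INCREASE) -> float:
    """Back out the earlier price implied by a quoted fractional increase."""
    return current / (1 + increase)


if __name__ == "__main__":
    # Hypothetical workload: a 256k-token report in, a 2k-token summary out.
    cost = request_cost(256_000, 2_000)
    print(f"cost per request: ${cost:.3f}")

    prev_in = implied_previous_price(GPT_5_2_INPUT_PER_M)
    prev_out = implied_previous_price(GPT_5_2_OUTPUT_PER_M)
    print(f"implied GPT-5.1 input price:  ${prev_in:.2f} per million tokens")
    print(f"implied GPT-5.1 output price: ${prev_out:.2f} per million tokens")
```

Under these assumptions, a single pass over a 256k-token report with a 2k-token answer costs roughly $0.48. The input side accounts for most of that, even though output tokens are priced eight times higher per token.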