GPT-5.2 Overtakes Google's Gemini 3 Pro as Expected! Peking University School of Mathematical Sciences Alumni Among Core Contributors
QbitAI (量子位) · 2025-12-12 01:00

Core Insights
- OpenAI has released GPT-5.2, which significantly improves performance on practical tasks including spreadsheet creation, presentation design, coding, and understanding lengthy documents [1][2][3]
- The model shows a marked improvement in visual understanding, accurately identifying more components on circuit boards [4]
- GPT-5.2 has set a new state-of-the-art score of 90.5% on the ARC-AGI-1 test, while the cost per task fell from $4,500 to $11.64, a roughly 390-fold efficiency gain over the past year [12][13]

Performance Enhancements
- GPT-5.2 achieves a 71% win rate against human experts in GDPval tests, completing tasks that typically take humans 4-8 hours in a fraction of the time [18][19]
- On investment banking tasks, GPT-5.2 Thinking raised its score from 59.1% to 68.4%, a gain of 9.3 percentage points [21]
- Coding capabilities have also improved: the model scores 80% on SWE-bench Verified and 55.6% on the more challenging SWE-Bench Pro [25][26]

Visual and Contextual Understanding
- The model's error rate in understanding scientific paper graphics has fallen by 50%, and its spatial awareness of elements in images has improved [34][36]
- GPT-5.2 Thinking is the first model to achieve near 100% accuracy on a 256k context-length task, showing that it can handle long documents effectively [30]

Tool Utilization and Scientific Applications
- Tool invocation has reached a new high, with GPT-5.2 scoring 98.7% on multi-turn interactions in telecom scenarios [40]
- In scientific assessments, GPT-5.2 Pro scored 93.2% on the GPQA Diamond evaluation, indicating its suitability for assisting researchers [45]

Team and Development Insights
- OpenAI's recent advances have been attributed to a new wave of talent, many of whom have strong mathematical backgrounds and joined the company in 2024 [57][58][59]