OpenAI's Strongest Coding Model Debuts, but Hands-On Tests Show It Floored by Gemini 3 Flash Again
36Kr · 2025-12-19 03:50

Core Insights
- GPT-5.2-Codex combines the strengths of GPT-5.2 with the advanced capabilities of GPT-5.1-Codex-Max and is designed specifically for complex software engineering and cybersecurity tasks [1]
- OpenAI has released GPT-5.2-Codex to all paid ChatGPT users, with API access to follow [1]
- GPT-5.2-Codex may fall short of expectations: it improves on GPT-5.2 by less than 1 percentage point on SWE-Bench Pro and may regress on some benchmarks [3]

Performance Enhancements
- GPT-5.2-Codex features native context compression, which improves long-context understanding, tool invocation, and accuracy on coding tasks [5]
- On SWE-Bench Pro, GPT-5.2-Codex scored 56.4%, ahead of GPT-5.2 at 55.6% and GPT-5.1 at 50.8% [5]
- On Terminal-Bench 2.0, GPT-5.2-Codex scored 64.0%, well ahead of GPT-5.1-Codex-Max at 58.1% [5]

Cybersecurity Applications
- GPT-5.2-Codex set a record in Capture The Flag (CTF) challenges, pointing to continued improvement in cybersecurity capabilities [7]
- A security researcher used GPT-5.1-Codex-Max to discover a vulnerability in React, showing the practical value of these models in cybersecurity [9]

Competitive Landscape
- GPT-5.2-Codex arrives amid intensifying competition, notably Google's launch of the low-cost Gemini 3 Flash model [12]
- How GPT-5.2-Codex performs in real-world use, and how it compares with competitors, will be the focus going forward [12]