OpenAI Launches GPT-5.1-Codex-Max Coding Model: Can Work on Tasks "Overnight", Outscores Google in Benchmarks

Core Insights
OpenAI has officially launched GPT-5.1-Codex-Max, a new coding model with significantly improved long-horizon reasoning, efficiency, and real-time interaction. It replaces GPT-5.1-Codex as the default model in Codex, giving developers a more efficient programming assistant [1].

Group 1: Model Performance
- On SWE-Bench Verified, GPT-5.1-Codex-Max scores 77.9%, slightly ahead of Google's Gemini 3 Pro at 76.2% [2].
- On Terminal-Bench 2.0, it scores 58.1% versus 54.2% for Gemini 3 Pro [2].
- On the LiveCodeBench Pro competitive-coding Elo benchmark, it ties Gemini 3 Pro at 2439 points [2].

Group 2: Key Features
- A new "Compaction" mechanism lets the model, as it approaches its context-window limit, retain key information and discard irrelevant details. This allows it to keep working across millions of tokens without performance degradation [2].
- In internal testing, the model worked on complex tasks for more than 24 hours, including multi-step code refactoring and autonomous debugging. It is also about 30% more token-efficient, which lowers development costs and response latency [2].

Group 3: Integration and Usage
- GPT-5.1-Codex-Max is integrated across Codex development environments, including the official command-line tool (Codex CLI), internal code review tools, and interactive programming environments. Developers can try examples such as reinforcement learning training visualizations and simulations of optical laws [3].
- The model is not yet available through the public API, although OpenAI plans to roll out access gradually. For now, using it requires a paid ChatGPT plan such as Plus, Pro, or Enterprise [3].
- OpenAI reports that 95% of its engineers use Codex weekly and that, since adopting it, they submit on average 70% more pull requests, a significant gain in development efficiency [3].
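OpenAI has not published how Compaction works internally. As a rough, hypothetical illustration of the general idea described under Key Features (keep key information and drop detail as the context window fills), the Python sketch below compacts older messages into short notes once usage passes a threshold. Everything here is an assumption for illustration: `CompactingContext`, `gist`, and `count_tokens` are made-up names, tokens are approximated by whitespace-separated words, and the "summary" simply keeps the first few words of each old message, where a real agent would have the model write the summary.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption): one token per word."""
    return len(text.split())


def gist(message: str, max_words: int = 4) -> str:
    """Hypothetical placeholder for model-written summarization: keep the first few words."""
    return " ".join(message.split()[:max_words])


@dataclass
class CompactingContext:
    limit: int              # hypothetical context-window size, in tokens
    threshold: float = 0.8  # compact once usage exceeds this fraction of the limit
    keep_recent: int = 2    # newest messages are always kept verbatim
    max_notes: int = 5      # cap on retained notes; the oldest are dropped first
    notes: list[str] = field(default_factory=list)     # compacted "key information"
    messages: list[str] = field(default_factory=list)  # recent, uncompacted history
    compactions: int = 0

    def used(self) -> int:
        """Approximate tokens currently occupied by notes plus recent messages."""
        return sum(count_tokens(t) for t in self.notes + self.messages)

    def add(self, message: str) -> None:
        """Append a message, compacting if usage crosses the threshold."""
        self.messages.append(message)
        if self.used() > self.threshold * self.limit:
            self.compact()

    def compact(self) -> None:
        """Fold all but the newest messages into short notes, then cap the notes."""
        split = max(len(self.messages) - self.keep_recent, 0)
        if split == 0:
            return  # nothing old enough to compact
        old, self.messages = self.messages[:split], self.messages[split:]
        self.notes.extend(gist(m) for m in old)
        if len(self.notes) > self.max_notes:
            # Discard the oldest details so the notes themselves stay bounded.
            del self.notes[: len(self.notes) - self.max_notes]
        self.compactions += 1

    def prompt(self) -> str:
        """What would be sent to the model: notes first, then recent turns verbatim."""
        return "\n".join(["[notes] " + n for n in self.notes] + self.messages)
```

For example, feeding `CompactingContext(limit=50)` twenty ten-word messages keeps usage at or below 40 approximate tokens: the two newest messages stay verbatim while older ones survive only as short notes. The design choice worth noting is that both the recent window and the notes are bounded, which is what lets this kind of loop run indefinitely; how well it preserves "key information" depends entirely on the summarizer, which is the part this sketch fakes.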