GPT-5 Codex is nuts...
Matthew Berman · 2025-09-15 22:31
Product Overview
- OpenAI releases GPT-5 Codex, optimized for agentic coding and available in the terminal, IDEs, GitHub, and the ChatGPT iOS app [1][2]
- GPT-5 Codex is included with the ChatGPT Plus, Pro, Business, Edu, and Enterprise plans [3]

Performance Benchmarks
- GPT-5 Codex achieves 74.5% on SWE-bench Verified, a slight improvement over GPT-5's 72.8% [3]
- Code refactoring improves significantly, with GPT-5 Codex at 51.3% compared to GPT-5's 33.9% [3]
- GPT-5 Codex can work independently for over 7 hours on complex tasks [4]
- GPT-5 Codex uses 93.7% fewer tokens than GPT-5 on simpler tasks but spends twice as long on complex ones [6]
- In code review, GPT-5 Codex reduces incorrect comments to 4.4% from GPT-5's 13.7% and increases high-impact comments to 52.4% from 39.4% [7]

Features and Capabilities
- Codex is trained for code review, identifying critical flaws by navigating the codebase, reasoning through dependencies, and running code and tests [6]
- Codex CLI updates include better-formatted tool calls and diffs, simplified approval modes, and conversation state compaction [12][13]
- Codex automates environment setup by scanning for setup scripts and fetching dependencies at runtime [15]
- Codex can spin up its own browser, iterate on its builds, and attach screenshots to tasks and GitHub PRs [15]
- Codex reviews PRs by matching the stated intent to the actual diff, reasoning over the codebase, and executing code and tests [16]

Windsurf Integration
- Windsurf is highlighted as a powerful agentic IDE, especially after its acquisition by Cognition [9]
- Windsurf offers features such as DeepWiki, Vibe and Replace, a one-click MCP store, sophisticated memory, and deep integration with Devin [10][11]

Pricing and Availability
- The Pro plan, at $200 per month, can support a full work week across multiple projects, positioning it as an additional developer [19]
- Business plans allow credit purchases once included limits are exceeded, while Enterprise plans provide a shared credit pool [20]

Infrastructure Improvements
- Cloud infrastructure performance is improved by caching containers, which reduces median completion time for new tasks and follow-ups by 90% [14]
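
The benchmark figures quoted in this summary mix small and large gaps, so it can help to put them side by side as both percentage-point and relative changes. The following is a minimal sketch of that arithmetic in Python. The numbers are the ones cited above; the metric labels and the function names `compare` and `report` are made up for this illustration and do not come from OpenAI or the video.

```python
# Figures as quoted in the summary, in percent: (GPT-5, GPT-5 Codex).
# The metric labels are shorthand chosen for this sketch.
FIGURES = {
    "SWE-bench Verified": (72.8, 74.5),
    "Code refactoring": (33.9, 51.3),
    "Incorrect review comments": (13.7, 4.4),
    "High-impact review comments": (39.4, 52.4),
}


def compare(baseline: float, codex: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    point_change = codex - baseline
    relative_change = point_change / baseline * 100
    # Round to one decimal place, matching the precision of the quoted figures.
    return round(point_change, 1), round(relative_change, 1)


def report() -> dict[str, tuple[float, float]]:
    """Apply compare() to every quoted metric."""
    return {name: compare(*pair) for name, pair in FIGURES.items()}


if __name__ == "__main__":
    for name, (points, relative) in report().items():
        print(f"{name}: {points:+.1f} points ({relative:+.1f}% relative)")
```

Read this way, the SWE-bench Verified gain is 1.7 points, while the refactoring score rises by 17.4 points, which is consistent with the summary calling the first improvement slight and the second significant. A negative change is the desired direction for the incorrect-comment rate.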