Agentic coding
GPT-5 Codex is nuts...
Matthew Berman· 2025-09-15 22:31
Product Overview
- OpenAI releases GPT-5-Codex, a model optimized for agentic coding and available in the terminal, the IDE, GitHub, and the ChatGPT iOS app [1][2]
- GPT-5-Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans [3]
Performance Benchmarks
- GPT-5-Codex scores 74.5% on SWE-bench Verified, a slight improvement over GPT-5's 72.8% [3]
- Code refactoring improves significantly: GPT-5-Codex scores 51.3% compared to GPT-5's 33.9% [3]
- GPT-5-Codex can work independently for over 7 hours on complex tasks [4]
- GPT-5-Codex uses 93.7% fewer tokens than GPT-5 on simpler tasks but spends twice as long on complex ones [6]
- In code review, GPT-5-Codex reduces incorrect comments to 4.4% from GPT-5's 13.7% and increases high-impact comments to 52.4% from 39.4% [7]
Features and Capabilities
- Codex is trained for code review: it identifies critical flaws by navigating the codebase, reasoning through dependencies, and running code and tests [6]
- Codex CLI updates include better-formatted tool calls and diffs, simplified approval modes, and conversation state compaction [12][13]
- Codex automates environment setup by scanning for setup scripts and fetching dependencies at runtime [15]
- Codex can spin up its own browser, iterate on its builds, and attach screenshots to tasks and GitHub PRs [15]
- Codex reviews PRs by matching the stated intent to the actual diff, reasoning over the codebase, and executing code and tests [16]
Windsurf Integration
- Windsurf is highlighted as a powerful agentic IDE, especially after its acquisition by Cognition [9]
- Windsurf offers features such as DeepWiki, Vibe and Replace, a one-click MCP store, sophisticated memory, and deep integration with Devin [10][11]
Pricing and Availability
- The Pro plan, at $200 per month, can support a full work week across multiple projects, positioning it as an additional developer [19]
- Business plans offer credit purchases for exceeding included limits, while Enterprise plans provide a shared credit pool [20]
Infrastructure Improvements
- Cloud infrastructure performance is improved by caching containers, which reduces median completion time for new tasks and follow-ups by 90% [14]
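To put the "slight" and "significant" improvements quoted above on the same footing, here is a minimal sketch that computes both the percentage-point change and the relative change for each pair of figures. The scores come from the summary; the `SCORES` table and the `compare` helper are illustrative names, not part of any OpenAI tooling.

```python
# Scores quoted in the summary, in percent: (GPT-5, GPT-5-Codex).
# The dictionary and helper below are illustrative, not from the source.
SCORES = {
    "SWE-bench Verified": (72.8, 74.5),
    "Code refactoring": (33.9, 51.3),
    "Incorrect review comments": (13.7, 4.4),
    "High-impact review comments": (39.4, 52.4),
}


def compare(baseline: float, new: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = round(new - baseline, 1)
    relative = round((new - baseline) / baseline * 100, 1)
    return points, relative


for name, (gpt5, codex) in SCORES.items():
    points, relative = compare(gpt5, codex)
    print(f"{name}: {points:+.1f} points, {relative:+.1f}% relative")
```

The percentage-point change is what the summary's wording implies. The relative change shows how large each shift is against its own baseline, which is why a drop in incorrect comments of under ten points still amounts to roughly a two-thirds reduction.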
X @Sam Altman
Sam Altman· 2025-09-15 18:01
Product Update
- GPT-5-Codex is released, an enhanced version of GPT-5 that excels at agentic coding [1]
- The new version is faster and smarter, with added capabilities [1]
Team Performance
- The development team has been highly productive [1]
How to Improve your Vibe Coding — Ian Butler
AI Engineer· 2025-08-03 04:32
Agent Performance
- Current agents have a low overall bug-find rate and generate a significant number of false positives [1][2]
- Some agents have a true positive rate of less than 10% for finding bugs [2]
- Three of the six agents benchmarked had a true positive rate of 10% or less across 900+ reports [3]
- One agent produced 70 issues for a single task, all of which were false [4]
- Cursor had a 97% false positive rate over 100+ repos and 1,200+ issues [4]
- Thinking models are significantly better at finding bugs in a codebase [8][18]
- Agents do not look at files holistically, which leads to high variability across runs [20]
Implications for Developers
- Alert fatigue erodes trust in agents, which can let bugs slip into production [5]
- Developers are unlikely to sift through numerous false positives to identify actual bugs [4]
Recommendations for Improving Agent Performance
- Use bug-focused rules with scoped instructions detailing security issues and logical bugs [6]
- Prioritize naming explicit classes of bugs in rules, such as "auth bypasses" or "SQL injection" [11]
- Require fix validation by ensuring agents write and pass tests before incorporating changes [12]
- Manage context thoroughly by feeding diffs of code changes and preventing key files from being summarized [15]
- Ask agents to create a step-by-step component inventory of the codebase [16]
- Bias the model with specific security information such as the OWASP Top 10 [9][10]
Claude Code & the evolution of agentic coding - Boris Cherny
AI Engineer· 2025-07-04 16:00
[Music] Hello. This is awesome. This is a big crowd. Who here has used Claude Code before? Jesus. Awesome. That's what I like to see. Cool. So, my name is Boris. I'm a member of technical staff at Anthropic and creator of Claude Code. And I was struggling with what to talk about for an audience that already knows Claude Code, already knows AI and all the coding tools and agentic coding and stuff like that. So, I'm going to zoom out a little bit and then we'll zoom back in. So here's my TL;DR. The model is moving really ...
OpenAI Codex Team: From Coding Autocomplete to Asynchronous Autonomous Agents
Sequoia Capital· 2025-06-10 09:00
OpenAI Codex Overview
- OpenAI's Codex team is developing AI coding tools to help developers delegate tasks to cloud and local coding agents, evolving from autocomplete to autonomous task completion [3]
- Codex is RL-tuned to be great at day-to-day enterprise development tasks, unlike previous models that excelled at competitive programming [4]
- Codex is envisioned as an agent working independently on its own computer, allowing developers to delegate tasks rather than pair with the AI [13]
- Codex CLI allows developers to work with Codex in their terminal, while Codex in ChatGPT operates on its own computer [16][17]
Model Training and Capabilities
- Training efforts focused on aligning the model to the preferences of professional software engineers, improving code mergeability [20]
- Codex excels at bug fixing by independently verifying and reproducing issues, often providing usable fixes [22][23]
- The model can cite its own work, including files changed and terminal outputs, which makes review easier [34]
- Codex can generate its own plans, which helps to specify everything up front [60]
Future of Software Development
- OpenAI envisions a future where most coding is done by agents working independently, shifting the focus to reviewing and validating code [28][38]
- The company aims to create a unified assistant within ChatGPT that can handle various tasks, including coding, without requiring separate agents [70]
- The market is expected to shift towards agents writing the majority of code in their own environments, connected to the tools developers use [75][76]
- OpenAI believes the number of professional software developers will increase as coding becomes easier [46][47]