Agentic coding

GPT-5 Codex is nuts...
Matthew Berman· 2025-09-15 22:31
OpenAI just dropped GPT-5 Codex. If you thought Codex was a great product, now it's powered by GPT-5. It was only a matter of time. So, if you click the little get started button, look at this. VS Code, Cursor, Windsurf, VS Code Marketplace, Codex Web. Today, we're releasing GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. It was trained with a focus on real-world software engineering work. It's equally proficient at quick interactive sessions and at independently powering through ...
X @Sam Altman
Sam Altman· 2025-09-15 18:01
GPT-5-Codex is here: a version of GPT-5 better at agentic coding. It is faster, smarter, and has new capabilities. Let us know what you think! The team has been absolutely cooking, very fun to watch. ...
How to Improve your Vibe Coding — Ian Butler
AI Engineer· 2025-08-03 04:32
Agent Performance
- Current agents have a low overall bug-find rate and generate a significant number of false positives [1][2]
- Some agents have a true positive rate of less than 10% for finding bugs [2]
- Three of the six agents benchmarked had a true positive rate of 10% or less across 900+ reports [3]
- One agent produced 70 issues for a single task, all of which were false positives [4]
- Cursor had a 97% false positive rate across 100+ repos and 1,200+ issues [4]
- Thinking models are significantly better at finding bugs in a codebase [8][18]
- Agents do not look at files holistically, which leads to high variability across runs [20]

Implications for Developers
- Alert fatigue erodes developers' trust in agents, so real bugs can slip into production [5]
- Developers are unlikely to sift through numerous false positives to identify actual bugs [4]

Recommendations for Improving Agent Performance
- Use bug-focused rules with scoped instructions detailing security issues and logical bugs [6]
- Name explicit classes of bugs in rules, such as "auth bypasses" or "SQL injection" [11]
- Require fix validation: agents should write tests and pass them before changes are incorporated [12]
- Manage context thoroughly by feeding in diffs of code changes and preventing key files from being summarized [15]
- Ask agents to create a step-by-step component inventory of the codebase [16]
- Bias the model with specific security information such as the OWASP Top 10 [9][10]
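The fix-validation recommendation above can be sketched as a small gate that accepts an agent's change only when the test command exits cleanly. This is a minimal illustration, not something shown in the talk: `validate_fix`, `ValidationResult`, and the stand-in test commands are hypothetical names chosen for the example, and a real setup would run the project's own test suite.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ValidationResult:
    accepted: bool  # True only if the test command exited with status 0
    output: str     # combined stdout and stderr, kept for review


def validate_fix(test_command: list[str], timeout_s: float = 300.0) -> ValidationResult:
    """Run the given test command and accept the fix only if it succeeds.

    Hypothetical helper: the command is whatever runs the tests the agent
    wrote for its fix (assumed here to be a plain argv list).
    """
    try:
        proc = subprocess.run(
            test_command,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A hanging test run is treated as a failed validation.
        return ValidationResult(accepted=False, output="test run timed out")
    return ValidationResult(
        accepted=proc.returncode == 0,
        output=proc.stdout + proc.stderr,
    )


if __name__ == "__main__":
    # Stand-in commands; in practice this would be the project's test runner.
    passing = validate_fix([sys.executable, "-c", "assert 1 + 1 == 2"])
    failing = validate_fix([sys.executable, "-c", "assert 1 + 1 == 3"])
    print(passing.accepted, failing.accepted)
```

Keeping the captured output alongside the verdict means a rejected fix can be handed back to the agent or a reviewer with the failing test log attached.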
Claude Code & the evolution of agentic coding - Boris Cherny
AI Engineer· 2025-07-04 16:00
[Music] Hello. This is awesome. This is a big crowd. Who here has used Claude Code before? Jesus. Awesome. That's what I like to see. Cool. So, my name is Boris. I'm a member of technical staff at Anthropic and creator of Claude Code. And, um, I was struggling with what to talk about for an audience that already knows Claude Code, already knows AI and all the coding tools and agentic coding and stuff like that. So, I'm going to zoom out a little bit and then we'll zoom back in. So here's my TL;DR. The model is moving really ...
OpenAI Codex Team: From Coding Autocomplete to Asynchronous Autonomous Agents
Sequoia Capital· 2025-06-10 09:00
OpenAI Codex Overview
- OpenAI's Codex team is developing AI coding tools to help developers delegate tasks to cloud and local coding agents, evolving from autocomplete to autonomous task completion [3]
- Codex is RL-tuned to be great at day-to-day enterprise development tasks, unlike previous models that excelled at competitive programming [4]
- Codex is envisioned as an agent working independently on its own computer, allowing developers to delegate tasks rather than pair with the AI [13]
- Codex CLI allows developers to work with Codex in their terminal, while Codex in ChatGPT operates on its own computer [16][17]

Model Training and Capabilities
- Training efforts focused on aligning the model to the preferences of professional software engineers, improving code mergeability [20]
- Codex excels at bug fixing by independently verifying and reproducing issues, often providing usable fixes [22][23]
- The model can cite its own work, including files changed and terminal outputs, facilitating easier review [34]
- Codex can generate its own plans, which helps to specify everything up front [60]

Future of Software Development
- OpenAI envisions a future where most coding is done by agents working independently, shifting the focus to reviewing and validating code [28][38]
- The company aims to create a unified assistant within ChatGPT that can handle various tasks, including coding, without requiring separate agents [70]
- The market is expected to shift towards agents writing the majority of code in their own environments, connected to the tools developers use [75][76]
- OpenAI believes the number of professional software developers will increase as coding becomes easier [46][47]