Shipping with Codex
OpenAI· 2025-10-08 17:01
Product Updates & Enhancements
- OpenAI's Codex has undergone a significant overhaul, featuring an improved agent with a smarter reasoning model (GPT-5-Codex) and a rewritten harness for better planning and interaction capabilities [3][4][5]
- The Codex CLI has been revamped with simplified approval modes, a more legible UI, and default sandboxing for safety, with frequent updates based on user feedback [6]
- Codex is now natively integrated into IDEs like VS Code and Cursor as an extension, attracting 100,000 users within the first week [7]
- Codex Cloud has been upgraded to run tasks 90% faster, enabling parallel task execution and remote control from mobile devices [8]

Usage & Impact
- 92% of OpenAI's technical staff use Codex daily, up significantly from 50% in July [14]
- Engineers using Codex submit 70% more pull requests (PRs) per week [14]
- Code review, powered by GPT-5-Codex, is now frequently enabled by default due to its high signal in identifying critical issues [13][14]

Code Review & Testing
- Codex has been specifically trained for thorough code review, capable of identifying complex bugs and suggesting fixes [12][44]
- Codex supports test-driven development (TDD) by running tests, fixing code, and re-running tests until they pass, and can also verify UI changes visually using snapshots [21][22]
- Local code reviews can be performed using slash commands in the CLI, allowing developers to review and fix code before submitting PRs [47][48]

Workflow & Scalability
- Codex can create detailed plans and specifications for complex features, acting as a "senior engineer" capable of handling its own documentation [31][32]
- Codex can sustain productive sessions for over seven hours, processing more than 150 million tokens on large projects [27]
- The workflow from idea to pull request can be streamlined into a few steps with Codex, involving rigorous planning and thorough testing [41]
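The TDD pattern described above (run tests, request a fix, re-run until green) can be sketched generically. This is a minimal illustration, not Codex's actual API: the `request_fix` callback is a hypothetical stand-in for whatever hands the failure log to an agent.

```python
from typing import Callable

def tdd_loop(run_tests: Callable[[], bool],
             request_fix: Callable[[], None],
             max_rounds: int = 5) -> bool:
    """Run the test suite; on failure, ask the agent for a fix and
    re-run, until the suite passes or the round budget is exhausted."""
    for _ in range(max_rounds):
        if run_tests():
            return True   # suite is green, stop iterating
        request_fix()     # hypothetical: e.g. pass the failure output to the agent
    return run_tests()    # final verdict after the last fix attempt
```

In practice, `run_tests` might wrap something like `subprocess.run(["pytest"]).returncode == 0`, and `request_fix` an invocation of the coding agent; the bounded round count keeps an agent that cannot fix the failure from looping forever.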
From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
a16z· 2025-09-25 13:00
Research & Development Focus
- OpenAI is targeting the production of an automated researcher to automate the discovery of new ideas, with a focus on economically relevant advancements [1][3]
- The company is extending the reasoning horizon of models, aiming for them to operate autonomously for longer periods, as measured by performance in math and programming competitions [3]
- OpenAI is working to improve models' ability to handle more difficult, messy real-world coding environments, focusing on style, proactivity, and latency [12][13]

Model Capabilities & Advancements
- GPT-5 aims to bring reasoning into the mainstream, improving on previous models like o3 by delivering reasoning and more agentic behavior by default [1]
- The company has observed significant progress in models' ability to solve hard science problems, including instances of discovering non-trivial new mathematics [1]
- Reinforcement learning (RL) continues to be a versatile method for continuous improvement, especially when combined with natural language modeling [4][5]

Talent & Culture
- OpenAI emphasizes fundamental research and innovation, discouraging copying and fostering a culture where researchers are inspired to discover new things [35][36]
- The company looks for individuals who have solved hard problems in any field, with strong technical fundamentals and the intent to work on ambitious challenges [40]
- OpenAI protects fundamental research by separating researchers focused on algorithmic advances from those focused on product, ensuring space for long-term research questions [46][57]

Resource Allocation & Strategy
- OpenAI prioritizes core algorithmic advances over product research in compute allocation, but remains flexible to adapt to changing needs [59]
- The company believes compute remains a critical resource for advancing AI, and does not expect to be data-constrained anytime soon [62][63]
- OpenAI acts from a place of strong belief in its long-term research program, not tying it too closely to short-term product reception [70]