Agentic Coding

Working 7 hours straight without "getting tired": OpenAI's strongest coding model, GPT-5-Codex, is here
36Kr · 2025-09-16 02:07
Zhidongxi reported on September 16 that early this morning OpenAI released a new model, GPT-5-Codex, a version of GPT-5 optimized specifically for software engineering that further improves the agentic coding capabilities of Codex.

01 Thinking time adjusts dynamically to the task; fewer incorrect review comments, more high-impact ones

GPT-5-Codex was trained on complex real-world engineering tasks, such as building complete projects from scratch, adding features and tests, debugging, performing large-scale refactors, and conducting code review. It follows the instructions in AGENTS.md more faithfully and produces high-quality code: developers only need to state their requirements, with no need to write lengthy notes on code style or code cleanliness.

In addition, GPT-5-Codex dynamically adjusts its thinking time to the complexity of the task, so the time it spends executing a task ranges from a few seconds to seven hours. The model combines the two basic skills of a coding agent: pairing with developers in interactive sessions, and working persistently and independently on longer tasks. This means Codex feels snappier when handling small, well-defined requests or when chatting with it, while also being able to work longer on complex tasks such as large refactors.

OpenAI noted in its blog that GPT-5-Codex's training focused on practical software engineering work; it can adjust its thinking time dynamically per task and work independently for more than seven hours on large, complex tasks. ...
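The article mentions that GPT-5-Codex follows AGENTS.md, the plain-markdown instruction file a Codex agent reads before working in a repository. A minimal sketch of what such a file might contain; the section names and rules below are illustrative assumptions, not an official template:

```markdown
# AGENTS.md — repository instructions for the coding agent

## Build & test
- Install dependencies with `npm install`; run the test suite with `npm test`.
- A change is not done until the tests pass.

## Conventions
- Use TypeScript strict mode; prefer named exports.
- Keep functions short and add a doc comment to every public API.

## Boundaries
- Never edit generated files under `vendor/`.
- Propose CI configuration changes in the task summary instead of applying them.
```

Because the model is trained to honor a file like this, the developer states only the task itself, and the style and safety rules travel with the repository.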
What makes Claude Code so good? The LLM team uses its own product every day and fixes bugs the moment it finds them
36Kr · 2025-09-04 08:16
How do you tell whether a model's or a product's performance has really improved? Simple: use it for a full day of real work.

Anthropic recently announced a $13 billion funding round at a valuation of $183 billion, second in size only to OpenAI's historic $40 billion round in March 2025. At the same time, the company is facing a new test: many users have found that its flagship product, Claude Code, has been "dumbed down", and some developers have already switched to OpenAI's competing product, Codex CLI.

Recent controversies aside, Claude Code has been a very successful product: it pulled a large number of users away from Cursor and reached 115,000 users within four months of launch.

Why did this product succeed? In a recent interview, Claude Code lead Boris Cherny shared some details of how the team built it, including a minimalist, highly extensible product philosophy; an evaluation standard that values real-world feel over benchmarks; and an extremely responsive user-feedback mechanism.

Video link: https://www.youtube.com/watch?v=iF9iV4xponk

The details follow:

1. What has changed in programming over the past 12 months?

A year ago, if you ...
What makes Claude Code so good? The LLM team uses its own product every day and fixes bugs the moment it finds them
Jiqizhixin (Machine Heart) · 2025-09-04 07:04
Core Insights
- Anthropic recently announced a $13 billion funding round, bringing its valuation to $183 billion, second only to OpenAI's historic $40 billion round in March 2025 [1]
- Despite user complaints that its flagship product, Claude Code, has been "dumbed down", the product has captured a significant user base, reaching 115,000 users within four months of launch [3]

Group 1: Product Performance and User Experience
- Claude Code is designed around simplicity and high extensibility, prioritizing real user experience over benchmark evaluations [3]
- Programming workflows have shifted from manual coding and copy-pasting to a more hands-off approach in which developers instruct agents to execute code modifications [6][7]
- The evolution of models and tools, particularly Claude Code, has significantly improved programming capability through better integration of context management and tool use [9]

Group 2: Feedback and Iteration
- Rapid feedback response is crucial to product improvement; the team actively addresses bugs and user suggestions to sustain a continuous feedback loop [15][17]
- The internal feedback channel for Claude Code remains highly active, driving the product's rapid iteration and enhancement [17]

Group 3: Future Developments and User Adaptation
- Over the next 6 to 12 months, manual and automated programming will integrate more deeply, with Claude Code evolving to handle more complex project-management tasks [20][21]
- Developers are encouraged to adapt by keeping core programming skills sharp while embracing creativity and innovation in project development [23]
- New users are advised to first use Claude Code to understand existing codebases before attempting to generate new code, matching their approach to task complexity [24][29]
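The workflow shift described in Group 1, from hand-editing code to instructing an agent that edits, applies, and verifies changes, can be sketched as a loop. Everything below is an illustrative toy: `call_model` and `run_tests` are hypothetical stand-ins for a real LLM call and a real test runner, not any actual Claude Code internals.

```python
# Toy sketch of an agentic-coding loop: propose an edit, apply it,
# run the tests, and repeat until they pass or a step budget runs out.
from dataclasses import dataclass


@dataclass
class Edit:
    path: str
    new_content: str


def call_model(instruction: str, files: dict[str, str]) -> Edit:
    # Hypothetical stand-in for an LLM API call. This stub just
    # records the instruction as a TODO comment in the first file.
    path, content = next(iter(files.items()))
    return Edit(path, content + f"\n# TODO: {instruction}")


def run_tests(files: dict[str, str]) -> bool:
    # Hypothetical stand-in for shelling out to a test runner.
    return all("TODO" in content for content in files.values())


def agent_loop(instruction: str, files: dict[str, str], max_steps: int = 5) -> dict[str, str]:
    for _ in range(max_steps):
        edit = call_model(instruction, files)            # propose a change
        files = {**files, edit.path: edit.new_content}   # apply it
        if run_tests(files):                             # verify before stopping
            break
    return files


result = agent_loop("add input validation", {"app.py": "def main(): pass"})
```

The developer's job in this workflow is writing the instruction and reviewing the result, not typing the edit itself.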
Just in: Alibaba open-sources its strongest coding model, 480 billion parameters, agent scores crushing Kimi K2, training details disclosed
36Kr · 2025-07-22 23:53
Core Insights
- Alibaba's Qwen team has released its latest flagship programming model, Qwen3-Coder-480B-A35B-Instruct, claimed to be the most powerful open-source programming model to date, with 480 billion parameters and support for up to 1 million tokens of context [1][2][16]
- The model achieves state-of-the-art performance across programming and agent tasks, surpassing other open-source models and even competing with proprietary models such as GPT-4.1 [1][3][20]
- Qwen3-Coder is designed to significantly enhance productivity, allowing novice programmers to accomplish tasks in a fraction of the time it would take experienced developers [2][24]

Model Specifications
- Qwen3-Coder comes in multiple sizes; the current release is the most powerful variant at 480 billion parameters, larger than Alibaba's previous flagship Qwen3 at 235 billion parameters but smaller than Kimi K2 at 1 trillion [2][3]
- The model natively supports a 256K-token context, extendable to 1 million tokens, and is optimized for programming tasks [16][20]

Performance Metrics
- In benchmark tests, Qwen3-Coder has outperformed other open-source models in categories such as Agentic Coding, Agentic Browser Use, and Agentic Tool Use [1][3][20]
- Specific scores include 69.6 on SWE-bench Verified and 77.5 on TAU-Bench Retail, showcasing its capabilities on real-world programming tasks [3][20]

Pricing Structure
- The Qwen3-Coder API is available on Alibaba Cloud's platform with tiered pricing based on input-token volume, ranging from $1 to $6 per million input tokens and $5 to $60 per million output tokens, depending on the token range [4][5][24]
- The pricing is positioned against models such as Claude Sonnet 4, which has lower input and output costs [4][5]

User Experience and Applications
- Qwen3-Coder has been made available for free on the Qwen Chat web platform, allowing users to experience its capabilities firsthand [6][24]
- Users have reported impressive results on tasks such as game development and UI design, with high completion rates and aesthetic quality [9][11][12]

Future Developments
- The Qwen team is actively working on enhancing the model's performance and exploring self-improvement capabilities for coding agents [24]
- More model sizes are expected, aiming to balance deployment cost and performance [24]
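Tiered billing as described above means the per-token rate depends on how large the input is. A minimal sketch of how such a scheme prices a request; the tier boundaries and per-tier rates below are illustrative assumptions, consistent only with the $1-$6 (input) and $5-$60 (output) per-million-token ranges quoted in the summary, not Alibaba Cloud's published price card:

```python
# Hypothetical tiered pricing: the tier is selected by input size,
# then both input and output tokens are billed at that tier's rates.
TIERS = [
    # (max input tokens for tier, $/1M input tokens, $/1M output tokens)
    (32_000, 1.0, 5.0),
    (128_000, 2.0, 10.0),
    (256_000, 3.0, 15.0),
    (1_000_000, 6.0, 60.0),
]


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under the assumed tiers."""
    for limit, in_rate, out_rate in TIERS:
        if input_tokens <= limit:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the 1M-token context limit")


# A small prompt stays in the cheapest tier:
print(round(request_cost(10_000, 2_000), 4))  # 0.02
```

The notable design consequence is the cliff at each boundary: a 500K-token input is billed entirely at the top tier, so long-context requests cost disproportionately more than the same work split into smaller prompts.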