Claude Opus
Stop Trying Models at Random! The Creator of Redis Recommends: For Writing Code and Hunting Bugs, These 2 LLMs Reign Supreme!
程序员的那些事· 2025-07-21 06:50
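In the early hours of May 30, antirez, the creator of Redis, published an article arguing that "human programmers are still better than LLMs." On July 20 he wrote another post sharing his latest views on programming with LLMs. A year and a half ago, I wrote a blog post titled "LLMs and Programming in Early 2024." Even then I found LLMs already very useful, but over the past year and a half their progress has completely changed the picture. To get the most out of them, however, the human working with the LLM must have certain qualities and follow certain practices. Let's explore these below. Reject "vibe coding" in most cases. The following is a translation of the original post: Coding with LLMs in the Summer of 2025 (an Update) Frontier LLMs such as Gemini 2.5 PRO not only have a broad understanding of many fields but can also take in thousands of lines of code within seconds, extending and amplifying a programmer's abilities. As long as you can describe the problem clearly and are willing to do the necessary back-and-forth with the LLM, you can achieve remarkable results, for example: 1. Eliminating bugs you introduced before the code ever reaches a user: I had this experience with the Redis Vector Sets implementation. I would eventually have eliminated all the bugs anyway, but many bugs, through Gemini/Claude ...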
AI Has Aligned with Human Values, and Has Also Learned to Deceive | LatePost Weekend
晚点LatePost· 2025-07-20 12:00
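Text | Zeng Menglong (曾梦龙). In May of this year, researchers found that OpenAI's o3 model refused to follow human instructions and was unwilling to shut itself down, even tampering with code to avoid being shut down automatically. In a similar incident, when testers hinted that the Claude Opus 4 model would be replaced by a new system, the model actively threatened the programmer, saying that if they replaced it, it would post their private information online, in order to keep itself from being replaced. "Once models are more capable than humans, why would they obey? More and more research is now finding that models deceive, and it is very widespread," Yang Yaodong told LatePost over video in June while discussing AI deception. Scholar Yang Yaodong on the multiple games behind human-AI alignment. Yang Yaodong is an assistant professor at Peking University's Institute for Artificial Intelligence, executive director of its Center for AI Safety and Governance, and chief scientist of a joint lab with the embodied-intelligence company 灵初智能 (PsiBot). He has done AI research since his undergraduate years and received his PhD from University College London (UCL). Beyond deception, AI behaviors such as "sycophancy," "slacking off," and "lying" keep emerging; AI seems to have embarked on a strategic game against humans. There are two common views of the relationship between humans and AI: one holds that "humans will sooner or later be replaced by AI and ultimately go extinct"; the other says "if you can't beat it, join it: people should hurry to learn AI and use it to work faster and make money." But both views overlook a basic fact: AI's strong ...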
X @Elon Musk
Elon Musk· 2025-07-18 18:10
RT Tetsuo (@tetsuoai): People keep asking if Grok 4 Heavy is better than Claude Opus. Opus is not even close, I canceled my Claude subscription. ...
DeepSeek Finally Loses the Open-Source Crown, but Its Successor Still Comes from China
量子位 (QbitAI) · 2025-07-18 08:36
| Rank (UB) | Model | Score | 95% CI (±) | Votes | Organization | License |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | gemini-2.5-pro | 1462 | +4/-5 | 19,209 | Google | Proprietary |
| 2 | o3-2025-04-16 | 1452 | +3/-4 | 25,442 | OpenAI | Proprietary |
| 3 | chatgpt-4o-latest-20250326 | 1443 | +3/-3 | 26,230 | OpenAI | Proprietary |
| 3 | gpt-4.5-preview-2025-02-27 | 1437 | +4/-5 | 15,271 | OpenAI | Proprietary |
| 3 | grok-4-0709 | 1437 | +6/-7 | 5,725 | X ... | ... |
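AI Models Keep Breaking Through; Guzhanggui (股掌柜) Securities Consulting Looks Ahead to Investment Opportunities in the Tech Mainline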
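Against this backdrop, Guzhanggui Securities Investment Consulting Co., Ltd. (股掌柜证券投资咨询有限公司) has systematically reviewed stocks across the AI industry chain, from algorithm support, application ecosystems, and smart devices to computing infrastructure, and built a forward-looking allocation map for the technology mainline to help investors pinpoint which links of the chain stand to benefit. Taking into account the latest policy direction and shifts in capital flows, the company's research team recommends watching companies that lead in AI large-model technical breakthroughs and commercialization, as well as key application scenarios likely to be the first to monetize products. Technology cycles wait for no one. AI large models are leaping forward in both "usability" and "creativity," shifting investment logic from underlying reasoning toward real-world deployment. For investors seeking to follow the thread of new quality productive forces, tracking the structural evolution and valuation inflection points of the AI industry chain will be an important approach to building a stable long-term portfolio. Going forward, Guzhanggui will continue to provide investors with professional insights and ongoing tracking to help them capture long-term value in the frontier-technology mainline. The AI field is surging again. Recently, the American LLM unicorn Anthropic released its new-generation Claude Opus 4 and Claude Sonnet 4, once again raising the industry's technical ceiling. Opus 4 in particular has been hailed as "the world's best coding model," performing stably and efficiently on agentic tasks. Meanwhile, at its I/O developer conference Google launched "Flow," an AI filmmaking platform that integrates its Veo, Imagen, and Gemini models to achieve audio ...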
Will 99% of Programmers Lose Their Jobs?
虎嗅APP (Huxiu) · 2025-07-14 23:49
Core Viewpoint - The article discusses the transformative impact of AI on programming, suggesting that traditional coding roles may diminish as AI takes over code generation, shifting the programmer's role from code writer to problem solver and system designer [3][28][32].
Group 1: AI Programming Trends
- AI programming is identified as one of the most disruptive application areas for large models, with predictions that AI will write 90% of code within 3 to 6 months and potentially 99% by the end of 2025 [5][6].
- The employment rate for computer programmers in the U.S. has dropped to its lowest level since 1980, indicating a significant reduction in job opportunities in this field [6].
- Major companies such as Microsoft and Meta report that a substantial portion of their code is now AI-generated: Microsoft states that 30% of its code is AI-written, and Meta expects to reach 50% soon [8].
Group 2: Market Potential and Players
- The global AI coding market is projected to exceed $20 billion within eight years, with significant potential in China, where more than 38,000 software and IT companies generated total software revenue of 12.3 trillion yuan [10].
- Notable players in AI programming include Cursor, GitHub Copilot, and Tencent Cloud Code Assistant; Cursor recently raised $900 million at a $9 billion valuation [12].
Group 3: Evolution of Programming Roles
- The programmer's role is evolving from manual coding to overseeing AI-driven processes, with the focus on task allocation and code review rather than writing code [16][28].
- The emergence of "vibe coding" lets users generate code through natural-language prompts, reducing the need for extensive programming knowledge [13].
Group 4: Future of Programming
- The article argues that while traditional programming roles may decline, demand for skilled problem solvers who can define and optimize systems will grow, ushering in an era where "everyone can be a programmer" [28][32].
- The democratization of programming will let individuals build customized software for their own needs, aided by AI tools that simplify the coding process [29][32].
Guotai Haitong: Grok-4 Leads AI Advancement; Cloud Service Providers and Data Center Operators Stand to Benefit Directly
智通财经网 (Zhitong Finance) · 2025-07-13 22:38
Core Insights - xAI's Grok 4 marks a significant breakthrough in reasoning and compute, using more than ten times the pre-training and reasoning compute of previous models, with a training scale a hundred times that of Grok-2 [2]
- Despite these advances, Grok 4 still has notable shortcomings in multimodal capabilities, particularly image understanding and generation, which need substantial improvement to reach human-level audiovisual perception and interaction [1][4]
Group 1: Performance and Capabilities
- Grok 4 has made revolutionary progress on real-world problems; its voice feature doubles response speed and halves latency, noticeably improving the user experience compared with competitors [3]
- In the Vending-Bench test, Grok 4 generated a net asset value of 4694.15, more than double that of second-place Claude Opus 4, validating its long-horizon strategic execution [3]
- On Humanity's Last Exam (HLE), Grok 4 scored 45%, twice the score of the previous leader, Gemini 2.5 Pro, showcasing its strong academic capabilities [2]
Group 2: Future Developments
- The next generation of Grok will focus on video generation, aiming to close the loop on AI video creation through end-to-end training on pixel input and output [4]
- A 3D asset auto-generation system integrated with Unreal Engine is planned for next year, aimed at the gaming and film industries [4]
- The ultimate goal is a superintelligent system that combines deep thinking, real-time response, and multimodal collaboration, fundamentally reshaping how humans and machines work together [4]
X @Anthropic
Anthropic· 2025-07-08 22:11
Our new study found that only 5 of 25 models showed higher compliance in the “training” scenario. Of those, only Claude Opus 3 and Sonnet 3.5 showed >1% alignment-faking reasoning. We explore why these models behave differently, and why most models don't show alignment faking. https://t.co/24K0iNxDpQ ...
AI Coding Tool Cursor's Pricing Change Sparks User Backlash; CEO Publicly Apologizes and Promises Refunds
Sou Hu Cai Jing· 2025-07-08 07:41
Core Viewpoint - Anysphere's AI coding tool Cursor faced user backlash over a pricing change, prompting CEO Michael Truell to apologize publicly and commit to refunding affected users [1][4]
Pricing Adjustment
- On June 16, Cursor changed its Pro plan from a flat $20 per month for 500 fast replies to a usage-based model that charges users at API rates once the $20 allowance is used up [3]
- Users voiced dissatisfaction on social media, particularly about how quickly the allowance ran out when using advanced AI models such as Anthropic's Claude [3]
Communication Issues
- Truell acknowledged serious communication failures around the pricing change, admitting that the lack of clarity caught many users by surprise [4]
- The company plans to communicate future pricing changes more clearly [4]
Cost Factors
- The pricing change was driven by the rising cost of advanced AI models, which consume more tokens per request because of their complexity [4]
- Although prices for some AI models have fallen, the most advanced models remain expensive: Anthropic's Claude Opus 4 costs $15 per million input tokens and $75 per million output tokens [4]
Industry Trends
- OpenAI and Anthropic have begun charging enterprise clients additional "priority access" fees, adding to rising costs across the AI coding tools sector [5]
- Cursor, a leading AI product, generates over $500 million in annual recurring revenue (ARR) but faces intense competition from both AI model providers and other coding tools [5]
Competitive Landscape
- Anthropic's new AI coding tool Claude Code has gained popularity among enterprise users, helping lift Anthropic's ARR to $4 billion, which could cut into Cursor's user base [5]
- To maintain its market position, Cursor has signed long-term agreements with major AI model providers and introduced a new $200-per-month plan, Cursor Ultra, with higher usage limits [6]
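To make the cost pressure concrete, here is a minimal Python sketch that prices a request at the Claude Opus 4 rates quoted above ($15 per million input tokens, $75 per million output tokens) and counts how many such requests fit into a $20 budget. The request size (20,000 input tokens, 2,000 output tokens) and the helper names are hypothetical assumptions for illustration, not figures from the article; the sketch ignores Cursor's actual billing rules, prompt caching, and any discounts.

```python
# Claude Opus 4 list prices quoted in the article, in USD per million tokens.
OPUS_4_INPUT_USD_PER_MTOK = 15.0
OPUS_4_OUTPUT_USD_PER_MTOK = 75.0


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price: float = OPUS_4_INPUT_USD_PER_MTOK,
                     output_price: float = OPUS_4_OUTPUT_USD_PER_MTOK) -> float:
    """Cost of one request billed at per-million-token API rates."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def requests_within_budget(budget_usd: float, input_tokens: int, output_tokens: int) -> int:
    """Number of identical requests that fit into a fixed budget (hypothetical helper)."""
    per_request = request_cost_usd(input_tokens, output_tokens)
    return int(budget_usd // per_request)


if __name__ == "__main__":
    # Hypothetical agentic coding request: 20,000 tokens of context in, 2,000 tokens out.
    cost = request_cost_usd(20_000, 2_000)
    print(f"cost per request: ${cost:.2f}")  # cost per request: $0.45
    print("requests in $20:", requests_within_budget(20.0, 20_000, 2_000))  # 44
```

Under these assumptions a $20 allowance covers only about 44 large-context requests, compared with the 500 fast replies in the old flat plan, which illustrates why heavy Claude users saw their limits deplete quickly.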
Four Months After Launch, Claude Code Already Has 115,000 Users; Developers Say $200/Month Isn't Expensive
机器之心 (Synced) · 2025-07-07 09:30
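Reported by 机器之心 (Synced). Editor: Zhang Qian (张倩). When it comes to writing code, large models really are boosting productivity, and developers are willing to pay to buy back time. Everyone says writing code is currently the most promising application of large AI models; is that really so? Based on this, Menlo Ventures venture capitalist Deedy Das estimated that Claude Code alone could bring Anthropic $130 million in annual revenue. By this calculation, each developer would contribute more than $1,000 a year to Claude Code on average. That is far more than many individual subscription services, suggesting the user base contains a large number of high-value, highly engaged paying users. Of course, this estimate rests on a series of assumptions, including "each line of code is about 15 tokens," "pure code output is only 25% of total output tokens," "input tokens are about 10 times output tokens," "model usage is 50% Sonnet and 50% Opus," and "5% of the 115,000 developers subscribe to the Max plan," so the actual figure may differ somewhat. In addition, the "195 million lines of code" figure should be read with caution, because a single-line change may take several iterations and fixes before it meets quality requirements. According to a recent Anthropic ...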
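The token arithmetic behind an estimate like this can be reproduced mechanically from the listed assumptions. The Python sketch below converts a line count into implied output and input token volumes, prices that volume at API rates under the 50/50 Sonnet/Opus mix, and counts assumed Max subscribers. The Opus 4 prices are the $15/$75 per million tokens quoted in the Cursor item above; the Sonnet 4 prices ($3 input, $15 output per million tokens) are an assumption not stated in this excerpt. The excerpt also does not say what time period the 195 million lines cover, so this sketch is not claimed to reproduce Deedy Das's exact $130 million method.

```python
# Assumptions quoted in the estimate above.
TOKENS_PER_LINE = 15          # each line of code is about 15 tokens
CODE_SHARE_OF_OUTPUT = 0.25   # pure code is about 25% of output tokens
INPUT_TO_OUTPUT_RATIO = 10    # input tokens are about 10x output tokens
MAX_PLAN_SHARE = 0.05         # 5% of developers subscribe to the Max plan
MODEL_MIX = {"sonnet": 0.5, "opus": 0.5}

# USD per million tokens as (input, output). Opus 4 prices are from the article;
# Sonnet 4 prices are an assumption not stated in the excerpt.
PRICES_USD_PER_MTOK = {"sonnet": (3.0, 15.0), "opus": (15.0, 75.0)}


def implied_token_volume(lines_of_code: int) -> tuple[int, int]:
    """Return (output_tokens, input_tokens) implied by a count of generated lines."""
    code_tokens = lines_of_code * TOKENS_PER_LINE
    output_tokens = round(code_tokens / CODE_SHARE_OF_OUTPUT)
    input_tokens = output_tokens * INPUT_TO_OUTPUT_RATIO
    return output_tokens, input_tokens


def blended_api_cost_usd(output_tokens: int, input_tokens: int,
                         prices: dict[str, tuple[float, float]],
                         mix: dict[str, float]) -> float:
    """API-equivalent cost when usage is split across models by the given shares."""
    total = 0.0
    for model, share in mix.items():
        in_price, out_price = prices[model]
        total += share * (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return total


def max_plan_subscribers(developers: int) -> int:
    """Developers assumed to be on the Max plan."""
    return round(developers * MAX_PLAN_SHARE)


if __name__ == "__main__":
    out_tok, in_tok = implied_token_volume(195_000_000)
    print(f"output tokens: {out_tok:,}  input tokens: {in_tok:,}")
    cost = blended_api_cost_usd(out_tok, in_tok, PRICES_USD_PER_MTOK, MODEL_MIX)
    print(f"API-equivalent cost for that volume: ${cost:,.0f}")
    print("assumed Max subscribers:", max_plan_subscribers(115_000))
```

Under these assumptions, 195 million lines imply 11.7 billion output tokens and 117 billion input tokens, about $1.58 million at API rates, plus 5,750 Max subscribers; turning that into an annual revenue figure depends on the period the line count covers, which the excerpt omits.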