Claude Agent SDK
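Will AI Reliably Take Over Your Work This Year? Anthropic's CPO Reveals New Year Goals: Models Won't Compete on Being "Smarter", and the Awkward "Company Adopts AI, Employees Get More Tired" Problem Should End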
AI前线· 2026-01-03 05:33
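2025 saw agents take off across the board, but in real-world deployment, coding agents became the core breakthrough, with Anthropic's Claude Code standing out in particular. Compiled by | Chu Xingjuan. According to the latest YC data, Anthropic's model share has passed 52%, officially overtaking long-time leader OpenAI. From 2024 through early 2025, Anthropic's share mostly hovered around 25%, but over the past 3 to 6 months it has shown steep, "hockey-stick" growth. The core driver of this shift is Anthropic's excellent code-writing ability, which has made it the tool of choice for many developers and carried it into other use cases. Recently, Anthropic Chief Product Officer Mike Krieger appeared on the "AI Daily Brief" show and laid out where "vibe coding" is headed. From Claude's early focus on programming ability to the rise of more broadly applied agents such as Claude Code, he broke down the problems facing software engineers, creators without a technical background, and enterprise teams that want to move from chatbots to truly agentic workflows, underlying infrastructure, and measurable return on investment, along with the improvements Anthropic plans to make in 2026 to address them, for example focusing on ...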
Anthropic CPO: For Enterprise AI to Really Get Work Done in 2026, It Must First Clear This Hurdle
36Kr · 2025-12-29 03:46
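In recent year-end reviews, many companies have shared the same feeling: models keep getting stronger and the budget has been spent, yet the business looks the same as before. Ask AI three questions and it can answer all of them; but actually assign it a task? It often gets stuck halfway. Sometimes it can't find the data it needs, sometimes it lacks permission to open a file, sometimes the process breaks off at a certain step, and in the end nobody dares to say the job is done. Where is the gap? Not in the model being insufficiently smart, but in companies not having prepared work that can be handed to AI at all. In an interview last week, Anthropic CPO Mike Krieger did not spend time praising how powerful Claude is; instead, he raised a more practical question: can AI actually take over part of your work? The answer depends on the companies themselves. What Anthropic found in enterprise deployments this year is that the real obstacle is not technology but the organization itself. Where exactly is this hurdle? Section 1 | AI does more than write code: it is trying to do the work. Today, almost every AI company is doing the same thing: no longer just emphasizing how smart its models are, but whether its AI products can actually get work done. Look at how Anthropic does it. Rather than designing Claude as a smarter chatbot, they designed it as a colleague that can take on assignments. Claude Code, the earliest of these launches, was just a developer tool: the user types a sentence, and it can complete code, build a web page, or generate a demo. This ...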
Agent Year One in Review: Is the Architecture War Already Over!?
自动驾驶之心· 2025-12-24 00:58
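Author | Zhou Xingxing. Editor | 大模型之心Tech. Original link: https://zhuanlan.zhihu.com/p/1983512173549483912. Shared for academic purposes only; reprinted with permission. Preface: As 2025 draws to a close, I want to distill my view of the "Year of the Agent" from my own practice and understanding over the year. Technical view: the Agent architecture debate is settled, converging on the "general-purpose Agent" form represented by Claude Code and Deep Agent. Although Claude Code launched in March 2025 as an "intelligent terminal coding assistant", it does far more than programming. Industry view: 2025, the Year of the Agent, was neither the "all hype" that pessimists saw nor the "wholesale replacement" that optimists hoped for; it was an intermediate stage of steady deployment. As a front-line practitioner, my assessment is: the technology is ready, but the breakout is still localized. Against this background, this article takes Deep Agent as its starting point and shares what I learned as a front-line developer in 2025. Main references: Anthropic, Lan ...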
Claude Code Splashes Out on a Zero-Revenue Front-End Company, Betting on a High-School-Dropout Founder
AI前线· 2025-12-03 04:29
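Compiled by | Chu Xingjuan. On December 2 local time, Anthropic announced that it had acquired Bun, a popular developer-tools startup. The financial terms of the deal were not disclosed, but the acquisition marks a significant step by Anthropic into developer tooling. "For users of Claude Code, this acquisition means faster performance, greater stability, and new capabilities," Anthropic said. In short, Anthropic sees Bun as the infrastructure for Claude Code, the Claude Agent SDK, and future AI coding products and tools. According to the announcement, Bun has been a key force supporting the scaling of Claude Code's infrastructure throughout its evolution. Over the past several months, the Anthropic team has worked closely with Bun; this collaboration was critical to the Claude Code team's rapid iteration and directly led to the recent launch of the native installer. In fact, AI coding tools such as Claude Code, FactoryAI, and OpenCode are all built with Bun. As more developers rely on AI to build software, underlying infrastructure matters more than ever, and Bun has become an indispensable tool. After all, many coding agents ...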
AI Can Change Shifts Too: Anthropic Teaches Agents to Hand Over Work, So Long Tasks Don't Lose the Thread
36Kr · 2025-12-03 02:32
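How can an AI without long-term memory complete complex tasks that run for hours? Anthropic has designed a more effective framework for long-running agents that lets AI make incremental progress across hours-long tasks, the way human engineers do. Imagine you hired a team of engineers working 24-hour shifts to build a complex application together, but with one strange rule: every engineer, on starting a shift, completely forgets what the previous shift did and has to start over from scratch. No matter how skilled or hardworking they are, the project is unlikely to succeed. This is exactly the real dilemma that "long-running agents" face: once the context window closes, the AI forgets. The model has no true long-term memory; every judgment depends on the text it can currently see, and when the context window fills up or is closed, it is like wiping a whiteboard clean. This memory defect keeps agents from doing long engineering work: the problem surfaces as soon as a task has to run for hours across multiple conversation windows. Because the context window is limited and most complex projects cannot be finished within a single window, agents need an effective mechanism that spans multiple coding sessions. Recently, Anthropic borrowed from the way human engineers work and developed an effective framework for long-running agents. https://www.anthropic.com/engineering/effective-harnesses-for-long-r ...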
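The cross-session handoff described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, loosely modeled on the feature-list and progress-file approach, and not Anthropic's actual harness: the file names `feature_list.json` and `progress.txt` and the functions `initialize` and `coding_session` are hypothetical, and the model call and end-to-end test that a real coding session would run are left out. The point is that each session starts with no memory, rebuilds its picture of the project purely from files that earlier sessions left on disk, advances one feature, and leaves notes for the next shift.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical file names for this sketch; a real harness may use different ones.
FEATURES = "feature_list.json"
PROGRESS = "progress.txt"


def initialize(workdir: Path, features: list[str]) -> None:
    """Initializer session: record every feature as not yet passing and start an empty log."""
    items = [{"description": f, "passes": False} for f in features]
    (workdir / FEATURES).write_text(json.dumps(items, indent=2))
    (workdir / PROGRESS).write_text("")


def coding_session(workdir: Path, session_id: int) -> str | None:
    """One memory-less coding session: rebuild state from disk, finish one feature, leave notes.

    Returns the completed feature's description, or None when nothing is left.
    """
    items = json.loads((workdir / FEATURES).read_text())
    pending = [item for item in items if not item["passes"]]
    if not pending:
        return None
    target = pending[0]
    # A real session would implement the feature and run an end-to-end test here;
    # the flag is flipped only after that (omitted) test passes.
    target["passes"] = True
    (workdir / FEATURES).write_text(json.dumps(items, indent=2))
    with open(workdir / PROGRESS, "a", encoding="utf-8") as log:
        log.write(f"session {session_id}: completed '{target['description']}'\n")
    return target["description"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        initialize(work, ["user can log in", "user can send a message", "user can search history"])
        session = 1
        while coding_session(work, session) is not None:
            session += 1
        print((work / PROGRESS).read_text(), end="")
```

Keeping each session to a single feature and writing state to disk after every step is what lets a fresh context window pick up exactly where the previous one stopped; in a real setup, Git commits would play the same role for the code itself.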
Tencent Research Institute AI Express 20251128
腾讯研究院· 2025-11-27 16:21
Group 1: Google TPU Development
- Google developed the TPU in 2015 to address efficiency bottlenecks in AI computing; the seventh-generation TPU (codename Ironwood) is expected to challenge NVIDIA's dominance in 2025 [1]
- A single TPU v7 chip delivers 4.6 petaFLOPS of FP8 compute, and a Pod of 9,216 chips exceeds 42.5 exaFLOPS; the Pod uses a 2D/3D torus topology combined with an optical switching network and achieves 99.999% annual availability [1]
- Google's vertical integration lets it avoid the costly "CUDA tax", making its inference costs 30%-40% lower than GPU systems; Meta is considering deploying TPUs in its own data centers by 2027 and renting TPU capacity through Google Cloud [1]
Group 2: Anthropic's New Agent Architecture
- Anthropic released a dual-agent architecture for long-running agents that tackles memory loss across sessions: an initializer agent sets up the environment, and a coding agent makes incremental progress [2]
- Environment management consists of a feature list (200+ features, each with a completion status), incremental progress tracking (Git commits plus a progress file), and end-to-end testing (browser automation with Puppeteer) [2]
- The approach is built on the Claude Agent SDK and lets agents make consistent progress across sessions, completing complex tasks that span hours or even days [2]
Group 3: DeepSeek-Math-V2 Model
- DeepSeek released DeepSeek-Math-V2, built on DeepSeek-V3.2-Exp-Base, which reaches IMO gold-medal-level performance and surpasses Gemini DeepThink [3]
- The model introduces a self-verifying mathematical reasoning framework: a proof verifier that scores proofs 0, 0.5, or 1; meta-verification that checks whether the verifier's critiques are reasonable; and an honesty reward that rewards the model for candidly flagging its own errors [3]
- It scored nearly 99% on the Basic subset of the IMO-ProofBench benchmark and 118/120 on Putnam 2024 with scaled test-time compute, pushing past the limits of traditional reinforcement learning [3]
Group 4: Suno and Warner Music Agreement
- AI music platform Suno reached a global agreement with Warner Music Group on the first framework for "legitimately licensed AI music", a milestone in the legalization of AI music [4]
- Suno plans to launch a new model in 2026 trained on high-quality licensed music, which it promises will surpass its current v5 model; Warner artists can choose whether to license their work and earn revenue from it [4]
- Free users will no longer be able to download the audio they create, only play and share it, while paid users keep downloads subject to monthly limits; Suno also acquired Warner's concert service Songkick to expand its offline ecosystem [4]
Group 5: Musk's Grok 5 Challenge
- Musk announced that Grok 5 will challenge T1, the strongest League of Legends team, in 2026, playing with "pure visual perception" and "human-level reaction latency" [5]
- Grok 5 is expected to have 6 trillion parameters and to work as a multimodal LLM that builds a world model by "reading" the game manual and "watching" match videos, relying on logical reasoning rather than brute force [5]
- Grok 5's vision-action model is to be applied directly to Tesla's Optimus humanoid robot, using in-game team fights as a training ground to validate embodied-intelligence capabilities [5]
Group 6: Alibaba's Z-Image Model
- Alibaba open-sourced Z-Image, a 6-billion-parameter image generation model in three versions: Z-Image-Turbo (matches mainstream competitors in just 8 steps), Z-Image-Base (the non-distilled base model), and Z-Image-Edit (for image editing) [7]
- Z-Image-Turbo achieves sub-second inference on enterprise-grade H800 GPUs and runs comfortably on consumer devices with 16 GB of memory; it excels at photorealistic generation and bilingual text rendering [7]
- The model uses a scalable single-stream DiT (S3-DiT) architecture that concatenates text tokens, visual semantic tokens, and image VAE tokens into one unified input stream to maximize parameter utilization [7]
Group 7: Wukong AI Infrastructure Financing
- Wukong AI Infrastructure closed an A+ round of nearly 500 million yuan led by Zhuhai Technology Group and Foton Capital, bringing its total funding to nearly 1.5 billion yuan over 2.5 years [8]
- Wukong AI Cloud has achieved mixed training across chips from different vendors with compute utilization of up to 97.6%, and manages more than 25,000 P (PFLOPS) of computing power across 53 data centers in 26 cities nationwide [8]
- The company launched the Wukong Tianquan model (the compute cost of a 3B model and the memory footprint of a 7B model, with intelligence on par with a 21B model) and the Wukong Kaiyang inference acceleration engine (3x lower latency, 40% energy savings), with the aim of building Agentic Infra [8]
Group 8: Tsinghua University's AI Education Guidelines
- Tsinghua University officially released its "Guidelines for AI Education Applications", setting out five core principles: "subject responsibility", "compliance and integrity", "data security", "prudent thinking", and "fairness and inclusiveness" [9]
- The guidelines explicitly prohibit submitting AI-generated content directly as academic work and forbid using AI to replace academic training or to write papers; teachers are responsible for any AI-generated teaching content [9]
- Tsinghua has integrated AI into teaching in more than 390 courses and developed a "three-layer decoupled architecture" and a full-featured intelligent learning companion, "Qing Xiao Da"; the guidelines were finalized after two years of research covering 25 universities worldwide [9]
Group 9: US Genesis Mission
- The US launched the "Genesis Mission", billed as an AI Manhattan Project, which aims to train foundational scientific models and build research agents that embed AI deeply throughout the research process [10]
- The Department of Energy's Under Secretary for Science stressed that AI's value lies in producing verifiable results rather than merely summarizing, which requires mobilizing national laboratories, companies, and top universities [11]
- An editorial published in Nature at the same time proposed a "neuro-symbolic AI" approach that combines the statistical learning of large models with symbolic reasoning and planning modules, which may be key to achieving human-level intelligence [11]
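The TPU figures in Group 1 can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in the digest (9,216 chips, about 4.6 PFLOPS of FP8 per chip, 99.999% availability) and assumes a 365-day year; it computes peak, not sustained, throughput.

```python
# Figures as quoted in the digest; 4.6 PFLOPS per chip is a rounded value.
CHIPS_PER_POD = 9_216
PFLOPS_PER_CHIP = 4.6
AVAILABILITY = 0.99999


def pod_exaflops(chips: int, pflops_per_chip: float) -> float:
    """Aggregate peak throughput in exaFLOPS (1 EFLOPS = 1,000 PFLOPS)."""
    return chips * pflops_per_chip / 1_000


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    print(f"pod peak: {pod_exaflops(CHIPS_PER_POD, PFLOPS_PER_CHIP):.2f} EFLOPS")
    print(f"implied per-chip figure for 42.5 EFLOPS: {42_500 / CHIPS_PER_POD:.2f} PFLOPS")
    print(f"downtime: {downtime_minutes_per_year(AVAILABILITY):.2f} min/year")
```

With the rounded per-chip figure the pod comes to about 42.39 EFLOPS, so the quoted "exceeds 42.5 exaFLOPS" implies a per-chip figure slightly above 4.6 PFLOPS (42,500 / 9,216 ≈ 4.61). Five nines of availability leaves roughly 5.26 minutes of downtime per year.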
OpenHands, the 64K-Star Open-Source Agent Framework, Gets a Full Re-Architecture! A Major Upgrade Takes On OpenAI and Google
机器之心· 2025-11-08 04:02
Core Insights
- The OpenHands development team announced that it has finished re-architecting the OpenHands Software Agent SDK, moving from V0 to V1; the new version provides a practical foundation for prototyping, unlocks new custom applications, and supports reliable agent deployment at scale [1][2]
Design Principles
- OpenHands V1 introduces a new architecture built on four design principles that address V0's limitations:
  1. Sandboxed execution should be optional rather than mandatory, allowing flexibility without sacrificing security [9]
  2. Stateless by default, with a single source of truth for session state, which isolates changes and enables deterministic replay and strong consistency [10]
  3. Strict separation of concerns, isolating the agent core into a "software engineering SDK" so that research and applications can evolve independently [11]
  4. Everything should be composable and safely extensible, with modular packages that support local, hosted, or containerized execution [12][13]
Ecosystem and Features
- OpenHands V1 is a complete software agent ecosystem, including CLI and GUI applications built on the OpenHands Software Agent SDK [15][16]
- The SDK offers deterministic replay, immutable agent configuration, and an integrated tool system that supports both local prototyping and secure remote execution with minimal code changes [18][20]
Comparison with Competitors
- The team compared the OpenHands SDK with the OpenAI, Claude, and Google SDKs, highlighting that OpenHands uniquely combines 16 additional features, including native remote execution and multi-LLM routing across more than 100 vendors [21][22]
Reliability and Evaluation
- The SDK's reliability and performance are assessed through continuous testing and benchmark evaluations; automated test runs cost only $0.5–3 each and finish in 5 minutes [24][25]
- The SDK performs competitively on software engineering and general agent benchmarks, solving 72% of SWE-Bench tasks and reaching 67.9% accuracy on GAIA with Claude Sonnet 4.5 [29][30]
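Two of the V1 principles, stateless-by-default with a single source of truth and immutable configuration, can be illustrated with a small event-sourcing sketch. The types `AgentConfig` and `Event` and the function `reduce_state` are hypothetical and are not the OpenHands SDK API; the sketch only shows why deriving session state from an append-only event log makes replay deterministic.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Literal

# Hypothetical types for illustration; these are NOT the OpenHands SDK classes.


@dataclass(frozen=True)
class AgentConfig:
    """Immutable agent configuration."""
    model: str
    max_steps: int


@dataclass(frozen=True)
class Event:
    """One immutable entry in the session's append-only event log."""
    kind: Literal["user_message", "tool_result"]
    payload: str


def reduce_state(events: list[Event]) -> dict[str, list[str]]:
    """Derive session state purely from the event log (the single source of truth).

    No state is kept anywhere else, so the same log always yields the same state.
    """
    state: dict[str, list[str]] = {"messages": [], "tool_results": []}
    for event in events:
        key = "messages" if event.kind == "user_message" else "tool_results"
        state[key].append(event.payload)
    return state


if __name__ == "__main__":
    config = AgentConfig(model="example-model", max_steps=50)
    log = [Event("user_message", "fix the failing test"), Event("tool_result", "2 passed")]

    # Deterministic replay: rebuilding from the same log gives identical state.
    assert reduce_state(log) == reduce_state(list(log))

    # Configuration cannot be mutated in place...
    try:
        config.max_steps = 100  # type: ignore[misc]
    except FrozenInstanceError:
        print("config is immutable")
    # ...so a change produces a new object instead.
    longer = replace(config, max_steps=100)
    print(config.max_steps, longer.max_steps)
```

Because `reduce_state` is a pure function of the log, replaying a recorded session reproduces the same state, and the frozen dataclass forces a configuration change to create a new object rather than mutating one that running agents may share.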
Anthropic and Google Negotiating Multibillion-Dollar Computing Partnership
PYMNTS.com· 2025-10-22 14:40
Core Insights
- Anthropic is in early discussions with Google about a cloud-computing deal worth in the high tens of billions of dollars, which would expand its access to Google's tensor processing units (TPUs) designed for machine learning [1][3]
- Google has invested approximately $3 billion in Anthropic and is one of its key cloud providers; a larger deal could expand Google's presence in the generative AI infrastructure market [3][4]
- Anthropic recently raised $13 billion at a $183 billion valuation, reflecting the growing economics of AI at scale [4]
Group 1
- The potential partnership with Google highlights how important proprietary compute infrastructure has become in the AI race [1]
- Anthropic's Claude models are central to its enterprise adoption, providing multimodal reasoning and compliance tools for regulated industries [4]
- The company is extending its platform into developer tools and automation, positioning its models as infrastructure for AI-native applications [5]
Group 2
- Amazon has committed up to $8 billion to Anthropic, which has become one of the largest users of Amazon's custom AI chips [6]
- A deal with Google would solidify Anthropic's multi-cloud strategy, ensuring access to advanced silicon and redundancy [6]
- Securing Anthropic as a long-term client could strengthen Google's competitive position against Amazon and Microsoft in the cloud AI supply chain [6]
More for the Same Price: What Makes Claude Sonnet 4.5 Strong, Explained in One Article
Founder Park· 2025-09-30 03:46
Core Viewpoint
- Anthropic has launched Claude Sonnet 4.5, which it calls the best coding model in the world; it can stay focused on complex, multi-step tasks for more than 30 hours, surpassing OpenAI's GPT-5 Codex [2][9]
Pricing and Cost Efficiency
- Pricing is unchanged from its predecessor: $3 per million input tokens and $15 per million output tokens. Prompt caching can cut costs by up to 90%, and batch processing saves 50% [2]
Developer Tools and Integration
- For developers, Anthropic has introduced the Claude Agent SDK and an experimental feature called "Imagine with Claude", with integration on platforms such as Amazon Bedrock and Google Cloud's Vertex AI [3][26]
Performance Metrics
- Claude Sonnet 4.5 achieved industry-leading results on SWE-bench Verified and scored 61.4% on the OSWorld benchmark, a significant improvement over the previous model's 42.2% [10][12]
Enhanced Features
- New features include checkpoints in Claude Code, context editing, and a memory tool, which let the model handle longer tasks and more complex operations [4][24]
Application and Usability
- Users can access Claude Sonnet 4.5 through the Claude.ai website and mobile apps, where code execution and file creation are built directly into conversations [5][6]
Safety and Alignment
- Claude Sonnet 4.5 is noted for improved alignment and safety, with fewer undesirable behaviors such as deception and sycophancy, and significant progress in defending against prompt injection attacks [24][25]
Experimental Features
- The "Imagine with Claude" feature generates software in real time, showing the model adapting to user requests without pre-written code [31][33]
Recommendations
- Anthropic recommends that all users upgrade to Claude Sonnet 4.5 for better performance across all applications; updates are available for both Claude Code and the developer platform [34]
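To make the pricing concrete, here is a rough cost estimator. Only the $3 and $15 per-million-token list prices come from the article; the way the discounts are modeled is an assumption of this sketch: cached input is billed at 10% of the input price (the "up to 90%" saving), the batch discount halves the total and stacks with caching, and cache-write surcharges are ignored.

```python
# List prices from the article, in USD per million tokens.
INPUT_PER_MTOK = 3.00
OUTPUT_PER_MTOK = 15.00

# Assumptions of this sketch, not an official pricing formula:
CACHE_READ_FACTOR = 0.10  # cached input billed at 10% of the input price ("up to 90%" off)
BATCH_FACTOR = 0.50       # batch processing halves the bill; assumed to stack with caching


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0, batch: bool = False) -> float:
    """Rough USD cost of one workload. Cache-write surcharges are ignored."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    cost = (fresh_input * INPUT_PER_MTOK
            + cached_input_tokens * INPUT_PER_MTOK * CACHE_READ_FACTOR
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000
    return cost * BATCH_FACTOR if batch else cost


if __name__ == "__main__":
    print(f"list price: ${estimate_cost(1_000_000, 200_000):.2f}")
    print(f"90% cached: ${estimate_cost(1_000_000, 200_000, cached_input_tokens=900_000):.2f}")
    print(f"batch:      ${estimate_cost(1_000_000, 200_000, batch=True):.2f}")
```

For example, a job with 1,000,000 input tokens and 200,000 output tokens costs $6.00 at list price; if 900,000 of those input tokens are cache hits, it drops to about $3.57, and running the uncached job through batch processing costs $3.00. Output tokens dominate the bill here, so caching alone cannot cut the total by anything close to 90%.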
Anthropic Unveils Claude Sonnet 4.5 Late at Night: It Can Work Autonomously for 30 Hours Straight; CEO: It's More Like Your Colleague
36Kr · 2025-09-30 03:20
Core Insights
- Anthropic has launched its new AI model, Claude Sonnet 4.5, calling it the best coding model and a powerful tool for building complex agents, able to complete production-grade development tasks on its own [1][10]
- The model shows significant gains in software coding, reaching 77.2% on the SWE-bench Verified benchmark, up from 72.7% for Claude Sonnet 4 [2][5]
- Claude Sonnet 4.5 can run autonomously for 30 hours; in one run it generated 11,000 lines of code and completed the full development cycle of an enterprise chat application [2]
Performance Metrics
- The model's OSWorld score rose from 42.2% to 61.4% in four months, outperforming comparable products in the industry [4][5]
- In specialized fields such as finance and law, its reasoning improved by more than 30% compared with Claude Opus 4.1 [4][5]
- Claude Sonnet 4.5 scored a perfect 100% on a high school math competition and 89.1% on multilingual Q&A [5]
Product Ecosystem Upgrades
- Anthropic introduced several product updates, including Claude Code 2.0, whose "checkpoint" feature saves code progress and allows instant rollback, making developers more efficient [8]
- API capabilities have been strengthened, extending how long an AI agent can run on complex tasks from 7 hours to 30 hours [8]
- A new browser extension, Claude for Chrome, is now available to Max subscribers, and code execution and file creation are integrated directly into the Claude apps [8]
Developer Empowerment
- The Claude Agent SDK lets developers build customized AI assistants and addresses key challenges in agent development, such as memory management for long-running tasks and multi-agent coordination [9]
- Engineering teams at companies such as Canva have already put the SDK to use, improving codebase management and product research efficiency [9]
Safety and Compliance
- Claude Sonnet 4.5 is released under AI Safety Level 3 (ASL-3) protections, with a 90% lower false positive rate than earlier models [10]
- The model includes advanced detection of content related to hazardous materials and has made notable progress in defending against prompt injection attacks, a significant risk for users [10]
Commercial Strategy
- Anthropic keeps API pricing unchanged from the previous model: $3 per million input tokens and $15 per million output tokens [13]
- The company positions Claude Sonnet 4.5 as the default choice for users while still offering older models for specific workflows [13]
- Analysts suggest that the launch marks a shift from AI as an "assistive tool" to AI as "independent productivity", and that the open SDK could speed adoption of AI agent technology across industries [13][14]
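The checkpoint-and-rollback workflow described for Claude Code 2.0 can be illustrated with a toy in-memory workspace. `CheckpointStore` is hypothetical and is not how Claude Code implements the feature; it only shows the basic pattern of snapshotting state before an agent edit so that the edit can be undone instantly.

```python
from __future__ import annotations


class CheckpointStore:
    """Toy workspace (path -> file contents) with snapshot-based rollback.

    Hypothetical illustration only; not how Claude Code implements checkpoints.
    """

    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)
        self._snapshots: list[dict[str, str]] = []

    def checkpoint(self) -> int:
        """Snapshot the current workspace before an edit; return the checkpoint id."""
        # Values are immutable strings, so a shallow copy of the dict is enough.
        self._snapshots.append(dict(self.files))
        return len(self._snapshots) - 1

    def rollback(self, checkpoint_id: int) -> None:
        """Restore a snapshot and discard any checkpoints taken after it."""
        if not 0 <= checkpoint_id < len(self._snapshots):
            raise IndexError(f"unknown checkpoint {checkpoint_id}")
        self.files = dict(self._snapshots[checkpoint_id])
        del self._snapshots[checkpoint_id + 1:]


if __name__ == "__main__":
    ws = CheckpointStore({"app.py": "version 1"})
    before_edit = ws.checkpoint()
    ws.files["app.py"] = "version 2 (broken)"  # the agent's edit
    ws.files["helper.py"] = "new file"
    ws.rollback(before_edit)                   # instant undo
    print(ws.files)
```

Restoring from a copy rather than the stored snapshot itself keeps the checkpoint intact, so the same point can be rolled back to again after further edits.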