Claude Agent SDK
US Stock Movers | Intuit rises more than 6% premarket after striking a multi-year partnership with Anthropic
Ge Long Hui· 2026-02-24 13:51
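Intuit (INTU.US) rose more than 6% in premarket trading to $381.30. On the news front, Intuit has entered a multi-year partnership with Anthropic to bring customizable AI agents to mid-market businesses and to extend its financial tools on Anthropic's platform. The partnership will let businesses use Anthropic's Claude Agent SDK on the Intuit platform to build and customize secure, accurate AI agents that support compliance workflows. (Ge Long Hui) ...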
AI has learned plenty of lessons, and it still goes haywire
36Kr· 2026-02-09 06:50
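Recently, many papers have been discussing the current predicament of agents. The predicament is real. At the application layer, once agents lose man-made crutches such as Skills, they are simply unreliable on real-world long-horizon tasks. The predicament is usually attributed to two causes. The first is the context black hole. As the CL Bench released a few days ago by the Hunyuan team under Tencent chief AI scientist Yao Shunyu pointed out, models may simply be unable to fully digest complex context, so they cannot be expected to follow instructions properly. The second is actually more fatal: the collapse of long-term planning. Once the planning horizon grows, the model starts to lose its way. It is like being drunk: the first two steps are straight, but by the tenth step you are walking in circles. In late January, Anthropic researchers published a major paper, "The Hot Mess of AI," attempting to explain the cause of the second problem. In the process, they clearly located the Achilles' heel of autoregressive models (every Transformer-based model is one). We have all heard Yann LeCun's frequent claim that "autoregressive models only do next-token prediction, so they can never achieve understanding or AGI." Until now, though, this was a judgment or a belief with no real empirical evidence behind it. This paper supplies some empirical evidence. It also hints at a frightening reality: as models ...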
How to build long-running agents: Cursor and Anthropic offer two very different approaches
Founder Park· 2026-01-20 15:00
Core Viewpoint - The article discusses advances in long-running AI agents, focusing on two approaches: Cursor's multi-agent parallel collaboration and Anthropic's memory continuity for a single agent [4][27].
Group 1: Cursor's Approach
- Cursor aims to execute complex, long-term tasks by running multiple agents in parallel, similar to how a human team collaborates [4][8].
- Early attempts at coordination ran into problems, including inefficiencies caused by locking mechanisms and a lack of accountability among agents [10][12].
- Giving agents distinct roles (Planners, Workers, and Judges) improved project coordination and scalability [15][21].
- Successful experiments included building a web browser from scratch, generating over 1 million lines of code, and migrating a large codebase, demonstrating the effectiveness of the new structure [17][19].
Group 2: Anthropic's Approach
- Anthropic focuses on maintaining memory continuity for agents across multiple work sessions, addressing the limits of the context window [27][28].
- The dual-agent system consists of an Initializer Agent that sets up the project environment and a Coding Agent that executes tasks incrementally [34][39].
- This method emphasizes structured task management and thorough testing, significantly improving the accuracy of functionality verification [42][46].
- Open questions remain about whether specialized agents could be applied in domains beyond web development [53].
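The Planner/Worker/Judge split described for Cursor can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: `Task`, `plan`, `work`, `judge`, and `run` are hypothetical names, the "model calls" are deterministic stubs, and this is not Cursor's implementation. The point is the control flow: a planner breaks the goal into tasks, workers run them in parallel, and a judge decides which results are accepted and which go back into the queue.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    description: str
    attempts: int = 0
    result: str | None = None


def plan(goal: str) -> list[Task]:
    """Hypothetical planner: split a goal into independent sub-tasks."""
    parts = [p.strip() for p in goal.split(";") if p.strip()]
    return [Task(task_id=i, description=p) for i, p in enumerate(parts)]


def work(task: Task) -> Task:
    """Hypothetical worker stub; a real worker would call a model and edit code."""
    task.attempts += 1
    # Simulate a failed first attempt on any task that mentions "hard".
    if "hard" in task.description and task.attempts == 1:
        task.result = "incomplete"
    else:
        task.result = f"done: {task.description}"
    return task


def judge(task: Task) -> bool:
    """Hypothetical judge: accept only results that report completion."""
    return task.result is not None and task.result.startswith("done")


def run(goal: str, max_rounds: int = 3, max_workers: int = 4) -> dict[int, str]:
    pending = plan(goal)
    accepted: dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(max_rounds):
            if not pending:
                break
            finished = list(pool.map(work, pending))  # workers run in parallel
            pending = []
            for task in finished:
                if judge(task):
                    accepted[task.task_id] = task.result
                else:
                    pending.append(task)  # rejected work goes back into the queue
    return accepted
```

For example, `run("parse HTML; hard: layout engine; render text")` accepts two tasks in the first round, re-queues the "hard" task after the judge rejects it, and accepts it in the second round. Keeping acceptance in a separate judge role, rather than letting each worker declare itself done, is one way to address the accountability problem the summary mentions.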
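Can AI reliably take over your work this year? Anthropic's Chief Product Officer reveals his new-year goals: models won't compete on being "smarter," and the awkward "company adopts AI, employees end up more exhausted" problem should end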
AI前线· 2026-01-03 05:33
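In 2025, agents took off across the board. In real-world deployment, however, coding agents became the core breakthrough, with Anthropic's Claude Code standing out in particular. (Compiled by Chu Xingjuan) According to the latest YC data, Anthropic's model share has passed 52%, officially overtaking longtime leader OpenAI. From 2024 through early 2025, Anthropic's share mostly hovered around 25%, but over the past three to six months it has grown along a steep "hockey stick" curve. The core driver of this shift is Anthropic's strong code-writing ability, which has made it the tool of choice for many developers and carried it into other use cases. Recently, Anthropic Chief Product Officer Mike Krieger appeared on the "AI Daily Brief" program and laid out where "vibe coding" is headed. Starting from Claude's early focus on programming and moving to the rise of more broadly applied agents such as Claude Code, he broke down the problems facing software engineers, creators without technical backgrounds, and enterprise teams that want to move from chatbots to truly agentic workflows, supporting infrastructure, and measurable return on investment, as well as the areas Anthropic will optimize in 2026, such as a focus on ...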
Anthropic CPO: for enterprise AI to do real work in 2026, it must first clear this hurdle
36Kr· 2025-12-29 03:46
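During recent year-end reviews, many companies shared the same feeling: the models keep getting stronger and the budget has been spent, yet the business looks the same as before. Ask AI three questions and it can answer all of them; but actually assign it a task and it often gets stuck halfway. Sometimes it cannot find the data it needs, sometimes it lacks permission to open a file, sometimes the process breaks at some step, and in the end nobody dares to say the job is done. Where is the gap? Not in the model lacking intelligence, but in companies simply not having work ready to hand over to AI. In an interview last week, Anthropic CPO Mike Krieger did not spend time praising how powerful Claude is; instead he raised a more practical question: can AI actually take over part of your work? The answer depends on the company itself. Over the past year of enterprise deployments, Anthropic found that the real obstacle is not technology but the organization itself. Where exactly is this hurdle? Section 1 | AI does more than write code: it is trying to do work. Today almost every AI company is doing the same thing: no longer emphasizing only how smart their models are, but whether their AI products can actually do work. Look at how Anthropic does it. They did not treat Claude as a smarter chatbot; they designed it as a colleague who can take on assignments. Claude Code, the first to launch, was just a developer tool: the user types a sentence, and it can complete code, set up a web page, or generate a demo. This ...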
The Year of the Agent in review: is the architecture war already over!?
自动驾驶之心· 2025-12-24 00:58
Core Insights
- The article discusses the evolution of "agent" technology, highlighting the emergence of "Deep Agent" and the "Claude Agent SDK" as leading architectures in the field [3][57].
- It emphasizes that 2025 marks a pivotal year for agents: the technology is clearly ready, but agents have not yet fully replaced traditional methods [5][6].
Technical Perspectives
- Agent architecture has converged on a general form represented by Claude Code and Deep Agent, whose capabilities extend beyond programming [3][4].
- The core capabilities of Claude Code, such as planning and context management, apply to many tasks besides coding, which led to its rebranding as the Claude Agent SDK [9].
Industry Recognition
- Agent products have generated significant revenue in sectors such as recruitment and marketing, but the impact is less visible in China because much of that business is concentrated in overseas markets [10].
- The focus is shifting from technical architecture to business restructuring: industry professionals need to adapt traditional workflows to be agent-friendly [10].
Definition and Characteristics of Deep Agent
- A "Deep Agent" is characterized by industry-specific knowledge and long-running capability, ensuring stability and reliability in task execution [11][12].
- A Deep Agent must demonstrate a high degree of specialization and the ability to carry out complex, multi-step tasks without failure [12].
Skills and Context Management
- "Agent Skills" offer a more dynamic and efficient way to integrate business knowledge into agents, extending their capabilities [22][30].
- Progressive disclosure is a key design principle: agents load information as needed rather than all at once, which improves context management [32][34].
Planning and Task Management
- Planning is crucial for agents to execute long-running tasks effectively, including the ability to decompose a task into manageable sub-tasks [47][50].
- Context isolation and parallel execution in sub-agents improve efficiency and reduce context confusion [50].
System Prompt and File Management
- Detailed system prompts are important for guiding agent behavior and ensuring effective task execution [52].
- A well-structured file system is proposed as a way to manage context and support collaboration among agents, providing long-term memory and efficient information retrieval [53][56].
Conclusion on Agent Technology
- Agent technology has reached a point of convergence, with established architectures such as the Claude Agent SDK and Deep Agent leading the way [57][58].
- Future agent technology will involve further specialization and adaptation to specific business needs, building on the strengths of existing frameworks [69][71].
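As a rough illustration of progressive disclosure, the sketch below keeps only each skill's name and one-line description in the agent's context and pulls in the full instructions only when a skill is actually used. The file layout (a short `---`-delimited header followed by a body) is loosely modeled on the SKILL.md convention used by Agent Skills, but the parser, `SkillRegistry`, and its methods are hypothetical and much simpler than any real loader.

```python
def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited header of 'key: value' lines from the skill body."""
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body


class SkillRegistry:
    """Hypothetical loader: only skill summaries enter the context until a skill is used."""

    def __init__(self, files: dict[str, str]):
        self._files = files  # path -> file text; stands in for a skills directory
        self._index = {}     # skill name -> (description, path)
        for path, text in files.items():
            meta, _ = split_skill(text)
            self._index[meta["name"]] = (meta["description"], path)
        self._loaded: dict[str, str] = {}  # skill name -> full body, once requested

    def summary(self) -> str:
        """Level 1: one line per skill, always in context."""
        return "\n".join(f"- {name}: {desc}" for name, (desc, _) in sorted(self._index.items()))

    def load(self, name: str) -> str:
        """Level 2: the full instructions, added to the context only on demand."""
        if name not in self._loaded:
            _, path = self._index[name]
            _, body = split_skill(self._files[path])
            self._loaded[name] = body
        return self._loaded[name]

    def context(self) -> str:
        """What the agent actually sees: the summary plus any skills loaded so far."""
        return "\n\n".join([self.summary(), *self._loaded.values()])
```

The design choice is that context cost grows with the number of skills only by one short line each; the long bodies are paid for only when the agent decides a skill is relevant.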
Claude Code splurges on a zero-revenue front-end company: a bet on a high-school-dropout founder
AI前线· 2025-12-03 04:29
Core Insights
- Anthropic announced the acquisition of Bun, a developer tool startup, marking a significant step into the developer tools sector [2].
- The acquisition aims to improve the performance and stability of Claude Code and other AI coding products by leveraging Bun's infrastructure [2][4].
- Bun has become an essential tool for AI programming tools, addressing efficiency problems in how agents are distributed and executed [3].
Summary by Sections
Acquisition Details
- The financial terms of the acquisition are undisclosed, but it fits Anthropic's strategy of seeking acquisitions that strengthen its technical capabilities and reinforce its leadership in enterprise AI [4].
- Bun's integration is expected to accelerate the development of Claude Code and related tools, with a focus on keeping them high-performance and lightweight [15].
Bun's Impact and Growth
- Bun's monthly downloads exceed 7 million, and it has more than 82,000 stars on GitHub, indicating its popularity among developers [4].
- Companies such as Midjourney and Lovable have adopted it to improve development speed and efficiency [4].
- Bun's single-file executables simplify distributing CLI tools, making it a preferred choice for many coding agents [3].
Future Prospects
- The acquisition is seen as a way to give Bun long-term stability, letting it focus on building the best JavaScript tools without pressure for immediate monetization [12][15].
- Bun's roadmap will continue to emphasize a high-performance JavaScript toolchain and Node.js compatibility, with the aim of replacing Node.js as the default server-side JavaScript runtime [17].
- The integration with Anthropic is expected to strengthen Bun's capabilities and speed up iteration, benefiting existing users [15].
Community and Open Source Commitment
- Bun will remain open source under the MIT license, with the original team continuing to develop it [17].
- The commitment to maintaining an active development community and transparency in the development process is emphasized [17].
AI can change shifts now: Anthropic teaches agents to hand off work so long tasks don't lose the thread
36Kr· 2025-12-03 02:32
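How can an AI with no long-term memory complete complex tasks that last for hours? Anthropic has designed a more efficient framework for running long-lived agents, letting AI make incremental progress across hours-long tasks the way a human engineer would. Suppose you hired a team of engineers working 24-hour shifts to build a complex application together. But there is one strange rule: every engineer completely forgets what the previous shift did the moment they clock in, and has to start again from scratch. No matter how skilled they are or how hard they work, the project will probably never get done. This is exactly the real-world predicament facing "long-running agents": "once the context window closes, the AI loses its memory." Models have no true long-term memory; every judgment depends on the text they can see at that moment, and once the context window fills up or is closed, it is like a whiteboard being wiped clean. This "memory defect" keeps agents from taking on long projects; it surfaces as soon as a task needs to run for hours and span multiple conversation windows. Because the context window is limited and most complex projects cannot be finished within a single window, an agent needs an effective mechanism that carries work across multiple coding sessions. Recently, Anthropic "borrowed" from how human engineers work and arrived at an effective framework for long-running agents. https://www.anthropic.com/engineering/effective-harnesses-for-long-r ...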
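The shift-handoff idea can be sketched as a small harness in which every session rebuilds its picture of the project from files on disk instead of from memory. This is a minimal sketch under stated assumptions: the file names `features.json` and `progress.txt`, the functions, and the `implement` callback are hypothetical stand-ins (a real harness would have a model write code, run end-to-end tests, and commit to Git), not the code from Anthropic's article.

```python
import json
from pathlib import Path
from typing import Callable


def initialize(workdir: Path, features: list[str]) -> None:
    """Initializer session: record every feature as failing, plus an empty progress log."""
    feature_list = [{"description": f, "passes": False} for f in features]
    (workdir / "features.json").write_text(json.dumps(feature_list, indent=2))
    (workdir / "progress.txt").write_text("")


def coding_session(workdir: Path, session: int, implement: Callable[[str], bool]) -> bool:
    """One fresh context window: recover state from disk, attempt one feature, record it.

    Returns False when no failing feature is left.
    """
    features = json.loads((workdir / "features.json").read_text())
    todo = next((f for f in features if not f["passes"]), None)
    if todo is None:
        return False
    if implement(todo["description"]):  # a real harness would code, test end-to-end, then commit
        todo["passes"] = True
        (workdir / "features.json").write_text(json.dumps(features, indent=2))
        with (workdir / "progress.txt").open("a") as log:
            log.write(f"session {session}: {todo['description']}\n")
    return True


def run_harness(workdir: Path, features: list[str],
                implement: Callable[[str], bool], max_sessions: int = 50) -> int:
    """Run sessions until every feature passes or the session budget runs out."""
    initialize(workdir, features)
    sessions = 0
    while sessions < max_sessions and coding_session(workdir, sessions + 1, implement):
        sessions += 1
    return sessions
```

Because each call to `coding_session` starts by reading `features.json`, no session depends on what an earlier one "remembers"; the files play the role of the handover notes between shifts.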
Tencent Research Institute AI Express 20251128
腾讯研究院· 2025-11-27 16:21
Group 1: Google TPU Development
- Google developed the TPU in 2015 to address efficiency bottlenecks in AI computing; the seventh-generation TPU (codename Ironwood) is expected to challenge NVIDIA's dominance by 2025 [1].
- A single TPU v7 chip delivers 4.6 petaFLOPS of FP8 compute, and a Pod integrating 9,216 chips can exceed 42.5 exaFLOPS. The Pod uses a 2D/3D torus topology combined with optical switching networks and reaches 99.999% annual availability [1].
- Google's vertical integration lets it avoid the expensive "CUDA tax," giving it inference costs 30%-40% lower than GPU systems. Meta is considering deploying TPUs in its data centers by 2027 and renting compute through Google Cloud [1].
Group 2: Anthropic's New Agent Architecture
- Anthropic released a dual-agent architecture for long-horizon agents that tackles memory across sessions: an initializer agent builds the environment, and a coding agent makes incremental progress [2].
- Environment management includes a feature list (200+ features, each marked with its status), incremental progress tracking (Git commits and progress files), and end-to-end testing (browser automation with Puppeteer) [2].
- The solution is built on the Claude Agent SDK and lets agents keep progress consistent across sessions, completing complex tasks that run for hours or even days [2].
Group 3: DeepSeek-Math-V2 Model
- DeepSeek introduced DeepSeek-Math-V2, built on DeepSeek-V3.2-Exp-Base, which achieves IMO gold-medal-level performance and surpasses Gemini DeepThink [3].
- The model introduces a self-verifying mathematical reasoning framework consisting of a proof verifier (scores of 0, 0.5, or 1), meta-verification (checking whether the verifier's comments are reasonable), and an honesty reward (rewarding the model for honestly flagging its own errors) [3].
- It scored close to 99% on the Basic subset of the IMO-ProofBench benchmark and 118/120 on extended tests of Putnam 2024, breaking through the limits of traditional reinforcement learning [3].
Group 4: Suno and Warner Music Agreement
- AI music platform Suno reached a global agreement with Warner Music Group on the first framework for "legitimately licensed AI music," a milestone in the legalization of AI music [4].
- Suno plans to launch a new model trained on high-quality licensed music in 2026, promising it will surpass the current v5 model; Warner artists can choose to opt in and earn revenue [4].
- Going forward, free users will not be able to download the audio they create, only play and share it, while paid users keep downloads subject to monthly limits. Suno also acquired Warner's concert service Songkick to expand its offline ecosystem [4].
Group 5: Musk's Grok 5 Challenge
- Musk announced that Grok 5 will challenge T1, the strongest League of Legends team, in 2026, playing with "pure visual perception" and "human-level reaction latency" [5].
- Grok 5 is expected to have 60 trillion parameters and to function as a multimodal LLM that builds a world model by "reading" the game's instructions and "watching" match videos, relying on logical reasoning rather than brute force [5].
- Grok 5's vision-action model will be applied directly to Tesla's Optimus humanoid robot, with in-game team fights serving as a training ground to validate embodied intelligence capabilities [5].
Group 6: Alibaba's Z-Image Model
- Alibaba open-sourced Z-Image, a 6-billion-parameter image generation model with three main versions: Z-Image-Turbo (matches mainstream competitors in 8 steps), Z-Image-Base (the non-distilled base model), and Z-Image-Edit (the image-editing version) [7].
- Z-Image-Turbo achieves sub-second inference on enterprise-grade H800 GPUs and runs comfortably on consumer devices with 16 GB of memory, excelling at photorealistic generation and bilingual text rendering [7].
- The model uses a scalable single-stream DiT (S3-DiT) architecture, maximizing parameter utilization by concatenating text tokens, visual semantic tokens, and image VAE tokens into one unified input stream [7].
Group 7: Wukong AI Infrastructure Financing
- Wukong AI Infrastructure closed an A+ round of nearly 500 million yuan, led by Zhuhai Technology Group and Foton Capital, bringing its total funding to nearly 1.5 billion yuan over 2.5 years [8].
- Wukong AI Cloud achieved mixed training across chips from different vendors with compute utilization of up to 97.6%, and manages more than 25,000 P of compute across 53 data centers in 26 cities nationwide [8].
- The company launched the Wukong Tianquan model (3B-level cost and 7B-level memory requirements for 21B-level intelligence) and the Wukong Kaiyang inference acceleration engine (3x lower latency, 40% energy savings), with the aim of building Agentic Infra [8].
Group 8: Tsinghua University's AI Education Guidelines
- Tsinghua University officially released its "Guidelines for AI Education Applications," setting out five core principles: "subject responsibility," "compliance and integrity," "data security," "prudent thinking," and "fairness and inclusiveness" [9].
- The guidelines explicitly prohibit submitting AI-generated content directly as academic work and forbid using AI in place of academic training or to write papers; teachers are responsible for AI-generated teaching content [9].
- Tsinghua has brought AI teaching practices into more than 390 courses and developed a "three-layer decoupled architecture" and a full-featured intelligent learning companion, "Qing Xiao Da"; the guidelines were completed after two years of research covering 25 universities worldwide [9].
Group 9: US Genesis Mission
- The US launched the "Genesis Mission," an AI Manhattan Project that aims to train foundational scientific models and build research agents, embedding AI throughout the research process [10].
- The Deputy Secretary of Science at the Department of Energy emphasized that AI's value lies in producing verifiable results rather than merely summarizing, which requires mobilizing national laboratories, companies, and top universities [11].
- An accompanying editorial in Nature proposed a "neuro-symbolic AI" approach that combines the statistical learning of large models with symbolic reasoning and planning modules, which could be key to reaching human-level intelligence [11].
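To make the "single-stream" idea in the Z-Image item concrete, the sketch below concatenates three kinds of tokens into one sequence and runs one self-attention pass over it, so every token can attend to every other token regardless of modality. The token counts, the width, the random stand-in embeddings, and the identity query/key/value projections are all assumptions; a real DiT block adds learned projections, positional information, timestep conditioning, and MLP layers, none of which are shown.

```python
import math
import random


def fake_embed(n_tokens: int, dim: int, seed: int) -> list[list[float]]:
    """Hypothetical stand-in for an encoder: random vectors at the shared model width."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]


def single_stream(text, semantic, vae):
    """Concatenate the three token types into one sequence and record each token's modality."""
    tokens = text + semantic + vae
    modality = ["text"] * len(text) + ["semantic"] * len(semantic) + ["vae"] * len(vae)
    return tokens, modality


def self_attention(x):
    """Single-head attention with identity Q/K/V projections over the whole joint sequence."""
    dim = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in x]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) / total for j in range(dim)])
    return out


text_tokens = fake_embed(3, 8, seed=0)
semantic_tokens = fake_embed(2, 8, seed=1)
vae_tokens = fake_embed(4, 8, seed=2)
stream, modality = single_stream(text_tokens, semantic_tokens, vae_tokens)
mixed = self_attention(stream)  # one set of weights processes all 9 tokens together
```

The contrast is with a dual-stream design, where each modality gets its own weights and the streams only meet at specific points; in the single-stream version above, the same attention computation covers the whole concatenated sequence.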
64k-star open-source agent framework fully re-architected! OpenHands ships a major upgrade, taking on OpenAI and Google
机器之心· 2025-11-08 04:02
Core Insights
- The OpenHands team announced that it has finished re-architecting the OpenHands Software Agent SDK, moving from V0 to V1. The new version provides a practical foundation for prototyping, unlocks new custom applications, and supports reliable large-scale agent deployment [1][2].
Design Principles
- OpenHands V1 introduces a new architecture based on four design principles that address the limitations of V0:
1. Sandboxed execution should be optional rather than mandatory, allowing flexibility without sacrificing security [9].
2. Stateless by default, with a single source of truth for session state, which isolates changes and enables deterministic replay and strong consistency [10].
3. Strict separation of concerns, isolating the agent core into a "software engineering SDK" so that research and applications can evolve independently [11].
4. Everything should be composable and safely extensible, with modular packages that support local, hosted, or containerized execution [12][13].
Ecosystem and Features
- OpenHands V1 is a complete software agent ecosystem, including CLI and GUI applications built on the OpenHands Software Agent SDK [15][16].
- The SDK offers deterministic replay, immutable agent configuration, and an integrated tool system that supports both local prototyping and secure remote execution with minimal code changes [18][20].
Comparison with Competitors
- The team compared the OpenHands SDK with the OpenAI, Claude, and Google SDKs, highlighting that only OpenHands combines 16 additional features, including native remote execution and multi-LLM routing across more than 100 vendors [21][22].
Reliability and Evaluation
- The SDK's reliability and performance are assessed through continuous testing and benchmark evaluations; automated tests cost only $0.5–3 per run and finish within 5 minutes [24][25].
- The SDK performs competitively on software engineering and general agent benchmarks, achieving a 72% resolution rate on SWE-Bench and 67.9% accuracy on GAIA with Claude Sonnet 4.5 [29][30].
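The OpenHands design ideas of a single source of truth for session state, deterministic replay, and immutable agent configuration can be sketched as an append-only event log whose state is derived by replaying events. This is not the OpenHands SDK API: `AgentConfig`, `Event`, `Conversation`, and `replay` are hypothetical names chosen for illustration, and the three event kinds are an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AgentConfig:
    """Immutable configuration: changing a setting means building a new config object."""
    model: str
    max_steps: int


@dataclass(frozen=True)
class Event:
    kind: str      # "message", "tool_call", or "observation" in this sketch
    payload: str


@dataclass
class Conversation:
    config: AgentConfig
    events: list[Event] = field(default_factory=list)  # the single source of truth

    def append(self, event: Event) -> None:
        """The only way session state changes: appending an event to the log."""
        self.events.append(event)


def replay(events: list[Event]) -> dict[str, Any]:
    """Rebuild session state from the log alone, so the same log always yields the same state."""
    state: dict[str, Any] = {"messages": [], "tool_calls": 0, "last_observation": None}
    for event in events:
        if event.kind == "message":
            state["messages"].append(event.payload)
        elif event.kind == "tool_call":
            state["tool_calls"] += 1
        elif event.kind == "observation":
            state["last_observation"] = event.payload
    return state
```

Because `replay` is a pure function of the log, persisting the events is enough to restore or audit a session, and a frozen config cannot drift partway through a run.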