Anthropic Unveils Claude Sonnet 4.5 Late at Night, Able to Work Autonomously for 30 Straight Hours! CEO: It's More Like Your Colleague

Core Insights
- Anthropic has launched its new AI model, Claude Sonnet 4.5, calling it the best coding model available and a powerful tool for building complex agents, capable of independently handling production-grade development tasks [2][21]
- The model shows significant gains in software coding, scoring 77.2% on the SWE-bench Verified benchmark, nearly 20 percentage points higher than its predecessor [4][9]
- The release includes the Claude Agent SDK, which lets developers build customized AI assistants and addresses key pain points in AI agent development [12][14]

Performance Improvements
- In one demonstration, Claude Sonnet 4.5 ran autonomously for 30 hours, generating 11,000 lines of code and taking an enterprise chat application through its full development process [4]
- On the OSWorld benchmark, the model's score rose from 42.2% to 61.4% within four months, ahead of comparable products in the industry [7][9]
- The model shows more than a 30% improvement in reasoning in specialized fields such as finance and law compared with the previous model, Opus 4.1 [7][9]

Product Ecosystem Upgrades
- The Claude Agent SDK enables developers to build tailored AI assistants for applications such as project management and customer service [12][14]
- Claude Code 2.0 introduces a much-requested "checkpoint" feature that saves code progress and allows instant rollback, improving development efficiency [13]
- API capabilities have been strengthened, extending how long an AI agent can operate on complex tasks from 7 hours to 30 hours [13]

Safety and Security Enhancements
- Claude Sonnet 4.5 is released under AI Safety Level 3 (ASL-3) protections, with a false-positive rate 90% lower than that of earlier models [16]
- The model includes advanced detection of hazardous content and has made substantial progress in defending against prompt injection attacks, a major risk for users [16]
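
The OSWorld result cited under Performance Improvements is an absolute gain measured in percentage points, which is not the same as the relative improvement. A short worked calculation using only the two reported scores makes the difference explicit:

```latex
\Delta_{\text{abs}} = 61.4\% - 42.2\% = 19.2 \text{ percentage points}
\qquad
\Delta_{\text{rel}} = \frac{61.4 - 42.2}{42.2} = \frac{19.2}{42.2} \approx 45.5\%
```

Headlines that quote "percent" improvements can refer to either quantity, so it is worth checking which one a given figure means.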

Commercial Strategy
- Anthropic has kept API pricing unchanged from Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens [19]
- The company positions Claude Sonnet 4.5 as the default choice, recommending it for nearly all use cases while keeping older models available for specific workflows [19][20]
- Industry analysts say the release marks a shift from AI as an "assistive tool" to AI as "independent productivity" [21]
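
Because input and output tokens are priced differently, the cost of a workload depends on its mix of the two. The sketch below is a minimal estimator that assumes only the flat list prices quoted above; the function name `estimate_cost_usd` and the example token counts are hypothetical, and the sketch ignores any discounts or surcharges (for example prompt caching, batch processing, or long-context tiers) that may apply in practice.

```python
# List prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of a workload in US dollars.

    Hypothetical helper: assumes flat per-token pricing and ignores
    caching, batch, or long-context pricing differences.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each token count by its per-million price, then convert to dollars.
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2 million input tokens, 500,000 output tokens.
    print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")  # prints "$13.50"
```

In this example the output tokens cost more than the input tokens despite being a quarter of the volume, which is why output-heavy agent runs dominate the bill.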