Anthropic Launches Claude Sonnet 4.5, Billed as the "World's Best Coding Model"

Core Insights
- Anthropic has launched its latest AI model, Claude Sonnet 4.5, which it claims is the "best coding model in the world" based on industry benchmarks such as SWE-bench Verified [1][4]
- The new model shows significant improvements over previous models in the quality of generated code, in identifying improvements to existing code, and in following instructions [1][4]
- Experts in finance, law, and medicine have noted stronger domain knowledge and reasoning in Sonnet 4.5 than in older models such as Opus 4.1 [1]

Performance and Features
- Claude Sonnet 4.5 scored 61.4% on the OSWorld benchmark, up from 42.2% four months ago, a gain of 19.2 percentage points [4]
- The model is designed to run autonomously for up to 30 hours, significantly longer than the 7 hours of its predecessor [6]
- Initial user feedback suggests that while the model's outputs are generally better, it may occasionally miss key details [6]

Safety and Alignment
- The model is described as the most aligned version to date, with improved behavior and a reduction in concerning actions such as deception and power-seeking [7]
- It has enhanced resistance to prompt injection attacks, which can lead to malicious operations [7]
- Released under AI Safety Level 3 (ASL-3), it includes classifiers for detecting threats related to chemical, biological, radiological, and nuclear (CBRN) weapons [7]

Product Updates
- Alongside the new model, Anthropic introduced the Claude Agent SDK, aimed at helping developers build AI agents with improved memory management and autonomy [10]
- Additional updates include a "checkpoint" feature for Claude Code, a new native extension for VS Code, and direct integration of code execution and file creation in paid applications [12]
- Pricing for Sonnet 4.5 is unchanged from the previous generation, Sonnet 4, and paid subscribers can still opt for the older Opus model [3]