Claude Code 2.0

Anthropic Unveils Claude Sonnet 4.5 Late at Night, Able to Work Autonomously for 30 Hours Straight; CEO: It's More Like Your Colleague
36Kr · 2025-09-30 03:20
Core Insights
- Anthropic has launched its new AI model, Claude Sonnet 4.5, calling it the best coding model and a powerful tool for building complex agents, capable of completing production-grade development tasks on its own [1][10]
- The model shows significant gains in software coding, scoring 77.2% on the SWE-bench Verified benchmark, nearly 20 percentage points higher than its predecessor [2][5]
- Claude Sonnet 4.5 can run autonomously for 30 hours, generating 11,000 lines of code and completing the full development cycle of an enterprise chat application [2]

Performance Metrics
- The model's OSWorld score rose from 42.2% to 61.4% in four months, ahead of comparable products in the industry [4][5]
- In specialized fields such as finance and law, its reasoning performance improved by more than 30% compared with Claude Opus 4.1 [4][5]
- Claude Sonnet 4.5 scored 100% on a high school math competition benchmark and 89.1% on multilingual Q&A tasks [5]

Product Ecosystem Upgrades
- Anthropic also shipped several product updates, including Claude Code 2.0, which adds a "checkpoint" feature that saves code progress and allows instant rollback, improving developer efficiency [8]
- API capabilities have been strengthened, extending how long an AI agent can run on complex tasks from 7 hours to 30 hours [8]
- The Claude for Chrome browser extension has been made available to Max subscribers, and code execution and document creation are now integrated directly into the Claude apps [8]

Developer Empowerment
- The new Claude Agent SDK lets developers build customized AI assistants and addresses key challenges in agent development, such as memory management for long-running tasks and multi-agent coordination [9]
- Engineering teams at companies such as Canva have already validated the SDK, using it to improve codebase management and product research efficiency [9]
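The summaries do not say how Claude Code 2.0 implements its checkpoint feature. As a rough illustration of the general snapshot-and-rollback idea only, here is a hypothetical Python sketch; the class `CheckpointStore` and its methods are invented for this example and are not Claude Code's actual mechanism or API.

```python
class CheckpointStore:
    """Hypothetical workspace mapping file paths to contents, with checkpoint/rollback."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)
        # Each entry is (label, snapshot of self.files taken at checkpoint time).
        self._checkpoints: list[tuple[str, dict[str, str]]] = []

    def checkpoint(self, label: str) -> int:
        """Save a snapshot of the current files and return its index."""
        # A shallow copy is enough here because str values are immutable.
        self._checkpoints.append((label, dict(self.files)))
        return len(self._checkpoints) - 1

    def edit(self, path: str, contents: str) -> None:
        """Create or overwrite a file (stands in for an agent's code edit)."""
        self.files[path] = contents

    def rollback(self, index: int) -> str:
        """Restore the files saved at checkpoint `index` and discard later checkpoints."""
        if not 0 <= index < len(self._checkpoints):
            raise IndexError(f"no checkpoint at index {index}")
        label, snapshot = self._checkpoints[index]
        self.files = dict(snapshot)
        del self._checkpoints[index + 1:]
        return label
```

For example, after `cp = ws.checkpoint("before refactor")` and a few `ws.edit(...)` calls, `ws.rollback(cp)` restores the workspace exactly as it was at that checkpoint. Discarding later checkpoints on rollback is one design choice among several; a real tool might keep them as a branch instead.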
Safety and Compliance
- Claude Sonnet 4.5 has been released under AI Safety Level 3 (ASL-3) protections, with a false-positive rate 90% lower than earlier models [10]
- The model includes advanced detection of hazardous content and has made notable progress in defending against prompt injection attacks, a significant risk for users [10]

Commercial Strategy
- Anthropic keeps API pricing competitive and unchanged from the previous model, at $3 per million input tokens and $15 per million output tokens [13]
- The company positions Claude Sonnet 4.5 as the default choice for users while keeping older models available for specific workflows [13]
- Analysts suggest the launch marks a shift from AI as an "assistive tool" to "independent productivity," and that the open SDK could accelerate adoption of AI agent technology across industries [13][14]
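To make the quoted rates concrete, here is a minimal Python sketch that estimates the cost of one API call at $3 per million input tokens and $15 per million output tokens. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes flat list pricing, ignoring any discounts or surcharges (for example prompt caching or batch processing).

```python
# Rates quoted in the summary, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of a single call (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
```

For example, a call with 200,000 input tokens and 50,000 output tokens comes to $0.60 + $0.75 = $1.35 under these assumptions. Because output tokens cost five times as much as input tokens, long generated outputs dominate the bill faster than long prompts do.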
Anthropic Unveils Claude Sonnet 4.5 Late at Night, Able to Work Autonomously for 30 Hours Straight! CEO: It's More Like Your Colleague
AI前线 · 2025-09-30 01:18
Core Insights
- Anthropic has launched its new AI model, Claude Sonnet 4.5, calling it the best coding model and a powerful tool for building complex agents, capable of handling production-grade development tasks on its own [2][21]
- The model shows significant gains in software coding, scoring 77.2% on the SWE-bench Verified benchmark, nearly 20 percentage points higher than its predecessor [4][9]
- The release includes the Claude Agent SDK, which lets developers create customized AI assistants and addresses key pain points in AI agent development [12][14]

Performance Improvements
- Claude Sonnet 4.5 ran autonomously for 30 hours, generating 11,000 lines of code and completing the full development process for an enterprise chat application [4]
- On the OSWorld benchmark, its score rose from 42.2% to 61.4% in four months, ahead of comparable products in the industry [7][9]
- In specialized fields such as finance and law, its reasoning performance improved by more than 30% compared with Claude Opus 4.1 [7][9]

Product Ecosystem Upgrades
- The Claude Agent SDK enables developers to build tailored AI assistants for applications such as project management and customer service [12][14]
- Claude Code 2.0 introduces a much-requested "checkpoint" feature that saves code progress and allows instant rollback, improving development efficiency [13]
- API capabilities have been strengthened, extending how long an AI agent can run on complex tasks from 7 hours to 30 hours [13]

Safety and Security Enhancements
- Claude Sonnet 4.5 has been released under AI Safety Level 3 (ASL-3) protections, with a false-positive rate 90% lower than earlier models [16]
- The model includes advanced detection of hazardous content and has made substantial progress in defending against prompt injection attacks, a major risk for users [16]
Commercial Strategy
- Anthropic keeps API pricing competitive and unchanged from Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens [19]
- The company positions Claude Sonnet 4.5 as the default choice, recommending it for nearly all use cases while keeping older models available for specific workflows [19][20]
- Industry analysts note that the release marks a shift from AI as an "assistive tool" to "independent productivity" [21]