Claude Opus 4.5 released! It beats humans on a 2-hour engineering test and easily handles work its predecessor Sonnet could not
量子位 (QbitAI) · 2025-11-25 01:17

Core Insights
- Claude Opus 4.5 has been released, with significant advances in coding, agent capabilities, and computer use; it outperformed all human candidates on a two-hour engineering task [1][16][10]

Performance Metrics
- On the SWE-bench Verified coding benchmark, Opus 4.5 scored 80.9%, ahead of Sonnet 4.5 (77.2%) and Opus 4.1 (74.5%) [2][19]
- On high-difficulty coding challenges, the model showed a 10.6% improvement over Sonnet 4.5 [22]
- In visual reasoning, Opus 4.5 scored 80.7%, ahead of Sonnet 4.5's 77.8% [19]

Enhanced Capabilities
- Opus 4.5 performs better at deep research, PPT creation, and spreadsheet handling, and can work through complex scenarios and propose solutions autonomously, without human guidance [6][14]
- The model can efficiently manage multiple sub-agents, which supports building complex multi-agent systems [38]

Developer Platform Upgrades
- The Claude API has introduced an "effort parameter" that lets developers either optimize for time and cost or maximize performance; token usage can reportedly be cut by 76% while performance stays high [32][36]
- Claude Code has launched new features, including a Plan Mode that generates precise execution plans and the ability to run multiple sessions simultaneously [41][42]

Accessibility and Usage
- Opus 4.5 is available through the apps, the API, and the major cloud platforms, priced at $5 per million input tokens and $25 per million output tokens [12]
- Usage limits for Max and Team Premium users have been raised, so that they get roughly as many Opus tokens as they previously had with Sonnet [43]
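
To make the figures above easier to compare, here is a minimal Python sketch that turns the quoted benchmark scores into percentage-point gaps and shows what a 76% token reduction means in absolute terms. The function names and the 100,000-token baseline are illustrative assumptions, not part of the original report or of any Anthropic tooling.

```python
# Scores (percent) exactly as quoted in the summary above.
SWE_BENCH_VERIFIED = {"Opus 4.5": 80.9, "Sonnet 4.5": 77.2, "Opus 4.1": 74.5}
VISUAL_REASONING = {"Opus 4.5": 80.7, "Sonnet 4.5": 77.8}


def point_gap(scores: dict[str, float], model: str, baseline: str) -> float:
    """Return the lead of `model` over `baseline` in percentage points."""
    return round(scores[model] - scores[baseline], 1)


def tokens_after_reduction(baseline_tokens: int, reduction_pct: float) -> int:
    """Return the tokens left after cutting `reduction_pct` percent of a baseline."""
    return round(baseline_tokens * (1 - reduction_pct / 100))


if __name__ == "__main__":
    # Lead of Opus 4.5 on SWE-bench Verified, in percentage points.
    print(point_gap(SWE_BENCH_VERIFIED, "Opus 4.5", "Sonnet 4.5"))
    print(point_gap(SWE_BENCH_VERIFIED, "Opus 4.5", "Opus 4.1"))
    # Lead of Opus 4.5 on the visual reasoning score.
    print(point_gap(VISUAL_REASONING, "Opus 4.5", "Sonnet 4.5"))
    # Hypothetical baseline of 100,000 tokens (an assumption for illustration).
    print(tokens_after_reduction(100_000, 76))
```

Read this way, the SWE-bench Verified lead over Sonnet 4.5 is a few percentage points, while a 76% reduction leaves roughly a quarter of the baseline token count, which is where most of the cost saving would come from.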