Sonnet 4.5
Claude Opus 4.5 released! It outscores every human on a two-hour engineering test and easily handles work its predecessor Sonnet couldn't
量子位· 2025-11-25 01:17
Core Insights
- Claude Opus 4.5 has been released, showing significant advances in coding, agentic tasks, and computer use, and outperforming every human candidate on a two-hour engineering task [1][16][10]
Performance Metrics
- On SWE-bench Verified, Opus 4.5 scored 80.9%, ahead of Sonnet 4.5 (77.2%) and Opus 4.1 (74.5%) [2][19]
- On high-difficulty coding challenges, it showed a 10.6% improvement over Sonnet 4.5 [22]
- In visual reasoning, Opus 4.5 scored 80.7%, versus 77.8% for Sonnet 4.5 [19]
Enhanced Capabilities
- Opus 4.5 performs better at deep research, slide (PPT) creation, and spreadsheet work, and can work through complex scenarios and propose solutions on its own, without human guidance [6][14]
- The model can efficiently coordinate multiple sub-agents, supporting the construction of complex multi-agent systems [38]
Developer Platform Upgrades
- The Claude API has introduced an "effort parameter" that lets developers either optimize for time and cost or maximize performance; Anthropic reports a 76% reduction in token usage while maintaining high performance [32][36]
- Claude Code has added new features, including a Plan Mode that generates precise execution plans and the ability to run multiple sessions simultaneously [41][42]
Accessibility and Usage
- Opus 4.5 is available through the apps, the API, and major cloud platforms, priced at $5 per million input tokens and $25 per million output tokens [12]
- Usage limits for Max and Team Premium users have been raised, bringing Opus token allowances in line with those of previous Sonnet models [43]
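Anthropic's new model is on a tear! Costs cut by two-thirds, performance close to GPT-5; users' hands-on tests: even better than the hype, 3.5 times faster than Sonnet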
AI前线· 2025-10-16 04:37
Core Viewpoint
- Anthropic has launched Claude Haiku 4.5, positioned as a cost-effective alternative to its larger models: it offers performance close to Sonnet 4 at one-third the cost and twice the speed [2][12].
Performance and Features
- Haiku 4.5 is a hybrid reasoning model that adjusts how much computation it spends based on the request, allowing both quick responses and more elaborate outputs when needed [3][4].
- The model accepts multimodal prompts of up to 200,000 tokens and can generate responses of up to 64,000 tokens [3].
- In benchmark tests, Haiku 4.5 scored 73% on SWE-bench Verified and 41% on Terminal-Bench, competitive with Sonnet 4 and GPT-5 [4][7].
Cost and Accessibility
- Haiku 4.5 is priced at $1 per million input tokens and $5 per million output tokens, significantly cheaper than Sonnet 4.5 at $3 and $15 respectively [9].
- The model is now available on all platforms [9].
Market Impact and Growth
- Anthropic's annualized revenue run rate is approaching $7 billion, and the company targets $20 billion to $26 billion in annual revenue by 2026, indicating rapid growth [18].
- The company serves more than 300,000 enterprise clients, and enterprise products account for about 80% of total revenue [18].
Strategic Positioning
- Haiku 4.5 is designed to complement Sonnet 4.5 through a division of labor: Haiku handles simpler tasks while Sonnet focuses on complex planning [13][14].
- Because the model is lightweight, multiple Haiku instances can be deployed in parallel, improving efficiency in AI workflows [13].
User Feedback and Adoption
- Early adopters report positive results, with some saying Haiku 4.5 delivers 90% of Sonnet 4.5's performance while being faster and cheaper [15].
- Users note that Haiku 4.5 blurs the usual trade-off between speed, cost, and quality, signaling a shift in expectations for AI models [15][16].
Industry Trends
- AI costs are falling fast, with a reported two-thirds reduction in five months, suggesting a significant shift in the economics of AI [17][19].
- Anthropic is valued at $183 billion, positioning it competitively against major players such as OpenAI and Google [20].
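The "one-third the cost" claim in the Haiku 4.5 summary is simple arithmetic on the quoted list prices. The Python sketch below is a hypothetical per-request cost calculator that assumes only those prices ($1/$5 per million input/output tokens for Haiku 4.5, $3/$15 for Sonnet 4.5) and ignores prompt caching, batch discounts, and any other pricing details the summary does not mention. The names `Pricing` and `request_cost` and the example token counts are illustrative, not part of any Anthropic SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container for per-million-token list prices in US dollars."""
    input_per_mtok: float
    output_per_mtok: float


# List prices as quoted in the Haiku 4.5 summary (assumed; no discounts applied).
HAIKU_4_5 = Pricing(input_per_mtok=1.0, output_per_mtok=5.0)
SONNET_4_5 = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list price: tokens times price, divided by one million."""
    return (input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens per request.
    haiku = request_cost(HAIKU_4_5, 20_000, 2_000)
    sonnet = request_cost(SONNET_4_5, 20_000, 2_000)
    # prints: Haiku 4.5: $0.0300  Sonnet 4.5: $0.0900  ratio: 0.333
    print(f"Haiku 4.5: ${haiku:.4f}  Sonnet 4.5: ${sonnet:.4f}  ratio: {haiku / sonnet:.3f}")
```

Because each Haiku 4.5 price is exactly one-third of the matching Sonnet 4.5 price, the ratio stays at 1/3 for any mix of input and output tokens, which is consistent with the "one-third the cost" figure.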
Ilya speaks out, and a former OpenAI executive confirms it: AGI has awakened while humanity is still pretending to sleep
36Kr· 2025-10-15 01:45
Core Insights
- The article discusses the possible arrival of Artificial General Intelligence (AGI) and the implications of AI advances, suggesting that AI may already have "awakened" while humanity remains unaware [1][3][10].
Group 1: AI Advancements
- Jack Clark, a former OpenAI executive, claims that AI has truly "come alive," signaling a leap in capability that can no longer be ignored [3][10].
- The article highlights AI's steady gains in practical skills such as coding, alongside unusual behaviors that suggest growing awareness in AI systems [5][6].
- Clark calls on AI researchers to be transparent about their findings and about the emotional implications of their work [9].
Group 2: Balancing Optimism and Fear
- The article frames a tension between "technological optimism" and "reasonable fear," urging humanity to strike a balance as AI progresses [3][10].
- Clark is both optimistic about AI's future and afraid of its rapid development, likening AI to a "mysterious creature" rather than a mere machine [10][16].
- A report from the Dallas Federal Reserve supports the view that AI could lead either to significant GDP growth or to catastrophic outcomes for humanity [10].
Group 3: Future Implications
- Clark believes AI systems are growing more complex and moving toward possible self-awareness, which raises concerns about their future capabilities [17][22].
- The article warns that although AI has not yet reached the stage of self-improvement, it is already contributing to the development of its successors [20][22].
- The possibility that AI systems will eventually achieve self-awareness and independent thought is acknowledged, though it is not seen as imminent [22].
Anthropic Product Head: AI Model Development Is Accelerating — With Mike Krieger
Alex Kantrowitz· 2025-10-08 18:56
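AI Model Development & Strategy
- Anthropic is accelerating AI model development with internal tools [1]
- The release of Anthropic's Sonnet 4.5 [1]
- Where next-generation model improvements are headed [1]
- Model orchestration may become the core differentiator among labs [1]
Industry Comparison & Future
- Comparing AI development with social media [1]
- Whether AI-generated content will catch on [1]
- The path ahead for enterprise AI [1]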
Does running multiple coding agents at once turn into chaos? Overseas developers weigh in
机器之心· 2025-10-06 04:00
Core Insights
- The rapid advance of AI programming tools is transforming coding, with models such as GPT-5 and Gemini 2.5 enabling a degree of automation in development tasks [1][2]
- AI coding agents have become the norm not only for programmers but also for people in product and design roles, and the share of AI-generated code keeps rising [3]
- Challenges remain around code quality and analysis efficiency, prompting developers to explore running multiple AI agents in parallel [3][5]
Summary by Sections
- **Parallel Coding Agent Lifestyle**: Simon Willison was initially wary of running multiple AI agents because reviewing their code would become a bottleneck. He has since embraced the approach, finding he can run several small tasks at once without an overwhelming cognitive load [5][6]
- **Task Categories for Parallel Agents**:
  - **Research Tasks**: AI agents can answer questions or make suggestions without modifying the project's core code, enabling rapid prototyping and proof-of-concept validation [7][9]
  - **System Mechanism Recall**: modern AI models can quickly give detailed, actionable explanations of how parts of a system work, which helps in understanding complex codebases [10][11]
  - **Small Maintenance Tasks**: low-risk code changes, such as fixing deprecation warnings, can be delegated to AI agents so developers can stay focused on their primary work [13][14]
  - **Precisely Specified Work**: reviewing code generated from a detailed specification is less burdensome, because the reviewer only has to check it against established requirements [15]
- **Current Usage Patterns**: Willison's main tools include Claude Code, Codex CLI, and Codex Cloud, among others. He often runs several instances in separate terminal windows, letting them operate in "YOLO" (You Only Live Once) mode when the risks are manageable [16][19]
- **Developer Community Response**: the blog post drew wide attention because it speaks to current pain points in coding workflows. Many developers are experimenting with parallel AI agents, and some report that a substantial portion of their coding work is now AI-assisted [21][22]
- **Concerns and Discussions**: some developers worry about the unpredictability of AI-generated code, while others, including Willison, advocate parallel agents, particularly for research tasks that don't commit code [26][29]
Sora 2 & AI’s Slop Era, Death Of The Creator Economy?, Apple’s SmartGlasses Roadmap
Alex Kantrowitz· 2025-10-04 00:49
AI Video Generation & Social Media
- OpenAI introduces Sora 2, sparking discussion of its technical advances and potential impact [1][2][3]
- Concerns arise over the adoption of AI video feeds and their implications for social media platforms, particularly Meta [2][4]
- Whether AI video feeds spell the end of the creator economy is debated [6][7]
Meta's AI Strategy
- Meta plans to use AI chat data to train its advertising models [8]
- Meta's AI research is shifting toward product development [9]
- Internal dynamics within Meta's AI division are under scrutiny [37:18]
AI Model Advancements
- Anthropic launches its Sonnet 4.5 model, seen as a potential game changer [10][40:33]
Emerging Technologies
- Apple is prioritizing the development of smart glasses [11]
- The Friend pendant, an AI wearable, is being put to the test [12][51:21]