Claude Sonnet 4.5
Claude is BACK! (30 Hours of Thinking!)
Matthew Berman· 2025-10-01 18:08
Model Performance & Benchmarks
- Claude Sonnet 4.5 is considered the best coding model, demonstrating a significant advancement in coding ability [1]
- On the SWE-bench Verified evaluation, Claude Sonnet 4.5 outperforms Opus 4.1 by a substantial margin and leads GPT-4 Code Interpreter and Gemini 1.5 Pro by almost 20 percentage points [1]
- The model achieves top scores on Terminal Bench (50%), agentic tool use, and computer use benchmarks, and scores 100% on high school math (AIME 2025 with Python) [1]

Long Horizon Tasks & Efficiency
- AI's ability to complete long-horizon tasks is increasing exponentially, with the task duration AI can handle doubling every 7 months [1]
- Claude Sonnet 4.5 can think independently for over 30 hours, indicating its suitability for agentic applications [1]
- The industry is shifting towards measuring AI intelligence per watt, emphasizing the importance of task and token efficiency [2]

Future Applications & Industry Impact
- Anthropic is showcasing a vision of the future of software with "Claude Imagine," demonstrating the ability to generate applications on the fly within a desktop environment [1][2]
- Claude is increasingly used to write its own code, with Anthropic's CEO stating that it writes the majority of the code for Claude [9][10]
- Box tested Claude Sonnet 4.5 for data extraction accuracy with Box AI on 40,000 fields across 1500+ documents, and the model performed four percentage points better than Sonnet 4 [3][4]

Pricing & Availability
- Claude Sonnet 4.5 is priced at $3 per million input tokens and $15 per million output tokens, the same as Sonnet 4 [11]
- Anthropic recommends upgrading immediately to Claude Sonnet 4.5 for all use cases [11]
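Two figures in the summary above lend themselves to quick arithmetic: the per-token pricing and the 7-month doubling claim. The sketch below is illustrative only. The function names and the sample token counts are hypothetical, the prices are the ones quoted in the summary, and the doubling projection simply extrapolates the stated trend, which may not hold.

```python
# Illustrative sketch; all names here are hypothetical, not part of any Anthropic API.
# Figures are the ones quoted in the summary above.
INPUT_USD_PER_MILLION = 3.0    # $3 per million input tokens
OUTPUT_USD_PER_MILLION = 15.0  # $15 per million output tokens
DOUBLING_PERIOD_MONTHS = 7.0   # claimed doubling time for task duration


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def projected_task_hours(current_hours: float, months_ahead: float) -> float:
    """Extrapolate task duration, assuming the doubling trend continues."""
    return current_hours * 2 ** (months_ahead / DOUBLING_PERIOD_MONTHS)


if __name__ == "__main__":
    # Hypothetical request: 200k input tokens and 50k output tokens.
    print(f"Estimated cost: ${estimate_cost_usd(200_000, 50_000):.2f}")
    # Hypothetical 1-hour task today, projected 14 months (two doublings) ahead.
    print(f"Projected duration: {projected_task_hours(1.0, 14):.1f} hours")
```

Because output tokens cost five times as much as input tokens at these rates, output volume tends to dominate the bill for long agentic runs, which is one reason the summary stresses token efficiency.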
Anthropic Targets Enterprise Growth with New AI Model
Bloomberg Technology· 2025-09-30 18:54
Model Performance & Capabilities
- Claude Sonnet 4.5 excels in memory and context management, enabling it to maintain coherence over extended periods [1][2]
- The model prioritizes accuracy and good code production as prerequisites before scaling up the time horizon [3]
- Claude Sonnet 4.5 demonstrates the lowest hallucination rate and is least susceptible to jailbreaks, enhancing its reliability [3]
- The model can create professional-looking Word, Excel, and PowerPoint documents, driving enterprise adoption [5]

Target Audience & Applications
- The initial audience focus for Claude Sonnet 4.5 is enterprise customers, with applications extending into the consumer space through power users and developers [4][5]
- The model aims to automate work in the browser, focusing on productivity rather than entertainment [6]
- The company emphasizes that AI integration in the workplace must be accompanied by the right tools and enablement to avoid disillusionment and maximize productivity gains [7][8]

Infrastructure & Deployment
- Training and inference for Claude models are conducted through partnerships with Google and Amazon, with significant serving from Amazon and growth on native Bedrock [12]
- The company is scaling up for both training and inference, securing compute deals to support revenue-generating inference [13][14]
- International deployment of chips is crucial for addressing data locality concerns in regions like Europe, ensuring inference happens at local data centers [15]

Talent & Development
- The company's mission-oriented culture has minimized the impact of talent movement among frontier labs [17]
- Roles needed for rolling out Sonnet 4.5 include research and model sciences, requiring both technical expertise and artistic taste in decision-making [18]

Market Adoption
- Claude Sonnet 4.5 experienced rapid adoption, with usage surpassing all other models combined shortly after its release [11]
- On day one, platforms like GitHub sought to incorporate Claude Sonnet 4.5 [12]
CoreWeave’s $14 Billion Meta Deal, Spotify’s Ek to Leave CEO Role | Bloomberg Tech 9/30/2025
Bloomberg Technology· 2025-09-30 18:19
AI Infrastructure and Investment
- CoreWeave is providing Meta with computing power in a deal worth $14.2 billion [1][2][4]
- AI infrastructure financing is increasingly reliant on debt [6]
- Investors are seeking opportunities in overlooked groups like suppliers of chip-making gear [6][7][8]
- Cerebras has closed a $1.1 billion funding round, valuing the company at $8.1 billion post-money, to rival NVIDIA in AI chip making [47]

AI Model Development and Application
- Anthropic released Claude Sonnet 4.5, an AI model designed to code for up to 30 hours straight [1][13][14]
- Claude Sonnet 4.5 focuses on enterprise customers and productivity, automating tasks in browsers and creating professional documents [18][19][20]
- DeepSeek updated its experimental AI model, introducing new techniques to improve efficiency in processing long text sequences [13]

Delivery and Autonomous Systems
- DoorDash unveiled Dot, an autonomous delivery robot designed to navigate bike lanes, roads, and sidewalks, with a cargo capacity of up to 30 pounds [34][37]
- DoorDash aims to reach approximately 1.5 million customers with Dot by the end of the year, focusing on the Greater Phoenix area [35]
- DoorDash announced an Autonomous Delivery Platform (ADP) to integrate various delivery modalities, including Dashers, drones, and robots [39][41]

Market and Economic Impact
- Data centers' massive power demands are driving up electricity costs, affecting consumers, particularly in areas like Baltimore [59][60][61]
- Areas closer to data center activity are more likely to experience wholesale power price increases [63]

Leadership Changes
- Spotify's CEO is stepping aside after almost two decades, with leadership transitioning to the Chief Product and Technology Officer and the Chief Business Officer, starting January 1 [69][70]
- Spotify's stock is down more than 5% following the announcement of the CEO's departure [70]