Workflow
Token Efficiency
icon
Search documents
The Industry Reacts to Gemini 3...
Matthew Berman· 2025-11-20 02:14
Google dropped Gemini 3 24 hours ago and the industry has been reacting strongly. It is definitely the best model on the planet and I'm going to show you all of the industry reactions right now. First is from Artificial Analysis, the company that runs independent benchmarks against all of the top models. And yes, Gemini 3 is number one. Here's what they have to say. For the first time, Google has a leading language model and it debuts with a threepoint buffer between the second best model GPT 5.1%. And a lo ...
Claude is BACK! (30 Hours of Thinking!)
Matthew Berman· 2025-10-01 18:08
Model Performance & Benchmarks - Claude Sonnet 4.5% is considered the best coding model, demonstrating a significant advancement in coding ability [1] - On SWE-bench verified evaluation, Claude Sonnet 4.5% outperforms Opus 4.1% by a substantial margin, exceeding almost 20 percentage points compared to GPT-4 Code Interpreter and Gemini 1.5 Pro [1] - The model achieves top scores on Terminal Bench (50%), agentic tool use, and computer use benchmarks, excelling in high school math (Amy 2025 with Python) with a 100% score [1] Long Horizon Tasks & Efficiency - AI's ability to complete long horizon tasks is exponentially increasing, with the task duration AI can handle doubling every 7 months [1] - Claude Sonnet 4.5% can think independently for over 30 hours, indicating its suitability for agentic applications [1] - The industry is shifting towards measuring AI intelligence per watt, emphasizing the importance of task and token efficiency [2] Future Applications & Industry Impact - Anthropic is showcasing a vision of the future of software with "Claude Imagine," demonstrating the ability to generate applications on the fly within a desktop environment [1][2] - Claude is increasingly used to write its own code, with Anthropic's CEO stating that it writes the majority of the code for Claude [9][10] - Box tested Claude Sonnet 4.5% for data extraction accuracy with Box AI on 40,000 fields across 1500+ documents, and the model performed four percentage points better than Sonnet 4 [3][4] Pricing & Availability - Claude Sonnet 4.5% is priced at $3 per million input tokens and $15 per million output tokens, the same as Sonnet 4 [11] - Anthropic recommends immediate upgrading to Claude Sonnet 4.5% for all use cases [11]