Token Efficiency
Yang Zhilin Reveals Kimi's Pre-training Strategy: Improving Token Efficiency, Achieving Long Context
Xin Lang Cai Jing· 2026-01-10 12:09
Core Insights
- The article focuses on strategies for pre-training AI models, emphasizing Token Efficiency and Long Context as critical components for performance on complex tasks [2][6].

Group 1: Token Efficiency
- Token Efficiency is crucial because the reasoning or training of agents is fundamentally a search process; better pre-training shrinks the search space and strengthens the model's prior knowledge [3][7].
- Token Efficiency matters because AI must be able to develop complex systems, such as an operating system, without enumerating every possible token combination, most of which are meaningless or incorrect [7].

Group 2: Long Context
- The Transformer architecture shows significant advantages in long-context scenarios: experiments indicate that LSTM performance falls below that of Transformers once the context length exceeds 1,000 tokens, underscoring the importance of context length in architecture design [2][6].
- In the current Agentic era, many tasks require long contexts to execute complex instructions, giving architectures with lower position loss a technical edge [2][6].

Group 3: Aesthetic Considerations in AI
- Developing AI models is not only a technical challenge but also an aesthetic one: a model embodies a worldview and values, akin to the notion of "Taste" articulated by influential figures such as Steve Jobs [3][7].
- Each model generates unique tokens that are not interchangeable; the intelligence produced by different roles (e.g., a CEO vs. a designer) differs significantly, leading to exponential growth in the space of possible "Tastes" [4][8].
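The "search process" framing above has a simple quantitative reading: a model whose per-token entropy is H bits behaves as if it chooses among 2^H options at each step, so the space of plausible n-token continuations scales as (2^H)^n, and even a small reduction in per-token entropy shrinks that space exponentially. The sketch below is an illustration of this arithmetic only; the function name and the entropy figures are assumptions for the example, not numbers from the article.

```python
def search_space_size(per_token_entropy_bits: float, seq_len: int) -> float:
    """Effective number of distinct sequences to search over.

    A model with per-token entropy H bits effectively chooses among
    2**H options per step, so an n-token rollout spans (2**H)**n
    plausible continuations.
    """
    return (2.0 ** per_token_entropy_bits) ** seq_len

# Illustrative numbers (assumed, not measured): a weaker prior at
# 4 bits/token vs. a stronger one at 3 bits/token, over 100 tokens.
weak = search_space_size(4.0, 100)
strong = search_space_size(3.0, 100)
print(f"search-space ratio: {weak / strong:.3e}")  # 2**100, ~1.268e+30
```

The one-bit-per-token difference compounds to a factor of 2^100 over 100 tokens, which is the sense in which better pre-training "reduces the search space" for downstream agentic reasoning.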
The Industry Reacts to Gemini 3...
Matthew Berman· 2025-11-20 02:14
Google dropped Gemini 3 24 hours ago and the industry has been reacting strongly. It is definitely the best model on the planet and I'm going to show you all of the industry reactions right now. First is from Artificial Analysis, the company that runs independent benchmarks against all of the top models. And yes, Gemini 3 is number one. Here's what they have to say. For the first time, Google has a leading language model, and it debuts with a three-point buffer over the second-best model, GPT-5.1. And a lo ...
Claude is BACK! (30 Hours of Thinking!)
Matthew Berman· 2025-10-01 18:08
Model Performance & Benchmarks
- Claude Sonnet 4.5 is considered the best coding model, demonstrating a significant advance in coding ability [1]
- On the SWE-bench Verified evaluation, Claude Sonnet 4.5 outperforms Opus 4.1 by a substantial margin, beating GPT-4 Code Interpreter and Gemini 1.5 Pro by almost 20 percentage points [1]
- The model achieves top scores on Terminal Bench (50%), agentic tool use, and computer-use benchmarks, and scores 100% on high-school math (AIME 2025 with Python) [1]

Long Horizon Tasks & Efficiency
- The duration of long-horizon tasks AI can complete is growing exponentially, doubling roughly every 7 months [1]
- Claude Sonnet 4.5 can work independently for over 30 hours, indicating its suitability for agentic applications [1]
- The industry is shifting toward measuring AI intelligence per watt, emphasizing task and token efficiency [2]

Future Applications & Industry Impact
- Anthropic is showcasing a vision of the future of software with "Claude Imagine," which generates applications on the fly within a desktop environment [1][2]
- Claude is increasingly used to write its own code; Anthropic's CEO states that it writes the majority of the code for Claude [9][10]
- Box tested Claude Sonnet 4.5 for data-extraction accuracy with Box AI on 40,000 fields across 1,500+ documents; the model performed four percentage points better than Sonnet 4 [3][4]

Pricing & Availability
- Claude Sonnet 4.5 is priced at $3 per million input tokens and $15 per million output tokens, the same as Sonnet 4 [11]
- Anthropic recommends upgrading to Claude Sonnet 4.5 immediately for all use cases [11]
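The quoted pricing is straightforward to turn into a per-request cost: scale input and output token counts by the respective per-million-token rates and sum. The helper below is a minimal sketch; the function name and the example token counts are assumptions for illustration, while the $3/$15 defaults are the rates stated above.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float = 3.0,
                     output_per_m: float = 15.0) -> float:
    """Cost of one API call at the quoted per-million-token rates."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# e.g. a 20k-token prompt with a 2k-token reply
print(round(request_cost_usd(20_000, 2_000), 4))  # 0.09
```

Because output tokens cost 5x input tokens at these rates, long agentic rollouts are dominated by output cost, which is one reason the "intelligence per watt" and token-efficiency framing above matters economically as well.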