Context Window
New DeepSeek just did something crazy...
Matthew Berman· 2025-10-22 17:15
DeepSeek-OCR Key Features
- DeepSeek-OCR is a novel approach to optical character recognition (OCR) that compresses text about 10x while maintaining about 97% decoding accuracy [2]
- The text is rendered as an image, and a vision language model (VLM) encodes that image into far fewer vision tokens than the equivalent text tokens, fitting about 10 times more text into the same token budget [6][11]
- The method achieves 96%+ OCR decoding precision at 9-10x text compression, 90% at 10-12x compression, and 60% at 20x compression [13]
Technical Details
- The model splits the input image into 16x16 patches [9]
- It uses SAM, an 80-million-parameter model, to capture local details [10]
- It uses CLIP, a 300-million-parameter model, to capture global information about how the image regions fit together [10]
- The output is decoded by DeepSeek 3B, a 3-billion-parameter mixture-of-experts model with 570 million active parameters [10]
Training Data
- The model was trained on 30 million pages of diverse PDF data from the internet, covering approximately 100 languages [21]
- Chinese and English account for approximately 25 million pages; other languages account for about 5 million pages [21]
Potential Impact
- This technology could effectively increase the context window of large language models by about 10x [20]
- Andrej Karpathy suggests that pixels might be better inputs to LLMs than text tokens [17]
- An entire encyclopedia could be compressed into a single high-resolution image [20]
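The compression figures above are ratios of text tokens to vision tokens. The following minimal Python sketch works through that arithmetic. The function names, the hypothetical 2,560-token page, and the 16x downsampling from patches to vision tokens are illustrative assumptions rather than figures from the video; the precision bands are only the ones quoted in [13].

```python
# Back-of-the-envelope arithmetic for optical context compression.
# Hypothetical: the page size in text tokens and the 16x patch-to-token
# downsampling used below are illustrative assumptions, not figures from the video.

PATCH_SIZE = 16  # the summary says the image is split into 16x16 patches


def num_patches(width: int, height: int, patch: int = PATCH_SIZE) -> int:
    """Count the non-overlapping patch x patch tiles that cover the image."""
    return (width // patch) * (height // patch)


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    return text_tokens / vision_tokens


def effective_text_capacity(vision_token_budget: int, ratio: float) -> int:
    """Text tokens a fixed vision-token budget could represent at a given ratio."""
    return int(vision_token_budget * ratio)


def quoted_precision(ratio: float) -> str:
    """Look up the OCR decoding precision bands quoted in the summary [13].

    Assumption: ratios below 9x do at least as well as the 9-10x band.
    """
    if ratio <= 10:
        return "96%+ (quoted for 9-10x)"
    if ratio <= 12:
        return "about 90% (quoted for 10-12x)"
    if ratio <= 20:
        return "not quoted between 12x and 20x; about 60% at 20x"
    return "not quoted beyond 20x"


if __name__ == "__main__":
    patches = num_patches(1024, 1024)  # 64 * 64 = 4096 patches
    vision_tokens = patches // 16  # hypothetical 16x downsampling -> 256 tokens
    ratio = compression_ratio(2_560, vision_tokens)  # hypothetical 2,560-token page
    print(patches, vision_tokens, ratio, quoted_precision(ratio))
    # prints: 4096 256 10.0 96%+ (quoted for 9-10x)
    print(effective_text_capacity(100_000, ratio))  # prints: 1000000
```

Read this way, the "10x context window" claim is simply the ratio applied to a fixed token budget: at 10x, a hypothetical 100,000-token budget spent on vision tokens would stand in for about one million text tokens, provided the decoding precision at that ratio is acceptable.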
12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer
AI Engineer· 2025-07-03 20:50
Core Principles of Agent Building
- Agent development should be rethought from first principles, applying established software engineering practices to build reliable agents [11]
- Owning the control flow is central to agent design, giving flexibility in how execution state and business state are managed [24][25]
- Agents should be stateless, with state managed externally for greater flexibility and control [47][49]
Key Factors for Reliable Agents
- The ability of LLMs to convert natural language into JSON is a fundamental capability for building effective agents [13]
- Letting agents use tools directly can be harmful; a more structured approach treats tool calls as JSON output that deterministic code then executes [14][16]
- Prompts and context windows should be owned and optimized to ensure the quality and reliability of agent outputs [30][33]
Practical Applications and Considerations
- Small, focused "micro agents" embedded in deterministic workflows are easier to manage and more reliable [40]
- Agents should integrate with the communication channels people already use (email, Slack, Discord, SMS) to meet users where they are [39]
- Developers should focus on the "hard AI parts" of agent development, such as prompt engineering and flow optimization, rather than relying on frameworks to abstract away that complexity [52]
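Several of these factors (natural language to JSON, tools as plain deterministic code, a stateless agent whose state lives outside it, and control flow the developer owns) can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not HumanLayer's implementation: `Thread`, `step`, `run`, `TOOLS`, and `fake_llm` are hypothetical names, and `fake_llm` is a stub standing in for a real model call.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Thread:
    """All agent state lives here, so it can be serialized and stored externally."""
    events: tuple = ()

    def append(self, kind: str, data: dict) -> "Thread":
        # Return a new Thread instead of mutating: the agent itself keeps no state.
        return Thread(self.events + ({"type": kind, "data": data},))


def fake_llm(thread: Thread) -> str:
    """Hypothetical stub for a model call that returns JSON text.

    A real agent would render `thread` into a prompt it owns and call a model.
    """
    last = thread.events[-1]
    if last["type"] == "tool_result":
        return json.dumps({"intent": "done", "args": {"answer": last["data"]["result"]}})
    return json.dumps({"intent": "add", "args": {"a": 2, "b": 3}})


# Tools are plain deterministic functions keyed by intent name (hypothetical).
TOOLS: dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
}


def step(thread: Thread, llm: Callable[[Thread], str] = fake_llm) -> tuple[Thread, bool]:
    """One stateless reducer step: thread in, (new thread, finished?) out."""
    call = json.loads(llm(thread))  # the model's output is just structured JSON
    intent, args = call["intent"], call["args"]
    if intent == "done":
        return thread.append("final_answer", args), True
    if intent not in TOOLS:
        # Put the error back into the context instead of crashing the loop.
        return thread.append("error", {"message": f"unknown intent {intent!r}"}), False
    result = TOOLS[intent](**args)  # deterministic code executes the "tool call"
    return thread.append("tool_result", {"intent": intent, "result": result}), False


def run(thread: Thread, max_steps: int = 5) -> Thread:
    """Control flow we own: an explicit loop with our own stopping rule."""
    for _ in range(max_steps):
        thread, finished = step(thread)
        if finished:
            break
    return thread


if __name__ == "__main__":
    start = Thread().append("user_message", {"text": "what is 2 + 3?"})
    final = run(start)
    print(final.events[-1])  # prints: {'type': 'final_answer', 'data': {'answer': 5}}
    print(json.dumps(list(final.events)))  # the whole state, ready to store anywhere
```

Because `step` only maps one `Thread` to the next, the loop can be paused, persisted as JSON, and resumed later (for example after a human replies over Slack or email), and a capped `run` like this one could serve as a single "micro agent" step inside a larger deterministic workflow.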