Group 1
- The article centers on optical compression of context: representing text as visual tokens to reduce the computational cost that large language models face as context windows grow [2][3]
- DeepSeek's research shows that visual compression can maintain high accuracy, retaining 96.5% precision at a 10x compression ratio [3][4]
- The DeepEncoder module is identified as the key engine for optical compression; it combines a SAM module, convolutional blocks, and CLIP to compress 1000 text tokens into 100 visual tokens [5][7]

Group 2
- Optical computing is presented as better suited to context compression because it handles the information aggregation inherent in ViT and CNN structures more efficiently than traditional electronic chips [7][9]
- Its advantages include simplified computation and scalability, enabling greater parallelism and dynamic programmability, which are crucial for long-text reasoning tasks [9][11]
- Future plans include exploring algorithms based on human memory mechanisms and developing specialized hardware for context compression and AI tasks, with the aim of connecting optical computing with large models [13][15]

Group 3
- The article emphasizes the need for optical computing to overcome the limitations of traditional GPUs, particularly memory constraints and power density, as large models become more prevalent [15]
- The company aims to build a disruptive next-generation platform for large-scale AI computing, providing comprehensive optical computing solutions across various scenarios [15]
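
The 1000-to-100 token figure cited above can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the quadratic attention-cost model is an assumption about standard self-attention, not a claim made in the article.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Number of text tokens represented per vision token."""
    if text_tokens <= 0 or vision_tokens <= 0:
        raise ValueError("token counts must be positive")
    return text_tokens / vision_tokens


def relative_attention_cost(text_tokens: int, vision_tokens: int) -> float:
    """Attention cost after compression, relative to the cost before.

    Assumption (not from the article): cost scales with the square of
    the sequence length, as in standard pairwise self-attention.
    """
    if text_tokens <= 0 or vision_tokens <= 0:
        raise ValueError("token counts must be positive")
    return (vision_tokens / text_tokens) ** 2


# Figures taken from the summary: 1000 text tokens -> 100 visual tokens.
ratio = compression_ratio(1000, 100)
cost = relative_attention_cost(1000, 100)

print(f"compression ratio: {ratio:.0f}x")
print(f"relative attention cost: {cost:.2%}")
```

Under that quadratic assumption, a 10x reduction in sequence length would cut the pairwise attention work to about 1% of the original, which is one way to read the article's claim that compression eases the burden of long contexts.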
DeepSeek-OCR Achieves Optical Compression; Optical Computing Can "Lighten the Load" for Large Models