New DeepSeek Paper Co-Authored by Liang Wenfeng: "Breaking Through the GPU Memory Limit"

Core Insights
- DeepSeek, a Chinese AI startup, has published a technical paper introducing a new model training technique that works around GPU memory limitations, underscoring its focus on cost efficiency despite remaining gaps with leading US firms [1][2]
- The new technique, termed "Engram," addresses the bottleneck of limited high-bandwidth memory (HBM) in scaling AI models, an area where China lags significantly behind the US in AI hardware [3][4]
- The paper has drawn attention from industry professionals in both China and the US, reflecting DeepSeek's position as a leading AI innovator over the past year [1][2]

Technical Developments
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," presents "conditional memory" technology aimed at improving the efficiency of AI models when processing long contexts, a major challenge for AI chatbots [2][3]
- The Engram technique decouples computation from storage, allowing the model to retrieve foundational knowledge more efficiently (see the illustrative sketch after this summary) [3][4]
- The technology was validated on a model with 27 billion parameters, showing performance improvements on key industry benchmarks [3]

Market Position and Competition
- DeepSeek's previous model, DeepSeek-R1, was trained in two months at a cost of $5.5 million, significantly lower than competitors such as OpenAI, while achieving comparable performance [6][7]
- Microsoft President Brad Smith has noted that US AI companies are being overtaken by Chinese competitors such as DeepSeek, particularly in emerging markets, owing to the low cost and ease of use of Chinese open-source models [7]
- Anticipation is building for DeepSeek's upcoming V4 model, expected to launch in mid-February and said to have strong programming capabilities [8]
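The article does not describe how the lookup is actually wired into the model, so the sketch below is only an illustrative assumption of what "decoupling computation from storage" via a scalable lookup might look like: a large embedding-style table kept in host (CPU) memory rather than GPU HBM, from which only the rows needed for the current tokens are fetched and added to the model's hidden states. The class name, hashing scheme, and table layout are all hypothetical and are not taken from DeepSeek's paper.

```python
import torch
import torch.nn as nn


class ConditionalMemoryLookup(nn.Module):
    """Hypothetical sketch of a lookup-based "conditional memory" layer.

    The table lives in host (CPU) memory instead of GPU HBM; only the rows
    needed for the current batch are streamed to the GPU. This illustrates
    the compute/storage decoupling idea described in the article, not
    DeepSeek's actual Engram implementation.
    """

    def __init__(self, num_buckets: int = 100_000, d_model: int = 256):
        super().__init__()
        self.num_buckets = num_buckets
        # Kept as a plain attribute (not a registered buffer) so calling
        # .to("cuda") on the module does not move the big table into HBM.
        table = torch.zeros(num_buckets, d_model)
        if torch.cuda.is_available():
            table = table.pin_memory()  # pinned host memory enables async copies
        self.table = table

    def bucket_ids(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Hypothetical keying: hash each token id to a table bucket.
        # A real system might key on n-grams or learned routing instead.
        return (token_ids * 2654435761) % self.num_buckets

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq) int64; hidden: (batch, seq, d_model) on the GPU.
        ids = self.bucket_ids(token_ids).reshape(-1).cpu()
        rows = self.table.index_select(0, ids)            # sparse fetch from host memory
        rows = rows.to(hidden.device, non_blocking=True)  # move only the selected rows to the GPU
        rows = rows.view(*token_ids.shape, -1)
        return hidden + rows                              # inject retrieved memory into the residual stream
```

In this toy layout the compute-heavy transformer weights stay in HBM while the bulk knowledge table sits in cheaper, more plentiful host memory, which is the kind of trade-off the article attributes to Engram; whether the actual paper uses host memory, SSDs, or some other storage tier is not stated in this summary.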
