New DeepSeek Paper with Liang Wenfeng as a Named Author Targets the "Memory" Shortcoming of Large Models

Core Insights
- The paper published by DeepSeek addresses the memory limitations of current large language models and introduces the concept of "conditional memory" [2]
- DeepSeek proposes a module named Engram, which splits language modeling into two branches: "static pattern retrieval" for fast access to deterministic knowledge and "dynamic combinatorial reasoning" for complex logical operations [2]
- The paper argues that conditional memory is an essential modeling primitive for the next generation of sparse models; there is speculation that DeepSeek's next model may be released before the Spring Festival [3]

Group 1
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models", was co-authored by Peking University and DeepSeek [1]
- The notion of "conditional memory" is introduced to strengthen the memory capabilities of large language models [2]
- The Engram module is designed to improve language-modeling efficiency by separating the task into static and dynamic components [2]

Group 2
- The paper emphasizes the importance of conditional memory for future sparse model development [3]
- There is speculation that DeepSeek's next-generation model will be released around the Spring Festival, potentially replicating the success of previous launches [3]
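The article only names the two-branch split and does not describe Engram's internals, so the following is a minimal toy sketch of the general idea, not the paper's method: the class name ToyEngramBlock, the hash-keyed embedding table, the MLP, and the sigmoid gate are all illustrative assumptions standing in for "static pattern retrieval" (cheap lookup) and "dynamic combinatorial reasoning" (dense computation).

```python
# Hypothetical sketch only; nothing below is taken from the DeepSeek paper.
import torch
import torch.nn as nn


class ToyEngramBlock(nn.Module):
    """Toy split into a lookup branch and a dense-compute branch."""

    def __init__(self, d_model: int, table_size: int = 4096):
        super().__init__()
        self.table_size = table_size
        # "Static pattern retrieval": a large, cheap lookup table keyed by a
        # trivial hash of the token id (a stand-in for whatever key the real
        # module would use).
        self.memory_table = nn.Embedding(table_size, d_model)
        # "Dynamic combinatorial reasoning": a small dense MLP over the
        # contextual hidden state.
        self.reasoner = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Learned per-token gate deciding how much to trust retrieved memory.
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # Static branch: one table lookup per token, no context-wide matmul.
        keys = token_ids % self.table_size           # hash into the table
        retrieved = self.memory_table(keys)          # [batch, seq, d_model]
        # Dynamic branch: ordinary dense computation on the hidden state.
        reasoned = self.reasoner(hidden)             # [batch, seq, d_model]
        # Mix the two branches with a sigmoid gate.
        g = torch.sigmoid(self.gate(torch.cat([retrieved, reasoned], dim=-1)))
        return g * retrieved + (1 - g) * reasoned


if __name__ == "__main__":
    block = ToyEngramBlock(d_model=64)
    ids = torch.randint(0, 32000, (2, 16))
    h = torch.randn(2, 16, 64)
    print(block(ids, h).shape)  # torch.Size([2, 16, 64])
```

The intent of the sketch is only to show why a lookup branch can act as a new sparsity axis: the table grows the model's capacity for deterministic knowledge without adding per-token compute, while the dense branch remains responsible for composition.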
