Editor-in-Chief's Note | Smaller Memory, Stronger AI: Compressing Memory Improves the Task Accuracy of Large Models
Huan Qiu Wang Zi Xun · 2026-01-01 04:29

Core Viewpoint
- A joint team from the University of Edinburgh and NVIDIA has developed a method for compressing the memory AI models use during inference, improving their accuracy on complex tasks while maintaining response speed and significantly reducing energy consumption [1][4].

Group 1: Memory Compression Technique
- The team found that compressing the working memory (the key-value cache) of large language models (LLMs) to one-eighth of its original size improved performance on specialized tests in mathematics, science, and programming, without lengthening inference time [4].
- The technique, named "Dynamic Memory Sparsification" (DMS), lets a model dynamically decide which tokens are essential for subsequent reasoning and which can be discarded, enabling deeper "thinking" within the same computational budget [4][6]; a minimal code sketch follows at the end of this digest.

Group 2: Performance Improvements
- On a test based on the American Invitational Mathematics Examination (AIME 24), a qualifier for the USA Mathematical Olympiad, the compressed model scored an average of 12 points higher than the uncompressed model under the same memory-read budget [5].
- The compressed model also outperformed the original on a professional science question bank written by PhD-level experts, and raised its average score by 10 points on a platform that assesses coding ability [5].

Group 3: Implications for AI Development
- DMS challenges the conventional belief that more computational resources necessarily yield stronger AI, pointing instead toward lightweight, high-performance models [6].
- The approach mirrors human cognition, which relies on selective memory and the extraction of key information, and may help accelerate progress toward general AI [6].
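
The digest describes DMS only at a high level. The sketch below is a hypothetical illustration of the general idea of token-level KV-cache eviction, not the authors' implementation: DMS learns its eviction decisions during a retrofit of the model, whereas this toy version substitutes a cumulative-attention heuristic for that learned decision, and every name in it (`SparseKVCache`, `attend`, `budget`) is invented for illustration.

```python
# Hypothetical sketch of dynamic KV-cache sparsification (NOT the
# authors' DMS code). DMS learns which tokens to evict; here, the
# cumulative attention a token has received stands in for importance.
import numpy as np

class SparseKVCache:
    """Fixed-budget KV cache that evicts the lowest-scoring token."""

    def __init__(self, budget: int, dim: int):
        self.budget = budget              # max tokens kept, e.g. seq_len // 8
        self.keys = np.empty((0, dim))    # cached key vectors
        self.values = np.empty((0, dim))  # cached value vectors
        self.scores = np.empty(0)         # importance accumulated per token

    def append(self, key: np.ndarray, value: np.ndarray) -> None:
        """Admit a new token with zero initial importance."""
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])
        self.scores = np.append(self.scores, 0.0)

    def attend(self, query: np.ndarray) -> np.ndarray:
        """Attend over kept tokens, update importances, evict if over budget."""
        logits = self.keys @ query / np.sqrt(query.size)
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        self.scores += weights                 # attended tokens stay important
        if len(self.scores) > self.budget:     # discard the least-used token
            victim = int(self.scores.argmin())
            self.keys = np.delete(self.keys, victim, axis=0)
            self.values = np.delete(self.values, victim, axis=0)
            self.scores = np.delete(self.scores, victim)
        return weights @ self.values

rng = np.random.default_rng(0)
cache = SparseKVCache(budget=16, dim=64)       # keep 1 token in 8
for _ in range(128):                           # stream 128 tokens through
    cache.append(rng.normal(size=64), rng.normal(size=64))
    _ = cache.attend(rng.normal(size=64))
print(cache.keys.shape)                        # (16, 64): 8x compression
```

Run as-is, the final cache holds 16 of 128 tokens, matching the eight-fold compression reported in the article. The design point is that each attention step now reads an eighth of the memory, and it is this saved read budget that, per the digest, the model can reinvest in longer chains of reasoning at the same cost.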