EvolKV
Cut the KV cache budget to 1.5%! They used an evolutionary algorithm to slash the memory footprint of large models
机器之心 · 2025-09-14 05:16
Core Insights
- EvolKV achieves superior performance with only 1.5% of the full KV cache budget, significantly reducing inference costs for large language models [1][11][25]
- Traditional KV cache methods struggle with long input texts, which inflate storage requirements and slow down processing [3][4]

KV Cache Optimization
- Existing KV cache compression methods rely primarily on heuristics, which may not optimally retain task-relevant information [4][9]
- EvolKV introduces an evolutionary framework that adaptively allocates KV cache budgets across transformer layers, optimizing directly for downstream task performance [6][10] (a per-layer budgeting sketch follows this summary)

Performance Improvements
- Across benchmark tests, EvolKV consistently outperforms baseline methods, achieving up to a 13% improvement on the Needle-in-a-Haystack benchmark while maintaining high accuracy on the GSM8K dataset [11][30][25]
- The method shows strong adaptability across diverse tasks, remaining competitive even under reduced cache budgets [25][29]

Experimental Results
- Comprehensive experiments on Mistral 7B-Instruct and Llama-3-8B-Instruct show that EvolKV outperforms all baseline methods across multiple KV cache budget configurations [22][24]
- In the LongBench evaluation, EvolKV consistently achieved the highest average performance, even surpassing the full-cache model in certain configurations [22][25]

Evolutionary Algorithm Mechanism
- The evolutionary algorithm generates candidate budget allocations and evaluates their fitness by downstream task performance, which guides the optimization process [13][14]
- The optimization is structured in groups of layers to improve efficiency and yield more stable optimization dynamics [16][17] (see the evolutionary-search sketch below)

Cache Budget Allocation
- EvolKV allocates KV cache budgets in a dynamic, task-driven way, so the distribution reflects the functional contribution of each transformer layer [10][19]
- A rescaling mechanism keeps the total KV cache budget fixed, ensuring fair comparison against baselines [20] (see the rescaling sketch below)
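
To make the layer-wise budgeting concrete, here is a minimal sketch of how a per-layer budget vector could drive cache compression: each transformer layer keeps only its most-attended KV positions, and how many it keeps is set by that layer's entry in the budget vector. The function names, the attention-score selection criterion, and the numpy-based interface are illustrative assumptions, not EvolKV's actual implementation.

```python
# Sketch (assumption, not EvolKV's released code): a per-layer budget vector
# decides how many KV entries each transformer layer retains. Heuristic
# compressors typically use a uniform budget; EvolKV instead searches for a
# layer-wise allocation that maximizes downstream task performance.
import numpy as np

def compress_layer_cache(keys, values, attn_scores, budget):
    """Keep only the `budget` KV positions with the highest attention mass."""
    if budget >= keys.shape[0]:
        return keys, values
    keep = np.argsort(attn_scores)[-budget:]   # indices of the most-attended positions
    keep = np.sort(keep)                       # preserve original sequence order
    return keys[keep], values[keep]

def compress_kv_cache(layer_caches, budgets):
    """Apply a (possibly non-uniform) per-layer budget to every layer's cache."""
    compressed = []
    for (k, v, scores), b in zip(layer_caches, budgets):
        compressed.append(compress_layer_cache(k, v, scores, b))
    return compressed
```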
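The group-wise evolutionary search described above can be sketched as follows: candidate per-layer budgets are perturbed one group of layers at a time, scored by a downstream-task fitness function, and the best-scoring candidate is kept. The simple mutation operator, the `evaluate_on_task` callback, and the hyperparameters are assumptions for illustration; the paper's actual optimizer and settings may differ.

```python
# Minimal sketch of group-wise evolutionary budget search (illustrative only).
import random

def evolve_budgets(n_layers, group_size, init_budget, evaluate_on_task,
                   generations=20, population=8, sigma=4):
    """evaluate_on_task(budgets) -> downstream task score (higher is better)."""
    budgets = [init_budget] * n_layers
    best_score = evaluate_on_task(budgets)
    # Optimize one group of consecutive layers at a time for stability.
    for start in range(0, n_layers, group_size):
        group = range(start, min(start + group_size, n_layers))
        for _ in range(generations):
            candidates = []
            for _ in range(population):
                cand = budgets[:]
                for i in group:                    # perturb only the current group
                    cand[i] = max(1, cand[i] + random.randint(-sigma, sigma))
                candidates.append((evaluate_on_task(cand), cand))
            top_score, top_cand = max(candidates, key=lambda c: c[0])
            if top_score > best_score:             # keep the best allocation found so far
                best_score, budgets = top_score, top_cand
    return budgets, best_score
```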
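The fairness adjustment mentioned under Cache Budget Allocation amounts to rescaling the evolved per-layer budgets so their total matches a fixed target, so comparisons against baselines use the same overall cache size. The rounding and remainder-distribution scheme below is an assumption for illustration.

```python
# Sketch of the total-budget adjustment (illustrative rounding scheme).
def rescale_budgets(budgets, target_total, min_budget=1):
    """Scale per-layer budgets so that they sum (approximately) to target_total."""
    scale = target_total / sum(budgets)
    scaled = [max(min_budget, round(b * scale)) for b in budgets]
    # Spread the small rounding remainder over the largest layers.
    diff = target_total - sum(scaled)
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    for i in order[:abs(diff)]:
        scaled[i] += 1 if diff > 0 else -1
    return scaled
```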