Cutting the KV cache budget to 1.5%: how an evolutionary algorithm slashed large language models' memory footprint
机器之心· 2025-09-14 05:16
Core Insights
- EvolKV achieves superior performance with only 1.5% of the full KV cache budget, significantly reducing inference costs for large language models [1][11][25]
- Traditional KV cache methods face challenges with long input texts, leading to increased storage requirements and slower processing [3][4]

KV Cache Optimization
- Existing KV cache compression methods primarily rely on heuristic approaches, which may not optimally retain task-relevant information [4][9]
- EvolKV introduces an evolutionary framework that adaptively allocates KV cache budgets across transformer layers, optimizing for downstream task performance [6][10]

Performance Improvements
- In various benchmark tests, EvolKV consistently outperforms baseline methods, achieving up to a 13% improvement on the Needle-in-a-Haystack benchmark and maintaining high accuracy on the GSM8K dataset [11][30][25]
- The method demonstrates strong adaptability across diverse tasks, maintaining competitive performance even with reduced cache budgets [25][29]

Experimental Results
- Comprehensive experiments on Mistral 7B-Instruct and Llama-3-8B-Instruct show that EvolKV outperforms all baseline methods across multiple KV cache budget configurations [22][24]
- In the LongBench evaluation, EvolKV consistently achieved the highest average performance, even surpassing the full model in certain configurations [22][25]

Evolutionary Algorithm Mechanism
- The evolutionary algorithm generates candidate solutions and evaluates their fitness based on downstream task performance, guiding the optimization process [13][14]
- The optimization is structured layer-group by layer-group to improve efficiency and yield more stable optimization dynamics [16][17]

Cache Budget Allocation
- EvolKV employs a dynamic, task-driven approach to allocate KV cache budgets, ensuring that the distribution aligns with the functional contributions of different transformer layers [10][19]
- The method includes a mechanism for adjusting the total KV cache budget to ensure fairness in evaluation [20]
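The mechanism summarized above — propose candidate per-layer budget allocations, score each candidate's fitness on a downstream task, and keep the best — can be sketched as a toy evolutionary loop. Everything below (function names, the Gaussian mutation scheme, the elitist selection, and the stand-in `fitness` callback) is an illustrative assumption, not EvolKV's actual implementation:

```python
import random

def evolve_budgets(num_layers, total_budget, fitness,
                   generations=30, pop_size=12, seed=0):
    """Toy evolutionary search over per-layer KV cache budgets.

    `fitness` scores an allocation (higher is better); in EvolKV this role
    is played by downstream-task performance. Each candidate is rescaled so
    its budgets sum to `total_budget`, loosely mirroring the paper's
    adjustment of the total budget for fair evaluation.
    """
    rng = random.Random(seed)

    def normalize(alloc):
        s = sum(alloc)
        return [b * total_budget / s for b in alloc]

    def mutate(alloc):
        # Gaussian perturbation per layer, clamped to stay positive.
        sigma = total_budget / (4 * num_layers)
        return normalize([max(1e-3, b + rng.gauss(0, sigma)) for b in alloc])

    # Start from a uniform allocation plus random perturbations of it.
    uniform = [total_budget / num_layers] * num_layers
    population = [uniform] + [mutate(uniform) for _ in range(pop_size - 1)]

    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        elites = scored[: pop_size // 4]  # keep the top quartile
        population = elites + [mutate(rng.choice(elites))
                               for _ in range(pop_size - len(elites))]

    return max(population, key=fitness)
```

Because the initial uniform allocation is in the population and elites carry over, the best fitness never decreases across generations; the search simply shifts budget toward layers the fitness function rewards.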
Breaking a 56-year mathematical record: Google's self-evolving AlphaEvolve sends algorithmic efficiency soaring, likened to AlphaGo's "divine move"
量子位· 2025-05-18 02:01
Yishui, Wenle | QbitAI (公众号 QbitAI)

Mathematical ability almost on par with AlphaGo's Go play?! That is researchers' latest assessment of AlphaEvolve. Not long ago, Google DeepMind, together with top scientists including Terence Tao, unveiled AlphaEvolve, a "general-purpose AI for science" that broke a 56-year-old efficiency benchmark in matrix multiplication.

A former Google employee went so far as to liken the achievement to the legendary "divine move": "Insane! AlphaEvolve's mathematical feat is the equivalent of move 37, the 'divine move' with which AlphaGo beat humanity."

Concretely, the benchmark of 49 scalar multiplications for 4x4 matrix multiplication had stood for 56 years, and AlphaEvolve rewrote that number to 48. The step may look small, but the faster matrix-multiplication algorithm behind it matters greatly: beyond tackling hard mathematical problems, it can improve chip design and raise the efficiency of data centers and AI training. In internal use at Google, it accelerated the large matrix-multiplication operations in the Gemini architecture by 23%, shortening Gemini's training time by 1%, and sped up FlashAttention by 32.5%.

The next questions, then: how does AlphaEvolve do it, and what core technical principles lie behind it? In AlphaE ...
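For context on the 49-multiplication baseline mentioned above: it comes from Strassen's 1969 scheme, which multiplies 2x2 block matrices with 7 multiplications instead of 8; applied recursively, that gives 7 x 7 = 49 scalar multiplications for a 4x4 product — the record AlphaEvolve lowered to 48. A minimal one-level sketch (NumPy block products stand in for the recursive calls):

```python
import numpy as np

def strassen_one_level(A, B):
    """One level of Strassen's algorithm: 7 block multiplications
    instead of the naive 8. Recursing on the 2x2 sub-blocks of a 4x4
    matrix yields the classic 49-scalar-multiplication count."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

    # The 7 Strassen products (each @ would be a recursive call).
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)

    # Recombine into the four blocks of C = A @ B.
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```

AlphaEvolve's 48-multiplication scheme is a different, newly discovered decomposition, not a tweak of these formulas; the sketch only shows where the long-standing 49 figure came from.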