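Replacing "Dialogue" with "Telepathy": Tsinghua University, Together with Infinigence AI (无问芯穹), CUHK, and Other Institutions, Proposes Cache-to-Cache, a New Paradigm for Model Communication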

Core Insights - The article discusses rapid advances in large language models (LLMs) and introduces a new communication paradigm, Cache-to-Cache (C2C), which improves multi-agent systems by letting models communicate directly through their KV-Cache instead of through traditional Text-to-Text (T2T) exchange [2][5][10].

Limitations of Existing Text Communication - T2T communication has three significant limitations: information loss from dimensionality reduction, the semantic ambiguity inherent in natural language, and substantial latency caused by token-by-token output generation [7][8][6].

Advantages of KV-Cache - The KV-Cache inherently carries multi-dimensional semantic information from the dialogue process, which benefits both accuracy and efficiency. Experiments show that an optimized KV-Cache can significantly improve model accuracy and enable effective communication between different models [11][12][29].

C2C Mechanism - The C2C framework uses a fusion mechanism that integrates the KV-Cache of different models, ensuring compatibility and effective information transfer. The fusion uses a residual structure so that the receiver model's original semantics are preserved [16][17][19].

Performance and Efficiency - C2C delivers substantial improvements over T2T, with accuracy gains of 3% to 5% and speedups of up to 2x. The framework allows efficient parallel processing and avoids the inefficiency of one-dimensional text output [29][31][28].

Experimental Results - The reported experiments show C2C consistently outperforming T2T across multiple benchmarks, with significant accuracy gains and shorter inference times [28][31][29].

Future Prospects - The C2C paradigm has broad applications, including stronger collaboration in multi-agent systems, integration of multimodal models, and privacy-aware cloud-edge collaboration. It is positioned as a key enabling technology for the next generation of multi-agent systems [36][38][39].
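
The residual fusion described under C2C Mechanism can be illustrated with a small sketch. This is not the authors' implementation: it assumes the two caches are already aligned token by token, stands in a single fixed linear projection for the learned fuser, and uses one scalar gate. The names `fuse_cache`, `projection`, and `gate_logit` are hypothetical and chosen only for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_cache(receiver_cache, sharer_cache, projection, gate_logit):
    """Residual fusion sketch: fused = receiver + gate * projection(sharer).

    receiver_cache: per-token vectors of size d_r (receiver model's cache entries)
    sharer_cache:   per-token vectors of size d_s (sharer model's cache entries)
    projection:     d_r x d_s matrix mapping sharer space into receiver space
                    (assumption: a stand-in for a learned fuser)
    gate_logit:     scalar; sigmoid(gate_logit) scales the injected information
    """
    if len(receiver_cache) != len(sharer_cache):
        raise ValueError("caches must be aligned to the same number of tokens")
    gate = sigmoid(gate_logit)
    fused = []
    for r_vec, s_vec in zip(receiver_cache, sharer_cache):
        projected = matvec(projection, s_vec)
        # Residual connection: the receiver's own entry is always kept,
        # and the sharer's projected entry is added on top of it.
        fused.append([r + gate * p for r, p in zip(r_vec, projected)])
    return fused


receiver = [[1.0, 0.0], [0.0, 1.0]]            # 2 tokens, d_r = 2
sharer = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]    # 2 tokens, d_s = 3
projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

fused = fuse_cache(receiver, sharer, projection, gate_logit=0.0)
print(fused)  # [[1.5, 1.0], [0.0, 1.0]]
```

Because the sharer's contribution is added to the receiver's entry rather than replacing it, driving the gate toward zero recovers the receiver's cache essentially unchanged, which is one way to read the summary's claim that the residual structure maintains the receiver's original semantics.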