Synergistic Core
Are Large Models Growing Brains? Study Finds That LLM Middle Layers Spontaneously Mimic Human Brain Evolution
36Kr · 2026-01-15 01:26
Core Insights
- A recent study from researchers at Imperial College London and Huawei's Noah's Ark Lab reveals that large language models (LLMs) spontaneously develop a structure known as the Synergistic Core, organized much like the human brain [1][2].

Model Architecture and Findings
- The research team analyzed models such as Gemma, Llama, Qwen, and DeepSeek using the Partial Information Decomposition (PID) framework and found that middle layers exhibit strong synergistic processing, while lower and upper layers tend to be redundant [5][6][7].
- The study treats LLMs as distributed information-processing systems and aims to quantify the interactions among their internal components [7].

Experimental Methodology
- Researchers fed the models prompts from six categories of cognitive tasks, including grammar correction and logical reasoning, and recorded the activation values of every attention head or expert module while responses were generated [8][9].
- The L2 norm of each output vector was computed as the measure of activation strength (a recording sketch appears after this summary), and the Integrated Information Decomposition (ΦID) framework was applied to analyze the interactions between attention heads (a toy decomposition follows as well) [10][11].

Synergistic Core Characteristics
- The experimental data revealed a consistent spatial organization across model architectures, with a pronounced "inverted U-shape" in how synergy is distributed over layers [13].
- A redundant periphery in the early and late layers processes information largely redundantly, while the synergistic core in the middle layers exhibits high synergy and is crucial for advanced semantic integration and abstract reasoning [15].

Architectural Consistency
- The emergence of the Synergistic Core does not depend on a specific technical implementation: the same spatial distribution appeared in the DeepSeek V2 Lite model when the analysis was run over expert modules rather than attention heads [16][17].

Emergence of Intelligence
- The structure is a product of learning rather than an inherent feature of the Transformer architecture; randomly initialized networks show no such distribution [19][21].

Validation of Synergistic Core Functionality
- Two types of intervention experiments were conducted. In ablation experiments, removing high-synergy nodes led to significant performance declines, confirming the Synergistic Core as a core driver of model intelligence (an ablation sketch is given below) [22].
- In fine-tuning experiments, training focused on the Synergistic Core produced larger performance gains than training on the redundant periphery or on random subsets (a targeted fine-tuning sketch is also given below) [23].

Implications for AI and Neuroscience
- Identifying the Synergistic Core could guide more efficient compression algorithms and targeted parameter updates that accelerate training [27].
- The research also provides computational validation for the role of synergistic loops in reinforcement learning and knowledge transfer, suggesting that silicon-based models and biological brains converge on similar organizational patterns [27].
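The recording step from the methodology section can be made concrete. The paper's own pipeline is not reproduced in this summary, so the following is a minimal PyTorch sketch under stated assumptions: a LLaMA-style Hugging Face checkpoint (the model name and the module path model.model.layers[i].self_attn.o_proj are conventions of that model family, not the paper's), with per-head L2 norms recovered by splitting the input of each attention output projection back into heads before they are mixed.

```python
# Minimal sketch: record per-head activation strength (L2 norm) for each
# layer while a model generates. Assumes a LLaMA-style decoder from Hugging
# Face transformers; module names may differ for other architectures.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.2-1B"  # assumption: any LLaMA-style checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
model.eval()

n_heads = model.config.num_attention_heads
head_dim = model.config.hidden_size // n_heads
records = {}  # layer index -> list of (n_heads,) norm tensors

def make_hook(layer_idx):
    # o_proj's *input* is the concatenation of all head outputs, so we can
    # split it back into heads before they are mixed together.
    def pre_hook(module, args):
        x = args[0]                          # (batch, seq, hidden)
        b, s, _ = x.shape
        per_head = x.view(b, s, n_heads, head_dim)
        norms = per_head.norm(dim=-1)        # (batch, seq, n_heads)
        records.setdefault(layer_idx, []).append(norms.mean(dim=(0, 1)).detach())
    return pre_hook

handles = [
    layer.self_attn.o_proj.register_forward_pre_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

prompt = "Correct the grammar: she go to school yesterday."  # one task type
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=32, do_sample=False)

for h in handles:
    h.remove()

# Average activation strength per head, per layer, across generation steps.
profile = {i: torch.stack(v).mean(dim=0) for i, v in records.items()}
print({i: p.round(decimals=2).tolist() for i, p in list(profile.items())[:2]})
```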
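The article leans on PID's distinction between redundant and synergistic information without spelling it out. As a self-contained illustration, here is a toy discrete PID for two binary sources using the Williams–Beer redundancy measure I_min. This measure is an assumption made for illustration only: the study applies Integrated Information Decomposition (ΦID) to continuous activations, a more involved construction this sketch does not reproduce. The XOR case shows pure synergy (the hallmark of the mid-layer core), the COPY case pure redundancy (the periphery).

```python
# Toy Partial Information Decomposition (PID) for two discrete sources
# X1, X2 and a target S, using the Williams-Beer redundancy measure I_min.
import itertools
import math

def pid(p):
    """p: dict mapping (s, x1, x2) -> probability. Returns PID atoms in bits."""
    states = lambda i: sorted({k[i] for k in p})
    ps = {s: sum(v for k, v in p.items() if k[0] == s) for s in states(0)}

    def marg_info(s, i):
        # Specific information I(S=s; Xi), Williams & Beer (2010).
        px = {x: sum(v for k, v in p.items() if k[i] == x) for x in states(i)}
        psx = {x: sum(v for k, v in p.items() if k[0] == s and k[i] == x)
               for x in states(i)}
        return sum((psx[x] / ps[s]) * math.log2(psx[x] / (ps[s] * px[x]))
                   for x in states(i) if psx[x] > 0)

    def mi(i):  # I(S; Xi)
        return sum(ps[s] * marg_info(s, i) for s in states(0))

    def mi_joint():  # I(S; X1, X2)
        px = {}
        for (s, x1, x2), v in p.items():
            px[(x1, x2)] = px.get((x1, x2), 0) + v
        return sum(v * math.log2(v / (ps[s] * px[(x1, x2)]))
                   for (s, x1, x2), v in p.items() if v > 0)

    red = sum(ps[s] * min(marg_info(s, 1), marg_info(s, 2)) for s in states(0))
    unq1, unq2 = mi(1) - red, mi(2) - red
    syn = mi_joint() - red - unq1 - unq2
    return {"redundancy": red, "unique1": unq1, "unique2": unq2, "synergy": syn}

# XOR target: neither input alone tells us anything about S, but together
# they determine it completely -- all information is synergistic.
xor = {(x1 ^ x2, x1, x2): 0.25 for x1, x2 in itertools.product((0, 1), repeat=2)}
print("XOR :", pid(xor))   # synergy = 1 bit, everything else ~0

# Copy target: both inputs equal S, so the information is fully redundant.
copy = {(s, s, s): 0.5 for s in (0, 1)}
print("COPY:", pid(copy))  # redundancy = 1 bit
```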
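For the ablation intervention, here is a hypothetical sketch of the mechanics, reusing model, tok, n_heads, and head_dim from the recording sketch above: the chosen heads' contributions are zeroed at the input of o_proj, and a placeholder scoring function compares task accuracy with and without the intervention. The head indices and the benchmark are illustrative stand-ins, not the paper's.

```python
# Sketch of a head-ablation intervention: zero chosen attention heads'
# contributions by masking the corresponding slices of o_proj's input.
import torch

ABLATE = {10: [0, 3], 11: [5]}  # hypothetical "high-synergy" heads

def make_ablation_hook(heads):
    def pre_hook(module, args):
        x = args[0].clone()                      # (batch, seq, hidden)
        b, s, _ = x.shape
        x = x.view(b, s, n_heads, head_dim)
        x[:, :, heads, :] = 0.0                  # silence the chosen heads
        return (x.view(b, s, -1),)               # replace o_proj's input
    return pre_hook

handles = [
    model.model.layers[i].self_attn.o_proj.register_forward_pre_hook(
        make_ablation_hook(heads))
    for i, heads in ABLATE.items()
]

# score() is a placeholder for the paper's benchmarks (accuracy on the six
# cognitive task categories); any fixed evaluation works for the comparison.
def score(tasks):
    correct = 0
    for prompt, expected in tasks:
        ids = tok(prompt, return_tensors="pt")
        out = model.generate(**ids, max_new_tokens=8, do_sample=False)
        text = tok.decode(out[0][ids["input_ids"].shape[1]:],
                          skip_special_tokens=True)
        correct += expected.lower() in text.lower()
    return correct / len(tasks)

tasks = [("Q: Is 7 a prime number? A:", "yes")]  # toy stand-in benchmark
acc_ablated = score(tasks)                       # hooks active
for h in handles:
    h.remove()
acc_baseline = score(tasks)                      # hooks removed
print(f"baseline {acc_baseline:.2f} -> ablated {acc_ablated:.2f}")
```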
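For the targeted fine-tuning comparison, a minimal sketch of how updates can be restricted to a chosen head set: freeze everything, then unfreeze the o_proj weights in the relevant layers and mask their gradients so only the selected heads' column slices move. The CORE set below is a placeholder (the paper derives it from the synergy profile), this sketch touches only o_proj for brevity (a full version would mask the q/k/v projections the same way), and model, tok, and head_dim are again reused from the recording sketch.

```python
# Sketch: restrict fine-tuning updates to a chosen set of attention heads
# ("the synergistic core") by freezing all parameters and masking gradients.
import torch

CORE = {10: [0, 3], 11: [5]}  # hypothetical layer -> head indices

for p in model.parameters():
    p.requires_grad_(False)

for layer_idx, heads in CORE.items():
    attn = model.model.layers[layer_idx].self_attn
    # Columns [h*head_dim, (h+1)*head_dim) of o_proj.weight belong to head h,
    # so zeroing the other columns' gradients keeps them frozen.
    mask = torch.zeros_like(attn.o_proj.weight)
    for h in heads:
        mask[:, h * head_dim:(h + 1) * head_dim] = 1.0
    attn.o_proj.weight.requires_grad_(True)
    attn.o_proj.weight.register_hook(lambda g, m=mask: g * m)

trainable = [p for p in model.parameters() if p.requires_grad]
optim = torch.optim.AdamW(trainable, lr=1e-5)

# One illustrative training step on a toy batch (causal LM loss).
batch = tok("The synergistic core sits in the middle layers.",
            return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()
optim.step()
optim.zero_grad()
```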