Higher-Order Attention Mechanism
Huawei's New Architecture Cuts the Transformer's Main Artery! Reasoning Ability of Any Model Soars in Place
量子位 · 2025-12-05 02:13
Core Viewpoint
- The article discusses the limitations of the traditional Transformer architecture, particularly its Attention mechanism, and introduces a new architecture called Nexus, which employs a Higher-Order Attention Mechanism to enhance reasoning on complex tasks [1][2][4][7].

Group 1: Limitations of the Traditional Transformer
- The traditional Attention mechanism struggles with complex mathematical problems and multi-step logical reasoning, leading to inaccurate outputs [2][6].
- The core issue lies in the static generation of Query (Q) and Key (K): each token's Q and K are fixed linear projections of that token alone, which limits the model's ability to capture complex relationships [14][15].

Group 2: Introduction of Nexus
- Huawei's Noah's Ark Lab developed Nexus to address this limitation of traditional Attention, using higher-order attention to model complex relationships effectively [7][8].
- Experimental results indicate that models equipped with Nexus show significant improvements on reasoning tasks without any increase in parameter count [10][35].

Group 3: Innovations in the Nexus Architecture
- Nexus makes the generation of Q and K an attention operation in its own right, so each token aggregates contextual information before its Q and K are computed [17][18] (a minimal sketch of this idea appears after this summary).
- The architecture employs a recursive framework that supports multi-hop reasoning, enabling the construction of higher-order relationships [23][27].
- Nexus maintains parameter efficiency through weight sharing, so the added structural depth does not increase the parameter count [29][31].

Group 4: Performance Improvements
- In experiments on the Pythia series of models, Nexus consistently outperformed the original Transformer across reasoning datasets, with the largest gains on tasks requiring multi-step reasoning [36][39].
- For instance, the 70M model's accuracy on the SciQ dataset rose from 61.5% to 68.5%, a 7-percentage-point improvement [39].

Group 5: Application and Future Directions
- Nexus is plug-and-play: it can be integrated into larger models without extensive retraining to enhance their reasoning abilities [41][44] (see the second sketch below).
- The team plans to explore Nexus in vision Transformers and multimodal models, indicating its potential beyond language tasks [45][46].
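To make Group 3's core idea concrete: in a standard Transformer, Q and K are static per-token projections (Q = XW_Q, K = XW_K), so each token's query and key depend only on that token itself. The sketch below is a minimal PyTorch illustration, under our own assumptions, of the higher-order variant the article describes: an inner attention pass refines the context before Q and K are projected, the inner pass can be applied recursively for higher orders, and its weights are reused at every hop so the parameter count stays flat. The class name HigherOrderAttention, the order parameter, and the use of nn.MultiheadAttention for the inner pass are illustrative choices, not Huawei's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HigherOrderAttention(nn.Module):
    # Illustrative sketch only: an inner attention pass makes Q and K
    # context-aware before the outer attention runs. Not Huawei's code.
    def __init__(self, d_model: int, n_heads: int, order: int = 2):
        super().__init__()
        assert order >= 1
        self.order, self.n_heads = order, n_heads
        # One shared inner block, reused at every hop: recursion adds
        # higher-order relationships without adding parameters per hop.
        self.inner = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        ctx = x
        for _ in range(self.order - 1):
            # Each hop lets every token aggregate context *before* its
            # query/key are formed (order=1 reduces to vanilla attention).
            ctx, _ = self.inner(ctx, ctx, ctx)
        q = self.q_proj(ctx)   # context-aware queries
        k = self.k_proj(ctx)   # context-aware keys
        v = self.v_proj(x)     # values kept first-order here (our choice)
        B, T, D = x.shape
        split = lambda t: t.view(B, T, self.n_heads, D // self.n_heads).transpose(1, 2)
        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        return out.transpose(1, 2).reshape(B, T, D)

# Usage: a drop-in self-attention layer.
layer = HigherOrderAttention(d_model=512, n_heads=8, order=2)
y = layer(torch.randn(2, 16, 512))  # -> shape (2, 16, 512)
```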
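For Group 5's plug-and-play claim, a hypothetical integration pass might look like the following: walk a pretrained model's module tree and swap each self-attention module for the higher-order variant above. The isinstance check against nn.MultiheadAttention and the assumption that the replacement can accept the original call signature are ours; real checkpoints (e.g. Pythia's GPT-NeoX layout) name and shape their attention blocks differently, so the matching logic would need adapting.

```python
import torch.nn as nn

def upgrade_attention(model: nn.Module, d_model: int, n_heads: int) -> nn.Module:
    # Hypothetical sketch: recursively replace every nn.MultiheadAttention
    # with a HigherOrderAttention (defined above). In a real integration the
    # replacement must also match the original forward signature and, ideally,
    # be initialized from the old projection weights so that little or no
    # retraining is needed.
    for name, child in model.named_children():
        if isinstance(child, nn.MultiheadAttention):
            setattr(model, name, HigherOrderAttention(d_model, n_heads))
        else:
            upgrade_attention(child, d_model, n_heads)  # descend into submodules
    return model
```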