Meta's New Attention Mechanism Breaks Through the Transformer Ceiling, and It Also Uses OpenAI's Open-Source Technology

Core Viewpoint
- Meta, after drawing on OpenAI's open-source technology and recruiting a large number of OpenAI employees, has developed a new architecture called the 2-Simplicial Transformer, which makes the training of large models more data-efficient [1][2][26].

Group 1: New Architecture and Methodology
- The 2-Simplicial Transformer modifies the standard attention mechanism to use training data more efficiently, addressing the data bottleneck in current large-model development [2][4].
- The core method extends standard dot-product attention to a trilinear function, giving the model more expressive power on complex tasks [3][6] (see the attention sketch below).
- A second key vector, K', is introduced so that the attention computation can capture richer relationships among tokens [9][10].

Group 2: Performance and Scalability
- Experimental results indicate that the 2-Simplicial Transformer outperforms the traditional Transformer on mathematical, programming, and reasoning tasks, with the gap widening as the parameter count grows [4][19].
- The new architecture's scaling exponent is larger than the traditional Transformer's, so its performance improves faster as parameters and data increase, an advantage in data-limited scenarios [20][22] (a numerical illustration follows below).
- Across tasks, the 2-Simplicial Transformer shows better performance metrics than the traditional Transformer, particularly at larger model sizes [18][21].

Group 3: Implementation and Challenges
- The implementation uses Triton, a GPU programming framework that enables efficient kernels without requiring extensive CUDA experience [11][12] (a minimal kernel example is shown below).
- Despite these advantages, the computational complexity and latency of the 2-Simplicial Transformer remain high, and further optimization is needed before production deployment [22].
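To make the trilinear extension concrete, here is a minimal, deliberately naive NumPy sketch of 2-simplicial attention. The dense O(n³) logit tensor, the elementwise combination of two value streams, and the names `two_simplicial_attention`, `K2`, `V2` are assumptions made for illustration only; the implementation described in the article relies on fused Triton kernels and does not materialize this full tensor.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def two_simplicial_attention(Q, K, K2, V, V2):
    """Naive 2-simplicial (trilinear) attention for one head.

    Q, K, K2: (n, d) query and two key projections
    V, V2:    (n, d) two value projections
    Returns:  (n, d) attended output

    Logits are the trilinear form <Q_i, K_j, K2_k> = sum_d Q[i,d]*K[j,d]*K2[k,d];
    the softmax runs jointly over the (j, k) pair, and the two value streams are
    combined elementwise (an assumption of this sketch).
    """
    n, d = Q.shape
    # Trilinear logits: shape (n, n, n), scaled like dot-product attention.
    logits = np.einsum("id,jd,kd->ijk", Q, K, K2) / np.sqrt(d)
    # Joint softmax over all (j, k) pairs for each query i.
    weights = softmax(logits.reshape(n, n * n), axis=-1).reshape(n, n, n)
    # Elementwise product of the two value streams, then the weighted sum.
    pair_values = np.einsum("jd,kd->jkd", V, V2)      # (n, n, d)
    return np.einsum("ijk,jkd->id", weights, pair_values)

# Tiny usage example with random projections.
rng = np.random.default_rng(0)
n, d = 8, 16
out = two_simplicial_attention(*(rng.standard_normal((n, d)) for _ in range(5)))
print(out.shape)  # (8, 16)
```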
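As a back-of-the-envelope illustration of why a larger scaling exponent matters: under a power-law fit of loss against parameter count, a slightly larger exponent compounds into a faster rate of improvement as models grow. All numbers below are hypothetical placeholders, not the exponents reported in the paper.

```python
# Illustrative only: E, A, and both alpha values are made-up placeholders.
# Under a power-law fit L(N) = E + A * N**(-alpha), a larger alpha means the
# reducible loss term decays faster as the parameter count N grows.
E, A = 1.5, 400.0
configs = [
    (0.50, "standard Transformer (hypothetical alpha)"),
    (0.53, "2-Simplicial Transformer (hypothetical alpha)"),
]
for alpha, label in configs:
    for N in (1e9, 8e9, 64e9):
        loss = E + A * N ** (-alpha)
        print(f"{label:45s} N={N:.0e}  L={loss:.4f}")
```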
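To give a feel for the Triton framework mentioned above, here is the classic element-wise add kernel, essentially Triton's introductory tutorial example rather than the article's 2-simplicial attention kernel. It shows how a decorated Python function, compiled by Triton, stands in for hand-written CUDA.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-sized chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)       # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

if __name__ == "__main__":
    a = torch.randn(4096, device="cuda")
    b = torch.randn(4096, device="cuda")
    print(torch.allclose(add(a, b), a + b))  # True
```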