The Father of LSTM Leads a Team to Build PoPE: Ending RoPE's Generalization Problem with a Polar-Coordinate Evolution of the Transformer
机器之心· 2026-01-02 01:55
Core Viewpoint
- The article introduces Polar Coordinate Position Embedding (PoPE), a new approach that addresses the limitations of the existing Rotary Position Embedding (RoPE) method in Transformer architectures, particularly by decoupling content and positional information for improved model performance [1][2].

Group 1: RoPE Issues
- RoPE entangles content and position information, which can degrade model performance, especially in tasks that require matching these two factors independently [1][4].
- RoPE is the preferred method for incorporating positional information in many advanced models, yet it struggles with tasks that demand a clear separation of content and position [5][19].

Group 2: PoPE Solution
- PoPE eliminates the confusion between content and position, yielding significantly better performance on diagnostic tasks that require indexing based solely on either content or position [2][10].
- PoPE defines the attention score in a way that decouples content and position, improving learning efficiency; see the sketch after this summary [12][13].

Group 3: Performance Comparison
- In indirect indexing tasks, PoPE achieved an average accuracy of 94.82%, while RoPE reached only 11.16%, demonstrating PoPE's superior ability to separate content from positional information [18][19].
- In music and genomic sequence modeling, PoPE outperformed RoPE with lower negative log-likelihood (NLL) across various datasets [20][22].
- In language modeling on the OpenWebText dataset, PoPE showed consistently lower perplexity than RoPE across all model sizes [25][26].

Group 4: Generalization and Stability
- PoPE extrapolates well without requiring fine-tuning or interpolation, and its performance remains stable as model size increases, unlike RoPE [31][32].
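To make the decoupling claim concrete, the sketch below contrasts a RoPE-style attention logit, where the position rotation sits inside the same dot product that measures content similarity, with a hypothetical decoupled logit that scores content and relative position as two separate additive terms. The `decoupled_logit` function, its `w_content`/`w_position` weights, and the cosine position term are illustrative assumptions made for this summary, not the published PoPE formulation, which the paper defines in polar coordinates.

```python
# Illustrative sketch only (not the published PoPE formulation): contrast a
# RoPE-style entangled attention logit with a hypothetical decoupled one.
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Standard RoPE: rotate consecutive dimension pairs of x by pos * theta_i."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per dimension pair
    ang = pos * theta
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[..., 1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

def rope_logit(q, k, m, n):
    """RoPE logit: content similarity and relative position (m - n) are
    entangled inside a single dot product of rotated vectors."""
    d = q.shape[-1]
    return rope_rotate(q, m) @ rope_rotate(k, n) / np.sqrt(d)

def decoupled_logit(q, k, m, n, w_content=1.0, w_position=1.0):
    """Hypothetical decoupled logit (assumption for illustration): a pure
    content-matching term plus a separate relative-position term, so either
    factor can drive the attention score on its own."""
    d = q.shape[-1]
    content = (q @ k) / np.sqrt(d)                # depends on content only
    theta = 10000.0 ** (-np.arange(0, d, 2) / d)
    position = np.cos((m - n) * theta).mean()     # depends on position only
    return w_content * content + w_position * position

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64
    q, k = rng.normal(size=d), rng.normal(size=d)
    print("RoPE logit:     ", rope_logit(q, k, m=5, n=2))
    print("decoupled logit:", decoupled_logit(q, k, m=5, n=2))
```

In the decoupled form, a diagnostic task that depends only on content (or only on position) can be solved by weighting one term and ignoring the other, whereas in the RoPE form the two factors cannot be separated once the rotation has been applied inside the dot product.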