CICC: A GRU Model Combining a Self-Attention Mechanism
中金点睛 (CICC Research) · 2025-07-14 23:39
Core Viewpoint
- The article traces the evolution and optimization of time series models, focusing on GRU and Transformer architectures, and introduces a new model, AttentionGRU(Res), that combines the strengths of both [1][6][49].

Group 1: Time Series Models Overview
- Time series models such as LSTM, GRU, and Transformer are designed for analyzing and predicting sequential data, addressing long-term dependencies through specialized gating or attention mechanisms [1][8].
- GRU, as an optimized variant of LSTM, improves computational efficiency while retaining long-term memory, making it suitable for real-time prediction scenarios (a minimal GRU sketch is given after this summary) [2][4].
- The Transformer reshapes sequence modeling through self-attention and positional encoding, showing clear advantages in analyzing multi-dimensional time series data [2][4].

Group 2: Performance Comparison of Factors
- A systematic test of 159 cross-sectional factors and 158 time series factors found that, although cross-sectional factors generally perform better on their own, the time series factors delivered stronger out-of-sample results when fed into RNN, LSTM, and GRU models [4][21].
- The average ICIR (information coefficient information ratio) of the time series factors was higher than that of the cross-sectional factors, indicating more consistent predictive power despite a more dispersed distribution (an illustrative ICIR calculation follows this summary) [4][20].
- Measured by returns, however, the cross-sectional factors achieved a long-short excess return of 11% versus only 1% for the time series factors, showing that the two factor families rank differently depending on the performance metric used [4][20].

Group 3: Model Optimization Strategies
- The article examines several optimization directions for time series models: adjusting the propagation direction of the sequence, refining the gating structure, and recombining whole architectures [5][27].
- Tests of BiGRU and GLU showed limited improvement over the standard GRU, while the Transformer performed strongly in sample but overfit in out-of-sample tests [5][28].
- The proposed AttentionGRU(Res) model combines a simplified self-attention mechanism with GRU, balancing performance and stability and delivering an annualized excess return of over 30% in the full market (an architecture sketch appears at the end of this summary) [6][40][41].

Group 4: AttentionGRU(Res) Model Performance
- In rolling out-of-sample tests, AttentionGRU(Res) achieved an annualized excess return of roughly 12.6% over the past five years, indicating robustness across market conditions [6][49].
- Its generalization ability was validated within the CSI 1000 universe, where it earned an annualized excess return of 10.8%, outperforming the traditional GRU and Transformer structures [6][46][49].
- The residual connections and the simplified self-attention structure markedly improved training stability and predictive performance [35][40].
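To connect the GRU description in Group 1 to code, here is a minimal sketch of a GRU-based stock scorer in PyTorch. The feature dimension, hidden size, 40-day window, and single-layer setup are illustrative assumptions, not the report's configuration.

```python
import torch
import torch.nn as nn

class GRUFactorModel(nn.Module):
    """Minimal GRU scorer: a window of daily features per stock -> one score."""

    def __init__(self, input_dim: int = 6, hidden_dim: int = 32):
        super().__init__()
        # The GRU's update/reset gates let it retain long-range information
        # with fewer parameters than an LSTM (no separate cell state).
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), e.g. 40 trading days of price/volume features
        out, _ = self.gru(x)                      # out: (batch, seq_len, hidden_dim)
        return self.head(out[:, -1]).squeeze(-1)  # score from the last hidden state

# Illustrative usage: scores for 256 stocks, 40-day windows, 6 features each.
scores = GRUFactorModel()(torch.randn(256, 40, 6))
```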
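The ICIR comparison in Group 2 presumably uses the conventional definition: IC(t) is the cross-sectional rank correlation between factor values on date t and next-period returns, and ICIR is the mean of the IC series divided by its standard deviation. The sketch below computes both under that assumption; the function name `icir` and the DataFrame layout are hypothetical.

```python
import pandas as pd

def icir(factor: pd.DataFrame, fwd_ret: pd.DataFrame):
    """Mean IC and ICIR of one factor.

    Both frames are indexed by date with stock codes as columns. IC(t) is the
    cross-sectional Spearman rank correlation between the factor values on
    date t and the next-period returns; ICIR = mean(IC) / std(IC).
    """
    ics = []
    for dt in factor.index:
        f, r = factor.loc[dt], fwd_ret.loc[dt]
        mask = f.notna() & r.notna()
        if mask.sum() < 10:                              # skip dates with too few stocks
            continue
        ics.append(f[mask].rank().corr(r[mask].rank()))  # Spearman via ranked values
    ics = pd.Series(ics)
    return ics.mean(), ics.mean() / ics.std()

# Illustrative usage: ic_mean, ic_ir = icir(factor_df, next_period_return_df)
```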
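The summary describes AttentionGRU(Res) only as a combination of a simplified self-attention mechanism, a GRU, and residual connections; the exact layer arrangement is not spelled out here. The sketch below shows one plausible ordering under those assumptions (single-head attention over the input window, a residual add with layer normalization, then a GRU) and should not be read as the report's actual specification.

```python
import torch
import torch.nn as nn

class AttentionGRURes(nn.Module):
    """Illustrative "simplified self-attention + GRU + residual" block.

    Layer sizes, the single attention head, the LayerNorm, and the ordering of
    blocks are all assumptions made for this sketch.
    """

    def __init__(self, input_dim: int = 6, hidden_dim: int = 32):
        super().__init__()
        self.attn = nn.MultiheadAttention(input_dim, num_heads=1, batch_first=True)
        self.norm = nn.LayerNorm(input_dim)
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim)
        attn_out, _ = self.attn(x, x, x)          # simplified (single-head) self-attention
        x = self.norm(x + attn_out)               # residual connection stabilizes training
        out, _ = self.gru(x)                      # GRU keeps the recurrent inductive bias
        return self.head(out[:, -1]).squeeze(-1)  # score from the last hidden state
```

The residual path is what the summary credits for the improved training stability, while keeping the attention single-headed and lightweight is one way to read "simplified self-attention".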