Machine Learning Factor Stock Selection Monthly Report (December 2025) - 20251128
Southwest Securities·2025-11-28 07:02

Quantitative Models and Construction Methods

- Model Name: GAN_GRU

Model Construction Idea: The GAN_GRU model combines Generative Adversarial Networks (GAN) for processing volume-price sequential features and Gated Recurrent Unit (GRU) for encoding sequential features to construct a stock selection factor [4][13]

Model Construction Process:

1. GRU Model:
   - The GRU model is based on 18 volume-price features, including closing price, opening price, trading volume, turnover rate, etc. [14][17][19]
   - Training data includes the past 400 days of volume-price features for all stocks, with feature sampling every 5 trading days. The feature sampling shape is 40x18, using the past 40 days' features to predict the cumulative return over the next 20 trading days [18]
   - Data processing includes outlier removal and standardization for each feature in the time series and cross-sectional standardization at the stock level [18]
   - The model structure includes two GRU layers (GRU(128, 128)) followed by an MLP (256, 64, 64). The final output, predicted return (pRet), is used as the stock selection factor [22]
   - Training is conducted semi-annually, with training points on June 30 and December 31 each year. The training set and validation set are split in an 80:20 ratio [18]
   - Hyperparameters: batch_size equals the number of cross-sectional stocks, optimizer is Adam, learning rate is 1e-4, loss function is IC, early stopping rounds are 10, and maximum training rounds are 50 [18]
2. GAN Model:
   - The GAN model consists of a generator (G) and a discriminator (D).
     The generator learns the real data distribution and generates realistic samples, while the discriminator distinguishes between real and generated data [23][24]
   - Generator loss function:
     $$L_{G} = -\mathbb{E}_{z\sim P_{z}(z)}[\log(D(G(z)))]$$
     where $z$ represents random noise, $G(z)$ is the generated data, and $D(G(z))$ is the discriminator's output probability for the generated data [24][25]
   - Discriminator loss function:
     $$L_{D} = -\mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] - \mathbb{E}_{z\sim P_{z}(z)}[\log(1-D(G(z)))]$$
     where $x$ is real data, $D(x)$ is the discriminator's output probability for real data, and $D(G(z))$ is the discriminator's output probability for generated data [27][29]
   - The generator uses an LSTM model to retain the sequential nature of input features, while the discriminator employs a CNN model to process the two-dimensional volume-price sequential features [33][37]

Model Evaluation: The GAN_GRU model effectively captures volume-price sequential features and demonstrates strong predictive power for stock selection [4][13][22]

Model Backtesting Results

- GAN_GRU Model:
  - IC Mean: 0.1131***
  - ICIR (non-annualized): 0.90
  - Turnover Rate: 0.83
  - Recent IC: 0.1241***
  - One-Year IC Mean: 0.0867***
  - Annualized Return: 37.52%
  - Annualized Volatility: 23.52%
  - IR: 1.59
  - Maximum Drawdown: 27.29%
  - Annualized Excess Return: 23.14% [4][41][42]

Quantitative Factors and Construction Methods

- Factor Name: GAN_GRU Factor

Factor Construction Idea: The GAN_GRU factor is derived from the GAN_GRU model, leveraging GAN for volume-price sequential feature processing and GRU for sequential feature encoding [4][13]

Factor Construction Process:

- The factor is constructed using the predicted return (pRet) output from the GAN_GRU model.
  The factor undergoes industry and market capitalization neutralization, as well as standardization [22]

Factor Evaluation: The GAN_GRU factor demonstrates robust performance across various industries and time periods, with significant IC values and excess returns [4][13][41]

Factor Backtesting Results

- GAN_GRU Factor:
  - IC Mean: 0.1131***
  - ICIR (non-annualized): 0.90
  - Turnover Rate: 0.83
  - Recent IC: 0.1241***
  - One-Year IC Mean: 0.0867***
  - Annualized Return: 37.52%
  - Annualized Volatility: 23.52%
  - IR: 1.59
  - Maximum Drawdown: 27.29%
  - Annualized Excess Return: 23.14% [4][41][42]

Industry-Specific Performance

- Recent IC Rankings (Top 5 Industries):
  - Social Services: 0.2198***
  - Real Estate: 0.2027***
  - Steel: 0.1774***
  - Non-Bank Financials: 0.1754***
  - Coal: 0.1537*** [4][41][42]
- One-Year IC Mean Rankings (Top 5 Industries):
  - Non-Bank Financials: 0.1401***
  - Steel: 0.1367***
  - Retail: 0.1152***
  - Textiles & Apparel: 0.1124***
  - Utilities: 0.1092*** [4][41][42]
- Recent Excess Return Rankings (Top 5 Industries):
  - Environmental Protection: 7.24%
  - Machinery: 4.37%
  - Real Estate: 4.03%
  - Textiles & Apparel: 3.89%
  - Building Materials: 2.91% [4][45][46]
- One-Year Average Excess Return Rankings (Top 5 Industries):
  - Building Materials: 2.15%
  - Real Estate: 1.97%
  - Social Services: 1.77%
  - Textiles & Apparel: 1.71%
  - Retail: 1.62% [4][45][46]
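
The GRU sampling scheme described in the model construction process (400 days of history, one sample every 5 trading days, each sample a 40x18 window) can be illustrated with a small index calculation. This is a minimal sketch, not the report's code: the function name `window_bounds` is hypothetical, and anchoring the last window on the most recent day is an assumption, since the report does not say how windows are aligned or how the 20-day forward return label is attached.

```python
def window_bounds(history_days=400, lookback=40, step=5):
    """Return (start, end) index pairs, end exclusive, for each sample window.

    Assumption: the last window ends on the most recent day of the
    history, and earlier windows are obtained by stepping back `step` days.
    """
    bounds = []
    end = history_days
    while end - lookback >= 0:
        bounds.append((end - lookback, end))
        end -= step
    bounds.reverse()  # oldest window first
    return bounds


bounds = window_bounds()
n_features = 18  # number of volume-price features stated in the report
sample_shape = (bounds[0][1] - bounds[0][0], n_features)
print(len(bounds), sample_shape)
```

Under these assumptions each stock contributes 73 windows per training date, each of shape (40, 18); a different alignment rule would change the count slightly.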
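
To make the generator and discriminator loss formulas from the GAN model description concrete, the sketch below evaluates them on a small batch of discriminator outputs. It replaces the expectations with batch averages; the function names and the probability values are hypothetical and chosen only for illustration, and no LSTM generator or CNN discriminator is built here.

```python
import math


def generator_loss(d_fake):
    """L_G = -mean(log D(G(z))) over a batch of discriminator outputs."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def discriminator_loss(d_real, d_fake):
    """L_D = -mean(log D(x)) - mean(log(1 - D(G(z))))."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs (probability that a sample is real).
d_real = [0.9, 0.8, 0.95]   # outputs on real samples x
d_fake = [0.1, 0.3, 0.2]    # outputs on generated samples G(z)

loss_g = generator_loss(d_fake)
loss_d = discriminator_loss(d_real, d_fake)

# A discriminator that cannot separate real from generated data
# outputs 0.5 everywhere, which gives L_D = 2 * log(2).
loss_d_confused = discriminator_loss([0.5] * 3, [0.5] * 3)
print(loss_g, loss_d, loss_d_confused)
```

A confident discriminator yields a small `loss_d` and a large `loss_g`; as the generator improves, `D(G(z))` rises and the two losses move in opposite directions.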
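
The IC and ICIR statistics reported in the backtesting results can be sketched as follows. The report does not state whether IC is a Pearson or rank correlation, nor the exact form of the IC loss used in training, so this sketch assumes a cross-sectional Pearson correlation, a non-annualized ICIR equal to the mean IC divided by its sample standard deviation, and a training loss of the form negative IC. The function names and the toy numbers are hypothetical.

```python
import math
import statistics


def pearson_ic(factor, fwd_ret):
    """Cross-sectional Pearson correlation between factor values and forward returns."""
    mf = statistics.fmean(factor)
    mr = statistics.fmean(fwd_ret)
    cov = sum((f - mf) * (r - mr) for f, r in zip(factor, fwd_ret))
    var_f = sum((f - mf) ** 2 for f in factor)
    var_r = sum((r - mr) ** 2 for r in fwd_ret)
    return cov / math.sqrt(var_f * var_r)


def ic_loss(pred, fwd_ret):
    """Assumed training loss: minimizing -IC maximizes the correlation."""
    return -pearson_ic(pred, fwd_ret)


def icir(ic_series):
    """Non-annualized ICIR: mean IC divided by the sample std of IC."""
    return statistics.fmean(ic_series) / statistics.stdev(ic_series)


# Toy cross-section: factor values perfectly ordered with forward returns.
factor = [1.0, 2.0, 3.0, 4.0]
fwd_ret = [0.01, 0.02, 0.03, 0.04]
ic_perfect = pearson_ic(factor, fwd_ret)

# Toy series of per-period ICs.
ic_ratio = icir([0.1, 0.2, 0.3])
print(ic_perfect, ic_ratio)
```

As a consistency check on the reported figures, an IC mean of 0.1131 with an ICIR of 0.90 would imply a per-period IC standard deviation of roughly 0.126 under this definition.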