Machine Learning Factor Stock Selection Monthly Report (May 2025) - 20250430

Quantitative Models and Construction Methods

GAN_GRU Model

- Model Name: GAN_GRU
- Model Construction Idea: The GAN_GRU model first processes volume-price time series features with a Generative Adversarial Network (GAN), then encodes the processed features with a GRU model to derive the stock selection factor[2][9].
- Model Construction Process:
  1. GRU Model:
     - Basic Assumptions: The GRU+MLP neural network stock return prediction model takes 18 volume-price features as input, such as closing price, opening price, trading volume, and turnover rate[10][13][15].
     - Training Data and Input Features: The 18 volume-price features of all stocks over the past 400 days, sampled every 5 trading days. Each sample has shape 40×18: the past 40 days of volume-price features are used to predict the cumulative return over the next 20 trading days[14].
     - Training and Validation Set Ratio: 80%:20%[14].
     - Data Processing: Within each 40-day window, every feature has extreme values removed and is standardized along the time series; the features are then standardized cross-sectionally at the stock level[14].
     - Model Training Method: Semi-annual rolling training. The model is retrained every six months and used to predict returns for the following six months. Training dates are June 30 and December 31 each year[14].
     - Stock Selection Method: All stocks in the cross-section, excluding ST stocks and stocks listed for less than six months[14].
     - Training Sample Selection Method: Samples with empty labels are excluded[14].
     - Hyperparameters: batch_size equals the number of stocks in the cross-section; optimizer Adam; learning rate 1e-4; loss function IC; early stopping after 10 rounds; maximum of 50 training rounds[14].
     - Model Structure: Two GRU layers (GRU(128, 128)) followed by MLP layers (256, 64, 64). The final output, the predicted return pRet, is used as the stock selection factor[18].
  2. GAN Model:
     - Introduction: GANs consist of a generator and a discriminator.
       The generator aims to generate realistic data, while the discriminator aims to distinguish between real and generated data[19].
     - Generator:
       - Loss Function: $L_{G} = -\mathbb{E}_{z\sim P_{z}(z)}[\log D(G(z))]$, where $z$ is random noise (usually Gaussian distributed), $G(z)$ is the data produced by the generator, and $D(G(z))$ is the probability that the discriminator judges the generated data to be real[20][21].
       - Training Process: Sample noise, convert the noise to generated data with the generator, compute the generator loss, and update the generator parameters through backpropagation[21][22].
     - Discriminator:
       - Loss Function: $L_{D} = -\mathbb{E}_{x\sim P_{\text{data}}(x)}[\log D(x)] - \mathbb{E}_{z\sim P_{z}(z)}[\log(1 - D(G(z)))]$, where $x$ is real data, $D(x)$ is the probability that the discriminator judges the real data to be real, and $D(G(z))$ is the probability that the discriminator judges the generated data to be real[23].
       - Training Process: Sample real data, generate fake data, compute the discriminator loss, and update the discriminator parameters through backpropagation[24][25].
     - GAN Training Process: Train the generator and discriminator alternately until convergence[25][26].
  3. GAN Feature Generation Model Construction:
     - LSTM Generator + CNN Discriminator: To retain the time series nature of the input features, an LSTM model is used as the generator. A CNN model is used as the discriminator because it matches the two-dimensional structure of the volume-price time series features[29][30][33].
     - Feature Generation Process: The input is the original volume-price time series features (Input_Shape=(40,18)); the output is the volume-price time series features after LSTM processing, with the same shape (40,18)[33].

Model Evaluation

- Evaluation: The GAN_GRU model combines GAN and GRU to process and encode volume-price time series features, providing a robust stock selection factor[2][9].
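
The generator and discriminator losses given in the GAN Model section can be evaluated numerically on a batch of discriminator outputs. The following is a minimal sketch in plain Python, not the report's implementation: the function names `generator_loss` and `discriminator_loss` and the sample probabilities are hypothetical, and the expectations are approximated by batch averages.

```python
import math


def generator_loss(d_fake):
    """L_G = -mean(log D(G(z))), with d_fake holding D(G(z)) for each sample."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def discriminator_loss(d_real, d_fake):
    """L_D = -mean(log D(x)) - mean(log(1 - D(G(z))))."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs (probabilities strictly between 0 and 1);
# these values are illustrative and do not come from the report.
d_real = [0.90, 0.80, 0.95]  # D(x) on real samples
d_fake = [0.10, 0.30, 0.20]  # D(G(z)) on generated samples

print("generator loss:", generator_loss(d_fake))
print("discriminator loss:", discriminator_loss(d_real, d_fake))
```

In alternating training, the discriminator step lowers `discriminator_loss` by pushing D(x) toward 1 and D(G(z)) toward 0, while the generator step lowers `generator_loss` by pushing D(G(z)) toward 1.
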

Model Backtest Results

- GAN_GRU Model:
  - IC Mean: 11.73%[37][38]
  - Annualized Excess Return: 24.89%[37][38]
  - Latest IC: 0.22% (as of April 28, 2025)[37][38]
  - IC Mean in the Past Year: 11.44%[37][38]
  - Annualized Return: 36.06%[38]
  - Annualized Volatility: 23.80%[38]
  - Information Ratio (IR): 1.66[38]
  - Maximum Drawdown: 27.29%[38]
  - Turnover Rate: 0.83[38]
  - ICIR: 0.90[38]

Quantitative Factors and Construction Methods

GAN_GRU Factor

- Factor Name: GAN_GRU
- Factor Construction Idea: The GAN_GRU factor is derived from the GAN_GRU model, which processes volume-price time series features with a GAN and encodes them with a GRU[2][9].
- Factor Construction Process: The factor is the output of the GAN_GRU model, produced by GAN feature processing followed by GRU encoding, as described in the model construction process[2][9][33].
- Factor Evaluation: The GAN_GRU factor shows strong stock selection performance, with an IC mean of 11.73% and an annualized excess return of 24.89%[2][9][37][38].

Factor Backtest Results

- GAN_GRU Factor:
  - IC Mean: 11.73%[37][38]
  - Annualized Excess Return: 24.89%[37][38]
  - Latest IC: 0.22% (as of April 28, 2025)[37][38]
  - IC Mean in the Past Year: 11.44%[37][38]
  - Annualized Return: 36.06%[38]
  - Annualized Volatility: 23.80%[38]
  - Information Ratio (IR): 1.66[38]
  - Maximum Drawdown: 27.29%[38]
  - Turnover Rate: 0.83[38]
  - ICIR: 0.90[38]
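
For readers unfamiliar with the IC and ICIR figures reported above, the sketch below shows one common way such statistics are computed. It rests on assumptions the report does not confirm: IC is taken as the Pearson correlation between factor values and subsequent returns within each cross-section (a rank correlation is also widely used), and ICIR is taken as the mean of the per-period ICs divided by their sample standard deviation. The helper names `pearson` and `ic_summary` and the toy data are hypothetical and do not reproduce the report's results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ic_summary(factors_by_period, returns_by_period):
    """Return (per-period ICs, IC mean, ICIR) under the assumptions stated above."""
    ics = [pearson(f, r) for f, r in zip(factors_by_period, returns_by_period)]
    mean_ic = sum(ics) / len(ics)
    # Sample standard deviation of the IC series (needs at least two periods).
    std_ic = math.sqrt(sum((ic - mean_ic) ** 2 for ic in ics) / (len(ics) - 1))
    return ics, mean_ic, mean_ic / std_ic


# Toy data: 3 periods x 4 stocks, purely illustrative.
factors = [
    [0.2, -0.1, 0.5, 0.0],
    [0.1, 0.3, -0.2, 0.4],
    [-0.3, 0.2, 0.1, 0.0],
]
next_returns = [
    [0.03, -0.01, 0.04, 0.00],
    [0.00, 0.02, -0.01, 0.03],
    [0.01, 0.00, 0.02, -0.01],
]

ics, mean_ic, icir = ic_summary(factors, next_returns)
print("per-period IC:", ics)
print("IC mean:", mean_ic, "ICIR:", icir)
```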