Machine Learning Factor Stock Selection Monthly Report (June 2025) - 20250529
Southwest Securities·2025-05-29 05:15

Quantitative Models and Construction Methods

1. Model Name: GAN_GRU
   - Model Construction Idea: The GAN_GRU model uses a Generative Adversarial Network (GAN) to process volume-price time-series features and a Gated Recurrent Unit (GRU) network to encode those features into a stock selection factor[2][9]
   - Model Construction Process:
     1. GAN Component:
        - Generator (G): Generates realistic data from random noise (e.g., drawn from a Gaussian distribution). The generator's loss function is:

          $$L_{G} = -\mathbb{E}_{z\sim P_{z}(z)}[\log(D(G(z)))]$$

          where $z$ is random noise, $G(z)$ is the generated data, and $D(G(z))$ is the discriminator's output probability that the generated data is real[20][21]
        - Discriminator (D): Distinguishes real data from generated data. The discriminator's loss function is:

          $$L_{D} = -\mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] - \mathbb{E}_{z\sim P_{z}(z)}[\log(1-D(G(z)))]$$

          where $x$ is real data, $D(x)$ is the discriminator's output probability for real data, and $D(G(z))$ is its output probability for generated data[23][25]
        - Training alternates between updating the generator and updating the discriminator, so that feature generation and discrimination improve together[26]
     2. GRU Component:
        - Two GRU layers (GRU(128, 128)) encode the time-series features, followed by a Multi-Layer Perceptron (MLP) with layers (256, 64, 64) that outputs the predicted return (pRet)[18]
     3. Feature Input and Processing:
        - Input features are 18 volume-price characteristics (e.g., closing price, turnover rate) sampled over the past 40 days; the prediction target is the cumulative return over the next 20 days[10][14]
        - Data preprocessing includes outlier removal, standardization, and cross-sectional normalization[14]
        - The model is retrained semi-annually on a rolling basis[14]
     4. GAN_GRU Integration:
        - The GAN generator takes the raw volume-price time-series features (Input_Shape=(40,18)) and outputs LSTM-encoded features, which are then passed to the GRU model for further processing[33][34]
   - Model Evaluation: According to the report, the GAN_GRU model captures both time-series and cross-sectional features and shows strong predictive power for stock selection[2][9]

---

Model Backtesting Results

1. GAN_GRU Model
   - IC Mean: 11.57%[37][38]
   - ICIR: 0.89[38]
   - Turnover Rate: 0.83[38]
   - Recent IC: -0.28%[37][38]
   - 1-Year IC Mean: 11.54%[37][38]
   - Annualized Return: 36.60%[38]
   - Annualized Volatility: 24.02%[38]
   - IR: 1.66[38]
   - Maximum Drawdown: 27.29%[38]
   - Annualized Excess Return: 24.89%[38]

---

Quantitative Factors and Construction Methods

1. Factor Name: GAN_GRU Factor
   - Factor Construction Idea: Derived from the GAN_GRU model, this factor uses the GAN for feature generation and the GRU for time-series encoding to predict stock returns[2][9]
   - Factor Construction Process:
     1. Input Features: 18 volume-price characteristics (e.g., closing price, turnover rate) sampled over the past 40 days[10][14]
     2. GAN Feature Generation:
        - An LSTM-based generator processes the raw time-series features so that their temporal properties are retained[29][33]
        - A CNN-based discriminator distinguishes real features from generated ones[29]
     3. GRU Encoding: The GAN-generated features are encoded by two GRU layers and an MLP, which outputs the predicted returns[18][33]
     4. Factor Normalization: The factor is neutralized against industry and market capitalization, then standardized[18]
   - Factor Evaluation: According to the report, the GAN_GRU factor performs robustly across industries and time periods, which supports its use in stock selection[2][9]

---

Factor Backtesting Results

1. GAN_GRU Factor
   - IC Mean: 11.57%[37][38]
   - ICIR: 0.89[38]
   - Turnover Rate: 0.83[38]
   - Recent IC: -0.28%[37][38]
   - 1-Year IC Mean: 11.54%[37][38]
   - Annualized Return: 36.60%[38]
   - Annualized Volatility: 24.02%[38]
   - IR: 1.66[38]
   - Maximum Drawdown: 27.29%[38]
   - Annualized Excess Return: 24.89%[38]

Industry-Specific Performance

- Top 5 Industries by Recent IC (May 2025)[39]:
  - Social Services: 30.15%
  - Defense & Military: 28.07%
  - Banking: 25.31%
  - Computers: 24.86%
  - Real Estate: 12.07%
- Top 5 Industries by 1-Year IC Mean[39]:
  - Construction & Decoration: 18.54%
  - Utilities: 18.14%
  - Communication: 17.37%
  - Non-Banking Finance: 16.76%
  - Defense & Military: 16.53%
- Top 5 Industries by Recent Excess Return (May 2025)[42]:
  - Retail: 8.22%
  - Defense & Military: 7.15%
  - Social Services: 4.58%
  - Construction & Decoration: 3.91%
  - Electronics: 3.64%
- Top 5 Industries by 1-Year Average Monthly Excess Return[42]:
  - Oil & Petrochemicals: 5.60%
  - Building Materials: 5.29%
  - Home Appliances: 5.06%
  - Non-Ferrous Metals: 4.57%
  - Communication: 4.29%
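The generator and discriminator losses given in the Model Construction Process can be checked numerically on a small batch. The sketch below is a minimal illustration in plain Python, replacing each expectation with a batch average. The function names and the probability values are hypothetical and are not taken from the report.

```python
import math

def generator_loss(d_fake):
    """L_G = -mean(log D(G(z))), averaged over discriminator outputs on generated samples."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

def discriminator_loss(d_real, d_fake):
    """L_D = -mean(log D(x)) - mean(log(1 - D(G(z))))."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs (probabilities strictly between 0 and 1).
d_real = [0.9, 0.8, 0.95]   # D(x) on real samples
d_fake = [0.1, 0.3, 0.2]    # D(G(z)) on generated samples

loss_g = generator_loss(d_fake)
loss_d = discriminator_loss(d_real, d_fake)
print(f"L_G = {loss_g:.4f}, L_D = {loss_d:.4f}")
```

With these inputs the discriminator separates the two sets well, so its loss is low while the generator's loss is high. Alternating training pushes the two losses in opposite directions.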
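The sampling scheme described under Feature Input and Processing (40 days of 18 features as input, 20-day forward cumulative return as target) could be laid out roughly as follows. This is a sketch under assumptions: the report does not define the return precisely, so a simple close-to-close return is used here, and `build_samples` and the synthetic data are hypothetical.

```python
LOOKBACK = 40     # days of history per sample (from the report)
HORIZON = 20      # forward return horizon in days (from the report)
N_FEATURES = 18   # volume-price features per day (from the report)

def build_samples(features, closes, lookback=LOOKBACK, horizon=HORIZON):
    """Pair each lookback-day feature window with a forward cumulative return.

    features: list of per-day feature lists for one stock.
    closes:   list of per-day closing prices for the same stock.
    The label is assumed to be close[t + horizon] / close[t] - 1.
    """
    samples = []
    for t in range(lookback - 1, len(closes) - horizon):
        window = features[t - lookback + 1 : t + 1]   # shape (lookback, n_features)
        label = closes[t + horizon] / closes[t] - 1.0
        samples.append((window, label))
    return samples

# Synthetic single-stock history, for illustration only.
n_days = 100
closes = [100.0 * (1.001 ** d) for d in range(n_days)]
features = [[float(d)] * N_FEATURES for d in range(n_days)]

samples = build_samples(features, closes)
window, label = samples[0]
print(len(samples), len(window), len(window[0]))
```

Each window has the (40, 18) shape quoted as Input_Shape in the report, and no window overlaps the days used to compute its own label.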
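The preprocessing step (outlier removal followed by standardization across the cross-section) is named in the report without a stated method. One common choice, shown here purely as an assumption, is to clip values beyond a multiple of the median absolute deviation (MAD) and then z-score the result. The threshold `k` and both function names are hypothetical.

```python
import statistics

def clip_outliers(values, k=5.0):
    """Clip values to median +/- k * MAD (an assumed outlier rule, not the report's)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    lo, hi = med - k * mad, med + k * mad
    return [min(max(v, lo), hi) for v in values]

def zscore(values):
    """Standardize one cross-section to zero mean and unit (population) std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

# One feature observed across five stocks on a single day (hypothetical values).
raw = [1.0, 2.0, 3.0, 4.0, 100.0]
clipped = clip_outliers(raw)
standardized = zscore(clipped)
print(clipped)
```

Clipping before standardizing keeps a single extreme value from dominating the mean and standard deviation used in the z-score.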
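The IC and ICIR figures in the backtesting results summarize how well factor values line up with subsequent returns. The report does not state whether it uses Pearson or rank correlation, nor how ICIR is scaled, so the sketch below assumes a Pearson IC per period and ICIR as mean IC divided by the sample standard deviation of IC. All numbers are made up for illustration.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def ic_summary(ic_series):
    """Return (mean IC, ICIR), with ICIR assumed to be mean(IC) / stdev(IC)."""
    mean_ic = statistics.fmean(ic_series)
    return mean_ic, mean_ic / statistics.stdev(ic_series)

# One cross-section: factor values and next-period returns (hypothetical).
factor = [0.1, 0.2, 0.3, 0.4]
fwd_ret = [0.01, 0.02, 0.03, 0.04]
ic_one_period = pearson(factor, fwd_ret)

# A short series of per-period ICs (hypothetical).
monthly_ic = [0.10, 0.20, 0.00, 0.10]
mean_ic, icir = ic_summary(monthly_ic)
print(f"IC = {ic_one_period:.2f}, mean IC = {mean_ic:.2f}, ICIR = {icir:.2f}")
```

Under this definition, a high mean IC with a modest ICIR indicates that the factor's predictive correlation varies considerably from period to period.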
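The portfolio statistics listed in the backtesting results (annualized return, annualized volatility, maximum drawdown, IR) are reported without formulas. The sketch below shows conventional definitions applied to monthly returns. The monthly frequency, the definition of IR as annualized mean excess return over annualized tracking error, and all sample values are assumptions.

```python
import math
import statistics

PERIODS_PER_YEAR = 12  # assumes monthly returns

def annualized_return(returns):
    """Geometric annualized return from periodic simple returns."""
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (PERIODS_PER_YEAR / len(returns)) - 1.0

def annualized_vol(returns):
    """Sample standard deviation scaled by sqrt(periods per year)."""
    return statistics.stdev(returns) * math.sqrt(PERIODS_PER_YEAR)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the cumulative net value, starting at 1.0."""
    nav, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        nav *= 1.0 + r
        peak = max(peak, nav)
        worst = max(worst, 1.0 - nav / peak)
    return worst

def information_ratio(excess_returns):
    """Annualized mean excess return divided by annualized tracking error."""
    mean_excess = statistics.fmean(excess_returns) * PERIODS_PER_YEAR
    return mean_excess / annualized_vol(excess_returns)

# Hypothetical monthly portfolio and excess returns.
mdd = max_drawdown([0.10, -0.20, 0.05])
ann = annualized_return([0.01] * 12)
ir = information_ratio([0.02, 0.00, 0.01, 0.03])
print(f"max drawdown = {mdd:.2%}, annualized return = {ann:.2%}, IR = {ir:.2f}")
```

The reported IR of 1.66 is closer to the annualized excess return divided by a tracking error than to annualized return divided by volatility, which is why the excess-return definition is used here.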