Machine Learning Stock Selection - filings, earnings calls, financial reports, news

Machine Learning Stock Selection

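Index Constituent Periodic Adjustment Event Series Report: Forecast of the December 2025 Index Constituent Adjustments and Event-Effect Tracking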

CMS· 2025-11-14 13:52

- The report uses a random forest model to predict how index constituent adjustments affect individual stocks' excess returns. The model is chosen because it handles complex, multi-dimensional, non-linear problems well[13][17][24]
- Feature selection follows the logic that passive index funds rebalance their holdings after constituent changes, which moves the affected stocks. Key features include changes in passive fund holdings, stock liquidity, company market capitalization, and stock price trends[13][15][17]
- The model is trained on historical data to predict excess returns for stocks affected by index adjustments. It uses feature selection to improve generalization and focuses on the short-term impact after the announcement[13][17][24]
- Evaluation indicates that the model distinguishes the impact of index adjustments across stocks, particularly in out-of-sample tests. It identifies stocks with significant excess returns or with reduced negative effects[13][17][24]
- Backtesting shows that stocks added to the CSI 300 index earned an average excess return of 2.53% within 10 days of the announcement, while stocks added to the CSI 500 index earned an average excess return of 1.01% over the same period[17][23][24]
- Group-level results for stocks added to the CSI 300 index show 10-day excess returns of 2.11% (group_1) and 1.48% (group_5), with a mean return of 2.53%. For the CSI 500 index, group_1 earned 2.29%, group_5 earned 0.88%, and the mean return was 1.01% within 10 days[23]
- For stocks removed from the indices, the model shows reduced negative effects. CSI 300 stocks in group_1 returned 1.44% within 10 days, while group_5 returned -0.80%, with a mean return of -0.25%. CSI 500 stocks in group_1 returned 0.28%, group_5 returned 0.48%, and the mean return was -0.11% within 10 days[31]
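The group_1 through group_5 figures above suggest a grouped evaluation of the model's predictions. The summary does not say how the groups are formed, so the sketch below is only one plausible reading: it assumes stocks are sorted by predicted score, split into five equal buckets, and that group_1 holds the highest scores. The function name `group_mean_excess` and all sample numbers are hypothetical.

```python
from statistics import mean


def group_mean_excess(predicted, realized, n_groups=5):
    """Sort stocks by predicted score (highest first), split them into
    n_groups buckets, and return the mean realized excess return per bucket.

    Assumption: group_1 is the bucket with the highest predicted scores.
    """
    if len(predicted) != len(realized):
        raise ValueError("predicted and realized must have the same length")
    if len(predicted) < n_groups:
        raise ValueError("need at least one stock per group")

    # Indices of stocks, ordered from highest to lowest predicted score.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    n = len(order)

    result = {}
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        result[f"group_{g + 1}"] = mean(realized[i] for i in members)
    return result


# Hypothetical data: model scores and realized 10-day excess returns (%).
predicted = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 0.0]
realized = [3.0, -1.0, 1.0, 2.0, 0.0, 2.5, -0.5, 1.5, 0.5, -2.0]

groups = group_mean_excess(predicted, realized)
for name, value in groups.items():
    print(name, value)
```

If the model ranks stocks well, the group means should fall from group_1 to group_5, as they do in this toy data.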

Machine Learning Factor Stock Selection Monthly Report (October 2025) - 20250930

Southwest Securities· 2025-09-30 04:03

- The GAN_GRU factor comes from the GAN_GRU model, which uses a Generative Adversarial Network (GAN) to process volume-price time series features and then a GRU model to encode those features into the stock selection factor[4][13][14]
- The GAN_GRU model has two GRU layers (GRU(128, 128)) followed by an MLP (256, 64, 64); the final predicted return (pRet) is used as the stock selection factor[22]
- The GAN consists of a generator and a discriminator. The generator tries to produce data that appears real, while the discriminator tries to tell real data from generated data. The generator's loss function is $L_{G} = -\mathbb{E}_{z\sim P_{z}(z)}[\log(D(G(z)))]$[23][24][25]
- The discriminator's loss function is $L_{D} = -\mathbb{E}_{x\sim P_{data}(x)}[\log D(x)] - \mathbb{E}_{z\sim P_{z}(z)}[\log(1-D(G(z)))]$[27][28][29]
- Training alternates between the generator and the discriminator until convergence[30]
- From January 2019 to September 2025, the GAN_GRU factor shows an IC mean of 0.1136 and an annualized excess return of 22.58%; its most recent IC, as of September 28, 2025, is 0.1053[41][42]
- The factor's IC mean over the past year is 0.0982, with the highest IC values in the coal, building materials, social services, non-bank finance, and food & beverage industries[42][44]
- In September 2025, the top-performing long portfolios based on the GAN_GRU factor were in building materials, steel, social services, coal, and non-bank finance, with excess returns of 5.78%, 5.13%, 1.91%, 1.55%, and 1.21%, respectively[45]
- Over the past year, the top-performing long portfolios based on the GAN_GRU factor were in home appliances, building materials, food & beverage, utilities, and textiles & apparel, with average monthly excess returns of 5.04%, 4.96%, 3.92%, 3.53%, and 3.10%, respectively[46]
- The top stocks in each industry based on the GAN_GRU factor as of September 28, 2025, include Baolaite, Yutaiwei-U, Cangge Mining, Tuowei Information, Hengtong Co., Angang Co., and others[49][50]
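The two loss formulas above can be checked numerically. The sketch below estimates each expectation as a sample mean over discriminator outputs. It is an illustration of the formulas only, not the report's training code; the function names and the sample probabilities are hypothetical.

```python
import math


def generator_loss(d_fake):
    """L_G = -E_z[log D(G(z))], estimated as a sample mean.

    d_fake: discriminator outputs D(G(z)) in (0, 1) for generated samples.
    """
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def discriminator_loss(d_real, d_fake):
    """L_D = -E_x[log D(x)] - E_z[log(1 - D(G(z)))], estimated as sample means.

    d_real: discriminator outputs D(x) in (0, 1) for real samples.
    d_fake: discriminator outputs D(G(z)) in (0, 1) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -real_term - fake_term


# Hypothetical discriminator outputs.
confident_real = [0.9, 0.8]   # discriminator is fairly sure these are real
confident_fake = [0.1, 0.2]   # discriminator is fairly sure these are generated
undecided = [0.5, 0.5]        # discriminator cannot tell either way

print("L_D, confident discriminator:", discriminator_loss(confident_real, confident_fake))
print("L_D, undecided discriminator:", discriminator_loss(undecided, undecided))
print("L_G, fakes detected:", generator_loss(confident_fake))
print("L_G, fakes undetected:", generator_loss(undecided))
```

When the discriminator outputs 0.5 everywhere, $L_D$ equals $2\log 2$ and $L_G$ equals $\log 2$. A discriminator that separates real from generated data lowers $L_D$ and raises $L_G$, which is the tension that alternating training exploits.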

GF Financial Engineering Research· 2025-06-20 06:25

Core Viewpoint
- The article discusses the increasing application of machine learning in quantitative stock selection, particularly focusing on GBDT and neural network models, as traditional factors have become less effective [1][4].

Group 1: Model Selection
- Machine learning has been widely adopted in quantitative stock selection, with GBDT models (including LGBM, XGBoost, and CatBoost) and neural networks (including GRU, TCN, and Transformer) being the primary focus [1].
- GBDT models are effective for handling manually constructed features, while neural networks excel in capturing temporal changes in features [2].

Group 2: Feature Data Preparation
- Different model types require different feature types; tree models handle price and fundamental features well, while neural networks perform better with high-frequency data [22][27].
- Feature selection methods, particularly SHAP, can effectively reduce the number of features while maintaining model performance [2][31].
- Standardization of features before feeding them into models is crucial for improving model performance [2][35].

Group 3: Loss Function Adjustment and Prediction Target Processing
- Besides the common MSE loss function, investors often use IC as a loss function, with various ranking loss functions showing improved performance [2][37].
- Using cross-sectional normalization helps the model focus on differences in cross-sectional returns, enhancing factor performance [3][50].

Group 4: Machine Learning Models
- GBDT is highlighted as a superior algorithm due to its iterative approach of updating target values based on residuals from previous trees [10][11].
- Neural networks, including RNN, LSTM, GRU, CNN, TCN, and Transformer, are discussed for their effectiveness in various domains, particularly in time series prediction [12][19].
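The residual-updating idea behind GBDT can be sketched in a few lines. The example below is a simplified illustration that assumes squared-error loss, a single feature, and depth-1 trees (stumps); it is not LGBM, XGBoost, or CatBoost, and the function names, data, and hyperparameters are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the one-threshold split on x that best fits the residuals
    (minimum squared error), predicting the residual mean on each side."""
    best = None
    # Dropping the largest value keeps the right-hand side non-empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_gbdt(x, y, n_trees=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds a damped
    copy of its prediction to the running forecast."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return base, stumps, preds


def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


# Hypothetical toy data with a step between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

base, stumps, preds = fit_gbdt(x, y)
base_mse = mse(y, [base] * len(y))
final_mse = mse(y, preds)
print("MSE of constant forecast:", base_mse)
print("MSE after boosting:", final_mse)
```

Each new tree only has to explain what the earlier trees missed, so the training error falls round by round. Production libraries add regularization, multi-feature splits, and deeper trees on top of this loop.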
Group 5: Index Enhancement Strategies
- The article presents the performance of several index enhancement strategies. The CSI 300 strategy shows an annualized excess return of 10.03% and a maximum drawdown of -5.42% [3].
- The CSI 500 strategy has a lower annualized excess return of 8.41% with a maximum drawdown of -10.78%, while the CSI 1000 strategy shows an annualized excess return of 11.44% and a maximum drawdown of -7.95% [3].
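The two metrics quoted for each strategy, annualized excess return and maximum drawdown, can be computed from a net-value series. The sketch below assumes the input is an excess net-value curve (strategy value relative to the benchmark). The article summary does not say how that curve is built or which annualization convention is used, so the function names, the `periods_per_year` parameter, and the sample series are hypothetical.

```python
def max_drawdown(nav):
    """Largest peak-to-trough decline of a net-value series,
    returned as a non-positive fraction (for example -0.10 for -10%)."""
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def annualized_return(nav, periods_per_year=252):
    """Compound growth rate of a net-value series, scaled to one year.

    Assumption: observations are evenly spaced, periods_per_year apart.
    """
    n_periods = len(nav) - 1
    if n_periods < 1:
        raise ValueError("need at least two observations")
    return (nav[-1] / nav[0]) ** (periods_per_year / n_periods) - 1.0


# Hypothetical excess net-value curve with one dip from 1.10 to 0.99.
excess_nav = [1.0, 1.1, 0.99, 1.21]

print("max drawdown:", max_drawdown(excess_nav))
# With three periods per year, the three steps above span exactly one year.
print("annualized excess return:", annualized_return(excess_nav, periods_per_year=3))
```

In this toy series the drawdown is about -10% (from 1.10 down to 0.99) and the annualized excess return is about 21%.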

GF SECURITIES(SZ:000776)