Deep Learning Research Report: Multi-modal, Multi-scale Stock Price Prediction

Quantitative Models and Factor Analysis Summary

Quantitative Models and Construction

- Model Name: Multi-modal Multi-scale Stock Price Prediction Model
  - Model Construction Idea: The model integrates multi-modal features (chart data and time-series data) and multi-scale features (data at different frequencies) to improve stock price prediction accuracy. It uses four independent deep time-series models and convolutional models for feature extraction, and is trained end to end with both a regression loss and a classification loss[14][17][18].
  - Model Construction Process:
    1. Multi-modal Features: Combines time-series price-volume data with standardized price-volume charts. The time-series models capture abstract numerical relationships, while the convolutional models identify chart patterns[17].
    2. Multi-scale Features: Incorporates 1-minute high-frequency data, daily data, and weekly data. The high-frequency data is converted into 55 factor features, which are then fed into the time-series models[18].
    3. Lightweight Design: Reduces the parameter count of each sub-model to 1/4 of the initial version, which reduces overfitting and the dependence on computational resources[18].
    4. Multi-head Output: The outputs are the absolute future return and a categorical prediction (up, flat, down), trained with mean squared error and cross-entropy losses respectively[19].
  - Model Evaluation: The model shows significant improvements in prediction accuracy and excess returns compared with the initial version[14][17][19].
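The multi-head output described in the construction process can be illustrated with a small sketch of a combined loss. This is not the report's implementation: the function names, the `cls_weight` weighting between the two losses, and the `threshold` used to turn a return into an up/flat/down label are all assumptions made for illustration, since the summary does not state how the two losses are combined or how the classes are defined.

```python
import math

def mse_loss(pred_returns, true_returns):
    """Mean squared error between predicted and realized returns."""
    n = len(pred_returns)
    return sum((p - t) ** 2 for p, t in zip(pred_returns, true_returns)) / n

def cross_entropy_loss(logits, labels):
    """Mean softmax cross-entropy; logits is a list of per-class score rows."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[label]
    return total / len(labels)

def label_from_return(r, threshold=0.01):
    """Map a return to a class: 0 = down, 1 = flat, 2 = up.

    The threshold is a hypothetical choice, not taken from the report.
    """
    if r > threshold:
        return 2
    if r < -threshold:
        return 0
    return 1

def multi_head_loss(pred_returns, true_returns, logits, labels, cls_weight=1.0):
    """Regression loss plus weighted classification loss.

    A plain weighted sum is assumed here; the report may combine them differently.
    """
    return mse_loss(pred_returns, true_returns) + cls_weight * cross_entropy_loss(logits, labels)

# Usage: two stocks, one regression head and one three-class head per stock.
true_returns = [0.03, -0.002]
labels = [label_from_return(r) for r in true_returns]
loss = multi_head_loss(
    pred_returns=[0.02, 0.001],
    true_returns=true_returns,
    logits=[[0.1, 0.2, 1.5], [0.3, 1.0, 0.2]],
    labels=labels,
    cls_weight=0.5,
)
```

In a real training setup the same idea would be expressed with a deep learning framework's loss functions so that gradients flow through both heads into the shared feature extractors.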
Model Backtesting Results

- RankIC Mean:
  - All Market: 8.7%
  - CSI 300: 7.9%
  - CSI 500: 6.6%
  - CSI 800: 6.9%
  - CSI 1000: 8.2%
  - CNI 2000: 8.7%
  - ChiNext: 10.4%[21][116]
- RankIC Win Rate:
  - All Market: 86.7%
  - CSI 300: 69.0%
  - CSI 500: 73.5%
  - CSI 800: 75.2%
  - CSI 1000: 84.8%
  - CNI 2000: 86.1%
  - ChiNext: 89.2%[21][116]
- Excess Annualized Returns:
  - All Market: 12.97%
  - CSI 300: 9.17%
  - CSI 500: 5.30%
  - CSI 800: 8.38%
  - CSI 1000: 7.47%
  - CNI 2000: 7.47%
  - ChiNext: 11.52%[21][117]

Quantitative Factors and Construction

- Factor Name: Model-derived Factor
  - Factor Construction Idea: Derived from the model's predictions, the factor captures both numerical relationships and chart patterns, leveraging multi-modal and multi-scale data[14][17][18].
  - Factor Construction Process:
    1. Predictions from the time-series models and the convolutional models are combined.
    2. Multi-frequency data (1-minute, daily, weekly) is processed to extract features.
    3. Factor values are generated from the model's outputs, including both the regression and the classification results[14][17][18].
  - Factor Evaluation: The factor shows low correlation with traditional Barra style factors, indicating its uniqueness[22][23].

Factor Backtesting Results

- Correlation with Barra Factors:
  - Liquidity: -18%
  - Volatility: -16%
  - Size: -8%[22][23]
- RankIC Mean:
  - All Market: 8.7%
  - CSI 300: 7.9%
  - CSI 500: 6.6%
  - CSI 800: 6.9%
  - CSI 1000: 8.2%
  - CNI 2000: 8.7%
  - ChiNext: 10.4%[21][116]
- RankIC Win Rate:
  - All Market: 86.7%
  - CSI 300: 69.0%
  - CSI 500: 73.5%
  - CSI 800: 75.2%
  - CSI 1000: 84.8%
  - CNI 2000: 86.1%
  - ChiNext: 89.2%[21][116]
- Excess Annualized Returns:
  - All Market: 12.97%
  - CSI 300: 9.17%
  - CSI 500: 5.30%
  - CSI 800: 8.38%
  - CSI 1000: 7.47%
  - CNI 2000: 7.47%
  - ChiNext: 11.52%[21][117]
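For readers unfamiliar with the RankIC statistics reported above, the following sketch shows one common way to compute them: RankIC as the Spearman rank correlation between factor values and subsequent returns in one cross-section, and the win rate as the share of periods with a positive RankIC. The report does not spell out its exact definitions, so both are assumptions here, and all function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_ic(factor_values, forward_returns):
    """Spearman rank correlation for one cross-section (one rebalancing date)."""
    return pearson(average_ranks(factor_values), average_ranks(forward_returns))

def rank_ic_summary(ic_series):
    """Mean RankIC and win rate (share of periods with RankIC > 0, by assumption)."""
    mean_ic = sum(ic_series) / len(ic_series)
    win_rate = sum(1 for ic in ic_series if ic > 0) / len(ic_series)
    return mean_ic, win_rate

# Usage: one RankIC per period, then summarize across periods.
ics = [
    rank_ic([0.8, 0.1, 0.5, 0.3], [0.02, -0.01, 0.03, 0.00]),
    rank_ic([0.2, 0.9, 0.4, 0.6], [0.01, 0.00, -0.02, 0.03]),
]
mean_ic, win_rate = rank_ic_summary(ics)
```

Handling ties with average ranks keeps the statistic well defined when several stocks share the same factor value or return.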