Machine Learning Series No. 9: Mamba-MoE: Risk Neutralization and Multi-Model Fusion
NORTHEAST SECURITIES·2025-05-29 07:41

Quantitative Models and Construction Methods

1. Model Name: Mamba-MoE
   - Model Construction Idea: The model builds linear and nonlinear risk constraints into the training process, aiming to reduce risk exposure while extracting time-series features efficiently. It also uses cross-validation and a multi-model ensemble to improve robustness[2][43][166]
   - Model Construction Process:
     1. Time-Series Feature Extraction: Uses the Mamba architecture, which is based on selective state space models (SSM). The continuous-time SSM equations are:

        $$h'(t) = A h(t) + B x(t), \qquad y(t) = C h(t)$$

        where $A$, $B$, $C$ are the state transition, input, and output matrices, respectively. With step size $\Delta$, the discrete form is:

        $$h_t = \bar{A} h_{t-1} + \bar{B} x_t, \qquad y_t = C h_t$$

        with:

        $$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B$$

        [16][19][20]
     2. Nonlinear Risk Constraints: Uses graph neural networks (GNN) to capture nonlinear interactions between risk factors and stock-level relationships (e.g., industry and trading correlations). A heterogeneous graph is constructed with two types of edges: industry-based connections and high-similarity connections (correlation > 0.7 over a 30-day window)[43][48]
     3. Dual-Task Learning:
        - Task 1: Generates alpha factors by extracting time-series features from stock data
        - Task 2: Creates nonlinear risk factors by modeling the residual risk left after industry and style neutralization[43][50]
     4. Loss Function:

        $$L = \mathrm{MSE}(\hat{y}, y_1) + \mathrm{MSE}(\hat{r}, y_2) + \frac{\alpha}{d_R + 1} \sum_{i=1}^{d_R + 1} \rho(\hat{y}, R_i)^2$$

        where $\hat{y}$ is the alpha factor, $y_1$ is the industry-neutralized return, $\hat{r}$ is the nonlinear risk factor, $y_2$ is the style-neutralized return, $R_i$ are the risk factors, and $\alpha$ weights the correlation penalty. The sum runs over $d_R + 1$ factors, which presumably means the $d_R$ original risk factors plus the nonlinear risk factor[50]
     5. Multi-Model Fusion: Combines models using either equal weighting or a Mixture of Experts (MoE). MoE dynamically assigns weights to sub-models based on stock and market features, selecting the top $K$ experts for aggregation[98][99]
   - Model Evaluation: The report finds the model more robust, with lower risk exposure, than single-task models. It balances alpha generation against risk control[43][166]

---

Model Backtesting Results

1. Mamba-MoE
   - Rank IC: 13.22%
   - ICIR: 1.28
   - Long-Only Annualized Return: 33.01%
   - Long-Short Sharpe Ratio: 9.25
   - Long-Short Maximum Drawdown: 12.21%[167][103]
2. Mamba-10 (Single Task)
   - Rank IC: 12.83%
   - ICIR: 1.28
   - Long-Only Annualized Return: 30.06%
   - Long-Short Sharpe Ratio: 8.44
   - Long-Short Maximum Drawdown: 11.71%[56][103]
3. Mamba-5 (Single Task)
   - Rank IC: 12.75%
   - ICIR: 1.32
   - Long-Only Annualized Return: 31.21%
   - Long-Short Sharpe Ratio: 9.26
   - Long-Short Maximum Drawdown: 11.31%[103]
4. Equal-Weight Fusion
   - Rank IC: 13.08%
   - ICIR: 1.29
   - Long-Only Annualized Return: 31.76%
   - Long-Short Sharpe Ratio: 9.30
   - Long-Short Maximum Drawdown: 11.97%[103]

---

Quantitative Factors and Construction Methods

1. Factor Name: Nonlinear Risk Factor
   - Factor Construction Idea: Captures residual risk that existing risk factors cannot explain linearly, incorporating factor interactions and stock-level relationships[43][48]
   - Factor Construction Process:
     1. Constructs a heterogeneous graph with two types of edges: industry-based connections and high-similarity connections (correlation > 0.7 over 30 days)
     2. Applies graph convolution to aggregate edge and node information, generating nonlinear risk factors[48]
     3. Combines these factors with the original risk factors for further risk exposure control[50]
   - Factor Evaluation: Enhances the interpretability and effectiveness of risk control, reducing exposure to nonlinear risks[43][48]

---

Factor Backtesting Results

1. Nonlinear Risk Factor (Dual-Task Model)
   - Rank IC: 12.83%
   - ICIR: 1.28
   - Long-Only Annualized Return: 30.06%
   - Long-Short Sharpe Ratio: 8.44
   - Long-Short Maximum Drawdown: 11.71%[56][60]
2. Nonlinear Risk Factor (Single Task, Original Label)
   - Rank IC: 13.09%
   - ICIR: 1.17
   - Long-Only Annualized Return: 31.62%
   - Long-Short Sharpe Ratio: 7.72
   - Long-Short Maximum Drawdown: 14.27%[56][60]
3. Nonlinear Risk Factor (Single Task, Neutralized Label)
   - Rank IC: 12.92%
   - ICIR: 1.22
   - Long-Only Annualized Return: 29.49%
   - Long-Short Sharpe Ratio: 8.05
   - Long-Short Maximum Drawdown: 12.45%[56][60]

---

Index Enhancement Strategy Results

1. CSI 300 Enhancement
   - Annualized Excess Return: 9.02%
   - Tracking Error: 4.26%
   - Excess Sharpe Ratio: 2.12
   - Excess Maximum Drawdown: 4.05%[156]
2. CSI 500 Enhancement
   - Annualized Excess Return: 11.63%
   - Tracking Error: 4.92%
   - Excess Sharpe Ratio: 2.36
   - Excess Maximum Drawdown: 6.19%[156]
3. CSI 1000 Enhancement
   - Annualized Excess Return: 17.74%
   - Tracking Error: 5.52%
   - Excess Sharpe Ratio: 3.22
   - Excess Maximum Drawdown: 6.17%[156]
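
---

Illustrative Sketch: SSM Discretization

The discretization formulas in the Time-Series Feature Extraction step can be made concrete with a small example. The sketch below assumes a diagonal state matrix, so the matrix exponential and inverse reduce to element-wise operations, and a scalar input sequence. The function names `discretize_diag` and `ssm_scan` are hypothetical, and the parameters are fixed toy values; in Mamba itself the step size and the input and output projections depend on the input, which this sketch omits.

```python
import math

def discretize_diag(a, b, delta):
    """Zero-order-hold discretization for a diagonal state matrix (assumption).

    a, b: lists with the diagonal of A and the entries of B, one per state.
    Returns (a_bar, b_bar) where, element-wise,
      a_bar = exp(delta * a)
      b_bar = (delta * a)^-1 * (exp(delta * a) - 1) * delta * b
    """
    a_bar, b_bar = [], []
    for a_i, b_i in zip(a, b):
        da = delta * a_i
        a_bar.append(math.exp(da))
        # (exp(da) - 1) / da tends to 1 as da -> 0; expm1 keeps precision for small da.
        factor = math.expm1(da) / da if da != 0.0 else 1.0
        b_bar.append(factor * delta * b_i)
    return a_bar, b_bar

def ssm_scan(a_bar, b_bar, c, xs):
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t and y_t = c . h_t over xs."""
    h = [0.0] * len(a_bar)
    ys = []
    for x in xs:
        h = [ab * h_i + bb * x for ab, bb, h_i in zip(a_bar, b_bar, h)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys

# Toy parameters: two decaying states driven by a unit impulse.
a_bar, b_bar = discretize_diag(a=[-1.0, -0.5], b=[1.0, 1.0], delta=0.1)
ys = ssm_scan(a_bar, b_bar, c=[1.0, 1.0], xs=[1.0, 0.0, 0.0])
```

Because both diagonal entries of A are negative, every entry of `a_bar` is below 1 and the impulse response in `ys` decays step by step.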
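
---

Illustrative Sketch: Heterogeneous Graph Edges

The two edge types described under Nonlinear Risk Constraints can be sketched as follows. This is a minimal sketch under stated assumptions: the report does not say which series is correlated, so daily returns and the Pearson coefficient are assumed here, and `build_edges` and the toy data are hypothetical. Only the 0.7 threshold and the 30-day window come from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def build_edges(returns, industry, window=30, threshold=0.7):
    """Return (industry_edges, similarity_edges) as lists of stock pairs.

    returns:  dict stock -> list of daily returns (assumed input series)
    industry: dict stock -> industry label
    """
    stocks = sorted(returns)
    industry_edges, similarity_edges = [], []
    for i, s in enumerate(stocks):
        for t in stocks[i + 1:]:
            # Edge type 1: both stocks belong to the same industry.
            if industry[s] == industry[t]:
                industry_edges.append((s, t))
            # Edge type 2: correlation over the trailing window exceeds the threshold.
            rho = pearson(returns[s][-window:], returns[t][-window:])
            if rho > threshold:
                similarity_edges.append((s, t))
    return industry_edges, similarity_edges

# Toy data: B closely tracks A; C moves on an unrelated cycle.
days = range(40)
toy_returns = {
    "A": [0.01 * math.sin(0.7 * k) for k in days],
    "B": [0.011 * math.sin(0.7 * k) + 0.0001 * (-1) ** k for k in days],
    "C": [0.01 * math.cos(1.3 * k) for k in days],
}
toy_industry = {"A": "bank", "B": "tech", "C": "bank"}
industry_edges, similarity_edges = build_edges(toy_returns, toy_industry)
```

A pair can carry both edge types at once, which is why the two lists are built independently rather than as alternatives.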
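
---

Illustrative Sketch: Dual-Task Loss

The loss function in the Model Construction Process combines two regression terms with a penalty on the squared correlation between the alpha factor and each risk factor. The sketch below evaluates it on plain Python lists for a single cross-section. It assumes that the correlation in the formula is the Pearson coefficient and that the caller supplies all d_R + 1 risk-factor columns; `dual_task_loss` is a hypothetical name, and a real training loop would use a differentiable tensor library instead.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pearson(x, y):
    """Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def dual_task_loss(y_hat, y1, r_hat, y2, risk_factors, alpha):
    """L = MSE(y_hat, y1) + MSE(r_hat, y2) + alpha * mean_i rho(y_hat, R_i)^2.

    risk_factors: list of d_R + 1 exposure columns, one list per factor
                  (which columns are included is an assumption of this sketch).
    """
    penalty = sum(pearson(y_hat, r) ** 2 for r in risk_factors) / len(risk_factors)
    return mse(y_hat, y1) + mse(r_hat, y2) + alpha * penalty

# Toy cross-section of four stocks: both regressions fit perfectly, so only the
# penalty contributes. The alpha factor is fully aligned with the first risk
# factor and uncorrelated with the second.
y_hat = [1.0, 2.0, 3.0, 4.0]
risk = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]]
loss = dual_task_loss(y_hat, y_hat, [0.5, 0.1, -0.2, 0.3], [0.5, 0.1, -0.2, 0.3], risk, alpha=0.2)
```

In this toy case the squared correlations are 1 and 0, so the penalty averages to 0.5 and the loss equals 0.2 × 0.5 = 0.1, which shows how the third term punishes an alpha factor that merely reproduces a known risk exposure.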
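
---

Illustrative Sketch: Top-K Mixture of Experts vs. Equal Weighting

The Multi-Model Fusion step contrasts equal weighting with an MoE gate that keeps the top K experts. The sketch below shows one common way to do this: keep the K largest gate scores, apply a softmax over them, and give every other expert zero weight. The report does not specify the gate's normalization, so this scheme is an assumption, as are the names `top_k_gate`, `moe_combine`, and `equal_weight`. The gate logits stand in for the output of a gating network fed with stock and market features.

```python
import math

def top_k_gate(gate_logits, k):
    """Softmax over the k largest logits; all other experts get weight 0."""
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    m = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(gate_logits))]

def moe_combine(expert_preds, gate_logits, k):
    """Weighted sum of expert predictions using top-k gate weights."""
    weights = top_k_gate(gate_logits, k)
    return sum(w * p for w, p in zip(weights, expert_preds))

def equal_weight(expert_preds):
    """Baseline fusion: simple average of all expert predictions."""
    return sum(expert_preds) / len(expert_preds)

# Four sub-model predictions for one stock, with hypothetical gate scores.
preds = [0.1, 0.9, 0.5, -0.5]
logits = [2.0, 0.0, 2.0, -1.0]
weights = top_k_gate(logits, k=2)
moe_score = moe_combine(preds, logits, k=2)
ew_score = equal_weight(preds)
```

Here experts 0 and 2 tie for the highest gate score and split the weight evenly, so the MoE score ignores the other two predictions, while the equal-weight score averages all four.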
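
---

Illustrative Sketch: Rank IC and ICIR

The backtest tables report Rank IC and ICIR without defining them. Under the usual definitions, Rank IC is the rank (Spearman) correlation between factor values and subsequent returns in one cross-section, and ICIR is the mean of the per-period IC series divided by its standard deviation. The sketch below follows those definitions; whether the report annualizes ICIR or uses a different return horizon is not stated, so the output is not meant to reproduce the reported figures.

```python
import math

def ranks(values):
    """Ranks from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_ic(factor, fwd_return):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(factor), ranks(fwd_return))

def icir(ic_series):
    """Mean IC divided by the sample standard deviation of IC (not annualized)."""
    n = len(ic_series)
    mean = sum(ic_series) / n
    var = sum((v - mean) ** 2 for v in ic_series) / (n - 1)
    return mean / math.sqrt(var)

ic_perfect = rank_ic([0.1, 0.4, 0.2, 0.3], [1.0, 4.0, 2.0, 3.0])
ic_reversed = rank_ic([0.1, 0.4, 0.2, 0.3], [4.0, 1.0, 3.0, 2.0])
toy_icir = icir([0.1, 0.2, 0.3])
```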
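
---

Illustrative Check: Excess Sharpe Ratio vs. Tracking Error

The Index Enhancement Strategy Results list annualized excess return, tracking error, and excess Sharpe ratio for each index. A quick consistency check is to divide excess return by tracking error, which is the usual information-ratio calculation. The report does not state its exact formula, so treating the excess Sharpe ratio this way is an assumption; the figures themselves are copied from the results above.

```python
# (annualized excess return %, tracking error %, reported excess Sharpe ratio)
reported = {
    "CSI 300": (9.02, 4.26, 2.12),
    "CSI 500": (11.63, 4.92, 2.36),
    "CSI 1000": (17.74, 5.52, 3.22),
}

# Implied ratio under the assumption: excess return / tracking error.
implied = {name: excess / te for name, (excess, te, _) in reported.items()}

# Absolute gap between the implied ratio and the reported figure.
gaps = {name: abs(implied[name] - reported[name][2]) for name in reported}

for name in reported:
    print(f"{name}: implied {implied[name]:.3f}, reported {reported[name][2]:.2f}")
```

Under this assumption each implied ratio lands within 0.01 of the reported value, so the three columns are mutually consistent up to rounding of the inputs.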