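"Pearls from the Sea of Learning" (学海拾珠) Series No. 227: Using Deep Reinforcement Learning to Solve Portfolio Allocation in High-Dimensional, Multi-Period Settings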
Huaan Securities·2025-03-14 08:09

Quantitative Models and Construction Methods

Model Name: MP-Adv-DRL-Cor

- Model Construction Idea: The model uses convolutional neural networks (CNN) to capture dynamic patterns in asset prices and WaveNet to model cross-asset dependencies. It combines these inputs with deep reinforcement learning (DRL) to solve multi-period Bellman equations for optimal long-term portfolio allocation[2][24][25]
- Model Construction Process:
  - CNN for Dynamic Price Sequence Information Extraction: CNN is used to extract dynamic features from asset price sequences. The process involves several layers including input, convolution, pooling, fully connected, and softmax layers[25][27]
    - Convolution Layer Formula:
      $$\mathbf{y}_{j,n}=\delta\Bigg(\mathbf{b}+\sum_{l=0}^{Z}\sum_{m=0}^{M}\varpi_{l,m}\mathbf{x}_{j+l,n+m}^{\prime}\Bigg)$$
      where $\delta$ is the activation function, $\mathbf{b}$ is the shared bias parameter, $Z$ and $M$ are the length and width of the local receptive field, $\varpi_{l,m}$ are the shared weight parameters, and $\mathbf{x}_{j+l,n+m}^{\prime}$ is the input matrix data[25]
    - Activation Function Formula:
      $$\delta(x^{\prime})=\mathrm{ReLU}(x^{\prime})=\begin{cases}x^{\prime}, & x^{\prime}\geq 0\\ 0, & x^{\prime}<0\end{cases}$$
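The convolution and ReLU formulas above can be illustrated with a small NumPy sketch. This is only a minimal example under stated assumptions, not the report's implementation: the function names `relu` and `conv_layer`, the toy input, the kernel values, and the scalar bias are all hypothetical, and the kernel is taken to have shape (Z + 1, M + 1) because the sums in the formula as written run from 0 to Z and from 0 to M.

```python
import numpy as np

def relu(x):
    """ReLU activation: returns x where x >= 0 and 0 elsewhere."""
    return np.maximum(x, 0.0)

def conv_layer(x, w, b):
    """Apply one shared-weight convolution followed by ReLU.

    x: input matrix of shape (J, N)
    w: shared kernel of shape (Z + 1, M + 1) (assumed from the sum limits)
    b: shared scalar bias
    """
    kz, km = w.shape
    out_rows = x.shape[0] - kz + 1
    out_cols = x.shape[1] - km + 1
    y = np.empty((out_rows, out_cols))
    for j in range(out_rows):
        for n in range(out_cols):
            # Local receptive field x'_{j+l, n+m} for l = 0..Z, m = 0..M
            patch = x[j:j + kz, n:n + km]
            y[j, n] = b + np.sum(w * patch)
    return relu(y)

# Hypothetical toy input: 4 time steps by 3 price features.
x = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0],
              [2.0, 1.0, -2.0],
              [-1.0, 0.0, 1.0]])
w = np.array([[1.0, -1.0],
              [0.5, 0.0]])  # Z = M = 1
b = 0.1

y = conv_layer(x, w, b)
```

The explicit double loop mirrors the indices in the formula one to one. A production model would more likely rely on a deep learning framework's convolution layer.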
ReLU function is used to prevent overfitting and enhance non-linear expression capability[25]
  - WaveNet for Cross-Asset Dependency Information Extraction: WaveNet captures time-varying dependencies between assets using convolution layers and causal convolutions with dilation operations[28]
    - WaveNet Dependency Formula:
      $$\mathbf{q}_{i,t}(\mathbf{r}_{t})=\left(\varphi(\mathbf{r}_{i,t})\odot(\mathbf{1}\mathbf{\omega}_{0}^{\mathrm{T}})+\sum_{j=1}^{N}\varphi(\mathbf{r}_{j,t})\odot(\mathbf{1}\mathbf{\omega}_{j}^{\mathrm{T}})\right)\mathbf{1}+\mathbf{a}$$
      where $\varphi$ is the neural network function, $\mathbf{a}$ is the bias term, and $\mathbf{r}_{t}$ is the input stream[28]
  - DRL for Multi-Period Portfolio Decision Making: Combines dynamic asset price features and dependency information as inputs for the deterministic policy gradient (DPG) model to optimize portfolio allocation[29]
    - MDP and Multi-Period Bellman Equation:
      $$V_{\pi}(s)=E_{\pi}\left(U_{t,h}\,\middle|\,s_{t}=s\right)=E_{\pi}\left(\sum_{k=1}^{h}\gamma^{k-1}u_{t+k}\,\middle|\,s_{t}=s\right)$$
      where $V_{\pi}(s)$ is the state value function, $U_{t,h}$ is the multi-period utility function, and $\gamma$ is the time preference parameter[32]
    - DRL Objective Function:
      $$\max_{\theta}J(\pi_{\theta})=\mathbb{E}_{\pi_{\theta}(h,n)}\big(U_{t,h}(u_{t+1}(\omega_{t+1}),u_{t+2}(\omega_{t+2}),\ldots,u_{t+h}(\omega_{t+h}))\big)$$
      where $\theta$ are the neural network parameters[36]
- Model Evaluation: The MP-Adv-DRL-Cor method generally outperforms other methods in terms of annual return, Sharpe ratio, and maximum drawdown across different holding periods, risk aversion levels, and transaction costs[3][52][56]

Model Backtesting Results

MP-Adv-DRL-Cor Model Performance Metrics[53]

| Holding Period | Index | Annual Return | Annual Volatility | Sharpe Ratio | Maximum Drawdown | Turnover |
| --- | --- | --- | --- | --- | --- | --- |
| h = 1 | S&P 100 | 12.48% | 24.57% | 0.508 | 39.80% | 0.007 |
| h = 1 | DJIA | 8.808% | 21.43% | 0.411 | 35.34% | 0.007 |
| h = 1 | S&P/TSX Composite | 13.58% | 25.35% | 0.536 | 48.01% | 0.008 |
| h = 5 | S&P 100 | 25.72% | 28.89% | 0.890 | 39.19% | 0.087 |
| h = 5 | DJIA | 19.79% | 33.51% | 0.590 | 51.72% | 0.080 |
| h = 5 | S&P/TSX Composite | 17.91% | 21.58% | 0.830 | 37.94% | 0.009 |
| h = 22 | S&P 100 | 22.37% | 37.08% | 0.603 | 44.13% | 0.099 |
| h = 22 | DJIA | 19.23% | 32.49% | 0.592 | 52.63% | 0.103 |
| h = 22 | S&P/TSX Composite | 18.42% | 22.10% | 0.833 | 38.65% | 0.020 |
| h = 36 | S&P 100 | 29.21% | 36.14% | 0.808 | 32.09% | 0.170 |
| h = 36 | DJIA | 28.88% | 34.32% | 0.841 | 44.24% | 0.109 |
| h = 36 | S&P/TSX Composite | 21.38% | 27.08% | 0.790 | 40.83% | 0.074 |
| h = 66 | S&P 100 | 27.51% | 39.43% | 0.698 | 66.10% | 0.107 |
| h = 66 | DJIA | 17.40% | 31.32% | 0.555 | 36.22% | 0.092 |
| h = 66 | S&P/TSX Composite | 15.50% | 37.11% | 0.418 | 55.38% | 0.197 |

Quantitative Factors and Construction Methods

Factor Name: Risk Aversion Coefficient (λ)

- Factor Construction Idea: The risk aversion coefficient influences the