Volume-Price Fingerprint Model
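Quantitative Special Report: "Machine Learning" Stock Selection Model Research Series (I): A Preliminary Study on the Construction and Application of the Volume-Price Fingerprint Model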
GOLDEN SUN SECURITIES· 2026-01-16 13:34
Quantitative Models and Construction Methods

- **Model Name**: Volume-Price Fingerprint Model

  **Model Construction Idea**: Inspired by large language models, the volume-price fingerprint model treats market transaction data as a special "language" and uses self-supervised learning to extract semantic features from intraday volume-price behavior [1][8][9]

  **Model Construction Process**:

  1. **Minute-level Feature Preprocessing**: Select 32 minute-level features, comprising price features (e.g., high, low, close, price position) and transaction features (e.g., turnover, order cancellation, fund flow). Standardize them to remove the effects of differing units and of historical volatility [1][16][17]
     - Price feature standardization formula, which expresses each price relative to the opening price $p_{\mathrm{open}}$: $$\tilde{p}_{t,d}=\frac{p_{t,d}}{p_{\mathrm{open}}}-1$$ [16]
     - Transaction feature standardization formula, where $S_{d}$ is the average daily total of feature $d$ over $N_{\mathrm{hist}}$ historical days: $$\tilde{f}_{t,d}=\frac{f_{t,d}}{S_{d}},\quad S_{d}=\frac{1}{N_{\mathrm{hist}}}\sum_{i=1}^{N_{\mathrm{hist}}}\sum_{t=1}^{T}f_{t,d}^{(i)}$$ [17]
  2. **Dual-task Self-supervised Learning Framework**:
     - **Forward Causal Prediction Task**: Predict price features causally from past transaction and price information. A triangular attention mask enforces strict causality [18][21]
     - **Backward Feature Reconstruction Task**: Randomly mask transaction features and reconstruct them from global sequence information [18][22]
  3. **Anti-collapse Regularization**: Introduce diversity, orthogonality, and uniformity regularization terms so that the fingerprint vectors stay well differentiated, low in redundancy, and rich in information [1][43][44][46]
     - Total loss function: $$\mathcal{L}_{\mathrm{total}}=\lambda_{f}\mathcal{L}_{\mathrm{forward}}+\lambda_{b}\mathcal{L}_{\mathrm{backward}}+\mathcal{L}_{\mathrm{diversity}}+\mathcal{L}_{\mathrm{orthogonality}}+\mathcal{L}_{\mathrm{uniformity}}$$ [47]

  **Model Evaluation**: The model provides a structured representation of market dynamics, capturing semantic features beyond traditional numerical predictions [1][9][14]

- **Model Name**: GRU Model with Volume-Price Fingerprint

  **Model Construction Idea**: Use a GRU to predict future stock returns from volume-price fingerprint features [2][50][51]

  **Model Construction Process**:

  1. Input features: 128-dimensional volume-price fingerprint vectors plus basic daily features (e.g., open, high, low, close, volume, turnover) [51][52]
  2. GRU structure: two-layer GRU + fully connected layers + LayerNorm + ReLU + dropout + fully connected layers [53]
  3. Training details: batch size of 512, learning rate of 1e-4 with warmup, early stopping after 10 rounds without improvement [53][54]

  **Model Evaluation**: The GRU model effectively uses the semantic features of volume-price fingerprints for stock prediction, outperforming traditional factor-based models on certain metrics [2][50][54]

- **Model Name**: Dual-stream GRU Model

  **Model Construction Idea**: Combine volume-price fingerprints and traditional volume-price factors in a dual-stream GRU to exploit their complementary information [67][68]

  **Model Construction Process**:

  1. Separate GRU streams process the volume-price fingerprints and the traditional factors; their outputs are then fused through configurable weights [67][69]
  2. Training details: similar to the single-stream GRU, with parallel training under different random seeds for robustness [68][69]

  **Model Evaluation**: The dual-stream GRU model improves prediction accuracy and stability, and reduces the overfitting risk of relying on a single data source [68][69]

Model Backtesting Results

- **Volume-Price Fingerprint Model**:
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65% [2][59][65]
- **GRU Model with Volume-Price Fingerprint**:
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65% [2][59][65]
- **Dual-stream GRU Model**:
  - Weekly RankIC Mean: 0.109
  - Annualized Return (10-group long-short): 90.89%
  - IR: 5.95
  - Weekly Win Rate: 76.46%
  - Max Drawdown: 11.54% [68][74]

Quantitative Factors and Construction Methods

- **Factor Name**: Volume-Price Fingerprint Factor

  **Factor Construction Idea**: Extract semantic features from intraday volume-price behavior using self-supervised learning [1][9][14]

  **Factor Construction Process**:

  1. Generate 128-dimensional daily fingerprint vectors using the volume-price fingerprint model [1][16][18]
  2. Apply anti-collapse regularization so that the vectors stay well differentiated, low in redundancy, and rich in information [43][44][46]

  **Factor Evaluation**: The fingerprint factor captures hidden market patterns and semantic information, complementing traditional factors [1][9][14]

Factor Backtesting Results

- **Volume-Price Fingerprint Factor**:
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65% [2][59][65]
- **Fusion Factor (Volume-Price Fingerprint + Traditional Factors)**:
  - Weekly RankIC Mean: 0.109
  - Annualized Return (10-group long-short): 90.89%
  - IR: 5.95
  - Weekly Win Rate: 76.46%
  - Max Drawdown: 11.54% [68][74]

Index Enhancement Results

- **CSI 300 Enhanced Portfolio**:
  - Annualized Return: 11.00%
  - Excess Annualized Return: 7.12%
  - Tracking Error: 1.74%
  - IR: 4.10
  - Monthly Win Rate: 86.11%
  - Max Drawdown: 1.85% [75][77]
- **CSI 500 Enhanced Portfolio**:
  - Annualized Return: 13.32%
  - Excess Annualized Return: 11.38%
  - Tracking Error: 3.47%
  - IR: 3.28
  - Monthly Win Rate: 83.33%
  - Max Drawdown: 4.76% [78][80]
- **CSI 1000 Enhanced Portfolio**:
  - Annualized Return: 13.23%
  - Excess Annualized Return: 14.84%
  - Tracking Error: 3.45%
  - IR: 4.30
  - Monthly Win Rate: 83.33%
  - Max Drawdown: 2.95% [82][83]
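
The two standardization formulas in the minute-level preprocessing step can be illustrated with a minimal sketch. It assumes one stock, one feature at a time, and that `p_open` is the day's opening price; the function names and the toy numbers are hypothetical and do not come from the report.

```python
from typing import List


def standardize_price(prices: List[float], open_price: float) -> List[float]:
    """Price feature: p_tilde[t] = p[t] / p_open - 1."""
    return [p / open_price - 1.0 for p in prices]


def historical_scale(history: List[List[float]]) -> float:
    """S_d: average over N_hist past days of the daily total of one feature."""
    return sum(sum(day) for day in history) / len(history)


def standardize_transaction(values: List[float], scale: float) -> List[float]:
    """Transaction feature: f_tilde[t] = f[t] / S_d."""
    return [v / scale for v in values]


# Toy data (hypothetical): three minute bars per day.
close = [10.0, 10.1, 9.9]
print([round(x, 4) for x in standardize_price(close, open_price=10.0)])
# -> [0.0, 0.01, -0.01]

volume_history = [[100.0, 200.0, 100.0], [300.0, 100.0, 200.0]]  # N_hist = 2
s_d = historical_scale(volume_history)  # (400 + 600) / 2 = 500
today_volume = [50.0, 250.0, 100.0]
print(standardize_transaction(today_volume, s_d))
# -> [0.1, 0.5, 0.2]
```

Dividing by the historical daily total, as opposed to a per-minute mean, means a standardized transaction value reads as the share of a typical day's activity that occurred in that minute.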
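
The masks behind the two self-supervised tasks can be sketched as follows. This is an illustration only: whether the diagonal of the triangular mask is included, the masking ratio, and the choice to mask individual (minute, feature) cells are all assumptions, since the summary does not give these details.

```python
import random
from typing import List


def causal_mask(num_steps: int) -> List[List[bool]]:
    """Lower-triangular mask: entry [i][j] is True when step i may attend to
    step j, i.e. j <= i (including the diagonal is an assumption here)."""
    return [[j <= i for j in range(num_steps)] for i in range(num_steps)]


def random_feature_mask(num_steps: int, num_features: int,
                        ratio: float, seed: int = 0) -> List[List[bool]]:
    """Entry [t][d] is True when transaction feature d at minute t is hidden
    and must be reconstructed. `ratio` is a hypothetical masking probability."""
    rng = random.Random(seed)
    return [[rng.random() < ratio for _ in range(num_features)]
            for _ in range(num_steps)]


for row in causal_mask(4):
    print("".join("1" if allowed else "0" for allowed in row))
# -> 1000
# -> 1100
# -> 1110
# -> 1111

hidden = random_feature_mask(num_steps=4, num_features=3, ratio=0.3)
print(sum(cell for row in hidden for cell in row), "cells masked out of 12")
```

The forward task would apply the first mask inside attention so that no minute sees later minutes, while the backward task would hide the cells flagged by the second mask and attend over the whole sequence without a causal restriction.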
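
The "Weekly RankIC Mean" figures above are not defined in the summary. Under the common definition, RankIC is the Spearman rank correlation between model scores and subsequent returns across stocks, averaged over weeks. The sketch below assumes that definition; the stock universe, any neutralization, and the sample data are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(scores: Sequence[float], forward_returns: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(scores), average_ranks(forward_returns))


# One hypothetical week with four stocks.
scores = [0.3, 0.1, 0.4, 0.2]
next_week_returns = [0.02, -0.01, 0.01, 0.00]
print(round(rank_ic(scores, next_week_returns), 4))
# -> 0.8
```

A weekly mean would then be the plain average of `rank_ic` over all weeks in the backtest window.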
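
As a quick consistency check on the index enhancement figures, the reported IR values can be compared with the ratio of excess annualized return to tracking error. That ratio is the usual definition of the information ratio, but the summary does not state which definition the report uses, so this is an assumption. All inputs are the numbers listed above.

```python
# (excess annualized return %, tracking error %, reported IR), from the text
portfolios = {
    "CSI 300": (7.12, 1.74, 4.10),
    "CSI 500": (11.38, 3.47, 3.28),
    "CSI 1000": (14.84, 3.45, 4.30),
}


def information_ratio(excess_return: float, tracking_error: float) -> float:
    """IR as excess return divided by tracking error (assumed definition)."""
    return excess_return / tracking_error


for name, (excess, te, reported) in portfolios.items():
    implied = information_ratio(excess, te)
    print(f"{name}: implied IR {implied:.2f}, reported IR {reported:.2f}")
# -> CSI 300: implied IR 4.09, reported IR 4.10
# -> CSI 500: implied IR 3.28, reported IR 3.28
# -> CSI 1000: implied IR 4.30, reported IR 4.30
```

The implied and reported values agree to within rounding of the inputs, which suggests the three columns are mutually consistent under this definition.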