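Quantitative Special Report: "Machine Learning" Stock Selection Model Series (I): Construction and Preliminary Application of the Volume-Price Fingerprint Model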
GOLDEN SUN SECURITIES·2026-01-16 13:34

Quantitative Models and Construction Methods

- Model Name: Volume-Price Fingerprint Model
  - Model Construction Idea: Inspired by large language models, the volume-price fingerprint model treats market transaction data as a special "language" and uses self-supervised learning to extract semantic features from intraday volume-price behavior [1][8][9]
  - Model Construction Process:
    1. Minute-level Feature Preprocessing: Select 32 dimensions of minute-level features, including price features (e.g., high, low, close, price position) and transaction features (e.g., turnover, order cancellation, fund flow). Standardize these features to remove differences in scale (units) and in historical volatility [1][16][17]
       - Price feature standardization formula: $\tilde{p}_{t,d}=\frac{p_{t,d}}{p_{\mathrm{open}}}-1$ [16]
       - Transaction feature standardization formula: $\tilde{f}_{t,d}=\frac{f_{t,d}}{S_{d}},\quad S_{d}=\frac{1}{N_{\mathrm{hist}}}\sum_{i=1}^{N_{\mathrm{hist}}}\sum_{t=1}^{T}f_{t,d}^{(i)}$ [17]
    2. Dual-task Self-supervised Learning Framework:
       - Forward Causal Prediction Task: Predict price features causally using past transaction and price information. A triangular attention mask ensures strict causality [18][21]
       - Backward Feature Reconstruction Task: Randomly mask transaction features and reconstruct them using global sequence information [18][22]
    3. Anti-collapse Regularization: Introduce diversity, orthogonality, and uniformity regularization terms so that fingerprint vectors remain highly differentiated, low in redundancy, and rich in information [1][43][44][46]
       - Total loss function: $\mathcal{L}_{\mathrm{total}}=\lambda_{f}\mathcal{L}_{\mathrm{forward}}+\lambda_{b}\mathcal{L}_{\mathrm{backward}}+\mathcal{L}_{\mathrm{diversity}}+\mathcal{L}_{\mathrm{orthogonality}}+\mathcal{L}_{\mathrm{uniformity}}$ [47]
  - Model Evaluation: The model provides a structured representation of market dynamics, capturing semantic features beyond traditional numerical predictions [1][9][14]

- Model Name: GRU Model with Volume-Price Fingerprint
  - Model Construction Idea: Use a GRU to predict future stock returns from volume-price fingerprint features [2][50][51]
  - Model Construction Process:
    1. Input features include 128-dimensional volume-price fingerprint vectors and basic daily features (e.g., open, high, low, close, volume, turnover) [51][52]
    2. GRU structure: Two-layer GRU + fully connected layers + LayerNorm + ReLU + dropout + fully connected layers [53]
    3. Training details: Batch size of 512, learning rate of 1e-4 with warmup, early stopping after 10 epochs without improvement [53][54]
  - Model Evaluation: The GRU model effectively utilizes the semantic features of volume-price fingerprints for stock prediction, outperforming traditional factor-based models in certain metrics [2][50][54]

- Model Name: Dual-stream GRU Model
  - Model Construction Idea: Combine volume-price fingerprints and traditional volume-price factors using a dual-stream GRU to leverage complementary information [67][68]
  - Model Construction Process:
    1. Separate GRU streams for volume-price fingerprints and traditional factors, followed by feature fusion through configurable weights [67][69]
    2. Training details: Similar to the single-stream GRU, with parallel training using different random seeds for robustness [68][69]
  - Model Evaluation: The dual-stream GRU model improves prediction accuracy and stability, reducing the overfitting risk associated with a single data source [68][69]

Model Backtesting Results

- Volume-Price Fingerprint Model [2][59][65]
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65%
- GRU Model with Volume-Price Fingerprint [2][59][65]
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65%
- Dual-stream GRU Model [68][74]
  - Weekly RankIC Mean: 0.109
  - Annualized Return (10-group long-short): 90.89%
  - IR: 5.95
  - Weekly Win Rate: 76.46%
  - Max Drawdown: 11.54%

Quantitative Factors and Construction Methods

- Factor Name: Volume-Price Fingerprint Factor
  - Factor Construction Idea: Extract semantic features from intraday volume-price behavior using self-supervised learning [1][9][14]
  - Factor Construction Process:
    1. Generate 128-dimensional daily fingerprint vectors using the volume-price fingerprint model [1][16][18]
    2. Ensure high differentiation, low redundancy, and rich information through anti-collapse regularization [43][44][46]
  - Factor Evaluation: The fingerprint factor captures hidden market patterns and semantic information, complementing traditional factors [1][9][14]

Factor Backtesting Results

- Volume-Price Fingerprint Factor [2][59][65]
  - Weekly RankIC Mean: 0.106
  - Annualized Return (10-group long-short): 83.88%
  - IR: 5.41
  - Weekly Win Rate: 73.87%
  - Max Drawdown: 11.65%
- Fusion Factor (Volume-Price Fingerprint + Traditional Factors) [68][74]
  - Weekly RankIC Mean: 0.109
  - Annualized Return (10-group long-short): 90.89%
  - IR: 5.95
  - Weekly Win Rate: 76.46%
  - Max Drawdown: 11.54%

Index Enhancement Results

- CSI 300 Enhanced Portfolio [75][77]
  - Annualized Return: 11.00%
  - Excess Annualized Return: 7.12%
  - Tracking Error: 1.74%
  - IR: 4.10
  - Monthly Win Rate: 86.11%
  - Max Drawdown: 1.85%
- CSI 500 Enhanced Portfolio [78][80]
  - Annualized Return: 13.32%
  - Excess Annualized Return: 11.38%
  - Tracking Error: 3.47%
  - IR: 3.28
  - Monthly Win Rate: 83.33%
  - Max Drawdown: 4.76%
- CSI 1000 Enhanced Portfolio [82][83]
  - Annualized Return: 13.23%
  - Excess Annualized Return: 14.84%
  - Tracking Error: 3.45%
  - IR: 4.30
  - Monthly Win Rate: 83.33%
  - Max Drawdown: 2.95%
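
The two standardization formulas in the minute-level preprocessing step can be sketched in NumPy as follows. This is a minimal illustration, not the report's code: it assumes t indexes minutes within a day, d indexes features, and i indexes the N_hist historical days. The function names, array shapes, the `eps` guard, and the toy numbers are all hypothetical.

```python
import numpy as np

def standardize_price(prices: np.ndarray, p_open: float) -> np.ndarray:
    """Price features relative to the open: p / p_open - 1.

    prices has shape (T, D_price) for one stock-day (assumed layout).
    """
    return prices / p_open - 1.0

def transaction_scale(history: np.ndarray) -> np.ndarray:
    """S_d: average over historical days of each feature's full-day total.

    history has shape (N_hist, T, D_trans) (assumed layout).
    """
    daily_totals = history.sum(axis=1)   # sum over minutes -> (N_hist, D_trans)
    return daily_totals.mean(axis=0)     # mean over days   -> (D_trans,)

def standardize_transaction(features: np.ndarray, scale: np.ndarray,
                            eps: float = 1e-12) -> np.ndarray:
    """f / S_d, with a small floor on S_d to avoid division by zero (assumption)."""
    return features / np.maximum(scale, eps)

# Toy data: 2 minutes of (high, low, close) with an open price of 10.0.
prices = np.array([[10.2, 9.9, 10.1],
                   [10.5, 10.0, 10.4]])
p_tilde = standardize_price(prices, p_open=10.0)

# Toy history: 20 days, 240 minutes, 2 transaction features, 1 unit per minute.
history = np.ones((20, 240, 2))
scale = transaction_scale(history)
today = np.full((240, 2), 2.0)           # today trades twice the usual amount
f_tilde = standardize_transaction(today, scale)
```

In this toy case each standardized transaction column sums to 2 over the day, which reads as "twice the typical daily total". That is the kind of cross-stock comparability the standardization step is meant to provide.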
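
The two masking mechanisms in the dual-task framework can be illustrated with a small NumPy sketch. The report only states that a triangular attention mask enforces causality and that transaction features are randomly masked for reconstruction. Whether the diagonal is included, the mask ratio, and the zero placeholder for hidden values are assumptions made here for illustration.

```python
import numpy as np

def causal_mask(T: int) -> np.ndarray:
    """Lower-triangular boolean mask: query t may attend to keys s <= t.

    Including the diagonal is an assumption; a stricter variant would use k=-1.
    """
    return np.tril(np.ones((T, T), dtype=bool))

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Row-wise softmax that gives zero weight to masked-out positions."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def random_feature_mask(shape, mask_ratio: float, rng) -> np.ndarray:
    """True at the positions that are hidden and must be reconstructed."""
    return rng.random(shape) < mask_ratio

rng = np.random.default_rng(0)
T = 5

# Forward task: attention weights never look at future minutes.
attn = masked_softmax(rng.normal(size=(T, T)), causal_mask(T))

# Backward task: hide a random subset of transaction features.
trans = rng.random((T, 3))
hidden = random_feature_mask(trans.shape, mask_ratio=0.3, rng=rng)
trans_in = np.where(hidden, 0.0, trans)   # zero placeholder is an assumption
```

The reconstruction loss would then be evaluated only at positions where `hidden` is true, with no causal mask applied, so that the model can use the full sequence.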
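
The summary names the three anti-collapse terms but does not give their formulas, so the sketch below substitutes common choices from the representation-learning literature: a variance hinge for diversity, a penalty on off-diagonal correlations for orthogonality, and a pairwise Gaussian-potential term for uniformity. These specific forms, the function names, and the default weights are assumptions, not the report's definitions.

```python
import numpy as np

def diversity_loss(z: np.ndarray, target_std: float = 1.0, eps: float = 1e-4) -> float:
    """Hinge on per-dimension std across the batch (assumed form)."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))

def orthogonality_loss(z: np.ndarray, eps: float = 1e-8) -> float:
    """Mean squared off-diagonal correlation between dimensions (assumed form)."""
    zc = z - z.mean(axis=0)
    zc = zc / (zc.std(axis=0) + eps)
    corr = zc.T @ zc / z.shape[0]
    off_diag = corr - np.diag(np.diag(corr))
    d = z.shape[1]
    return float((off_diag ** 2).sum() / (d * (d - 1)))

def uniformity_loss(z: np.ndarray, t: float = 2.0, eps: float = 1e-8) -> float:
    """Log mean Gaussian potential over pairs of unit-normalized vectors (assumed form)."""
    u = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    sq_dist = ((u[:, None, :] - u[None, :, :]) ** 2).sum(axis=-1)
    pairs = np.triu_indices(z.shape[0], k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dist[pairs]))))

def total_loss(l_forward: float, l_backward: float, z: np.ndarray,
               lam_f: float = 1.0, lam_b: float = 1.0) -> float:
    """Mirrors the structure of the total loss: weighted tasks plus three regularizers."""
    return (lam_f * l_forward + lam_b * l_backward
            + diversity_loss(z) + orthogonality_loss(z) + uniformity_loss(z))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(64, 8))                       # spread-out fingerprints
collapsed = np.tile(rng.normal(size=(1, 8)), (64, 1))    # every fingerprint identical
redundant = np.repeat(rng.normal(size=(64, 1)), 8, axis=1)  # all dimensions duplicated
```

Under these assumed forms, identical fingerprints are penalized by the diversity and uniformity terms, while duplicated dimensions are penalized by the orthogonality term, matching the stated goals of high differentiation and low redundancy.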
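
The single-stream GRU structure and training settings can be sketched in PyTorch as below. The layer order follows the summary (two-layer GRU, fully connected, LayerNorm, ReLU, dropout, fully connected). The hidden size, dropout rate, sequence length, Adam optimizer, linear warmup length, and the input width of 134 (128 fingerprint dimensions plus six daily features) are assumptions, and a single `nn.Linear` stands in for each "fully connected layers" stage.

```python
import torch
from torch import nn

class FingerprintGRU(nn.Module):
    """Two-layer GRU followed by an MLP head that outputs one score per stock."""

    def __init__(self, input_dim: int = 134, hidden_dim: int = 64, dropout: float = 0.1):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.gru(x)                       # (batch, seq_len, hidden_dim)
        return self.head(out[:, -1]).squeeze(-1)   # use the last time step

model = FingerprintGRU()

# Learning rate 1e-4 with warmup, as in the summary; Adam and 100 steps are assumed.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
warmup_steps = 100
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

# One batch of 512 samples, each a 20-day window (window length is an assumption).
x = torch.randn(512, 20, 134)
pred = model(x)
```

Early stopping with a patience of 10 would wrap the training loop: track the best validation metric and stop once 10 consecutive epochs fail to improve it.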
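
For the dual-stream model, the summary mentions separate GRU streams, fusion through configurable weights, and training with several random seeds. One plausible reading is sketched below: a weighted sum of the two streams' final hidden states, with predictions averaged across seeds. The weighted-sum fusion, the averaging, the factor dimension of 32, and all names are assumptions. The models here are left untrained to keep the example short.

```python
import torch
from torch import nn

class DualStreamGRU(nn.Module):
    """One GRU per input source; final states are fused by a fixed weight."""

    def __init__(self, fp_dim: int = 128, factor_dim: int = 32,
                 hidden_dim: int = 64, fp_weight: float = 0.5):
        super().__init__()
        self.fp_gru = nn.GRU(fp_dim, hidden_dim, num_layers=2, batch_first=True)
        self.factor_gru = nn.GRU(factor_dim, hidden_dim, num_layers=2, batch_first=True)
        self.fp_weight = fp_weight          # configurable fusion weight
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, fp_seq: torch.Tensor, factor_seq: torch.Tensor) -> torch.Tensor:
        h_fp = self.fp_gru(fp_seq)[0][:, -1]            # fingerprint stream
        h_fa = self.factor_gru(factor_seq)[0][:, -1]    # traditional-factor stream
        fused = self.fp_weight * h_fp + (1.0 - self.fp_weight) * h_fa
        return self.out(fused).squeeze(-1)

def seed_ensemble(seeds, fp_seq, factor_seq) -> torch.Tensor:
    """Average predictions of models initialized with different seeds.

    In practice each model would be trained before predicting.
    """
    preds = []
    for seed in seeds:
        torch.manual_seed(seed)
        model = DualStreamGRU().eval()
        with torch.no_grad():
            preds.append(model(fp_seq, factor_seq))
    return torch.stack(preds).mean(dim=0)

fp_seq = torch.randn(8, 20, 128)
factor_seq = torch.randn(8, 20, 32)
ensemble_pred = seed_ensemble([0, 1, 2], fp_seq, factor_seq)
```

Setting `fp_weight=1.0` reduces the model to the fingerprint stream alone, which gives a simple way to check how much the traditional factors contribute.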
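
The backtest metrics (weekly RankIC, 10-group long-short return, weekly win rate) can be computed along the following lines. This is a sketch on synthetic data. The summary does not state how the win rate is defined or how ties are ranked, so the definition used here (share of weeks with a positive long-short return) and the no-ties ranking are assumptions.

```python
import numpy as np

def ordinal_rank(x: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1; assumes no ties, which holds for continuous scores."""
    return np.argsort(np.argsort(x))

def rank_ic(scores: np.ndarray, fwd_returns: np.ndarray) -> float:
    """Spearman rank correlation between scores and next-period returns."""
    return float(np.corrcoef(ordinal_rank(scores), ordinal_rank(fwd_returns))[0, 1])

def long_short_return(scores: np.ndarray, fwd_returns: np.ndarray,
                      n_groups: int = 10) -> float:
    """Mean return of the top score group minus that of the bottom group."""
    order = np.argsort(scores)
    groups = np.array_split(order, n_groups)
    return float(fwd_returns[groups[-1]].mean() - fwd_returns[groups[0]].mean())

# Synthetic cross-sections: 52 weeks, 500 stocks, weakly predictive signal.
rng = np.random.default_rng(0)
weekly_ic, weekly_ls = [], []
for _ in range(52):
    signal = rng.normal(size=500)
    returns = 0.1 * signal + rng.normal(size=500)
    weekly_ic.append(rank_ic(signal, returns))
    weekly_ls.append(long_short_return(signal, returns))

ic_mean = float(np.mean(weekly_ic))
win_rate = float(np.mean(np.array(weekly_ls) > 0))   # assumed win-rate definition
```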
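
As a quick consistency check on the index enhancement figures, the reported IRs can be compared with excess annualized return divided by tracking error. That this is the definition the report uses is an assumption; the numbers below are copied from the results above.

```python
# (excess annualized return %, tracking error %, reported IR)
portfolios = {
    "CSI 300": (7.12, 1.74, 4.10),
    "CSI 500": (11.38, 3.47, 3.28),
    "CSI 1000": (14.84, 3.45, 4.30),
}

def information_ratio(excess_return: float, tracking_error: float) -> float:
    """IR as excess return per unit of tracking error (assumed definition)."""
    return excess_return / tracking_error

implied = {name: information_ratio(excess, te)
           for name, (excess, te, _) in portfolios.items()}

for name, (_, _, reported) in portfolios.items():
    print(f"{name}: implied IR {implied[name]:.2f}, reported IR {reported:.2f}")
```

The implied values agree with the reported ones to within 0.01, and the small gap for CSI 300 is in line with rounding of the published inputs.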
