Quantitative Models and Construction Methods

- Model Name: DeepSeek-V3-671B
  - Model Construction Idea: Move from "black-box" scoring to structured, interpretable scoring of financial reports.
  - Model Construction Process: Structured task rules constrain the model so that it avoids "hallucinated" outputs and extracts information in a controlled, consistent way. The model is accessed through both local deployment and a cloud API for efficient processing. DeepSeek-V3-671B was selected for its superior output-format compliance, result stability, and batch-processing efficiency[160][157][64]
  - Model Evaluation: Demonstrates high accuracy in structured scoring tasks, with stable results across multiple tests[62][64][160]

Quantitative Factors and Construction Methods

- Factor Name: Report Scoring Factors
  - Factor Construction Idea: Integrate multiple dimensions (sentiment categories, text proportion, sentiment density, and category importance weights) to predict future returns.
  - Factor Construction Process:
    1. Simple Weighted and Concentration Adjustment: $\text{score\_mean\_hhi} = \text{score\_mean} \times \frac{1}{\sqrt{0.01 + \text{HHI}}}$, where HHI measures the concentration of sentiment categories in the report[119][121][123]
    2. Text Proportion Weighted: $\text{score\_by\_len} = \sum_{k=1}^{10} \text{cat\_sentiment}_{k} \times \text{cat\_len}_{k}$, adjusted with HHI and power-function smoothing to avoid overemphasis on single categories[122][123][124]
    3. Category Importance Weighted: $\text{score\_by\_cat} = \sum_{k=1}^{10} \text{cat\_sentiment}_{k} \times \text{cat\_w}_{k}$, with category weights derived from regression coefficients and significance levels[125][127][128]
    4. Combined Text Proportion & Category Importance Weighted: $\text{score\_by\_LenCat} = \sum_{k=1}^{10} \text{cat\_sentiment}_{k} \times \text{cat\_len}_{k} \times \text{cat\_w}_{k}$, adjusted with HHI and power-function smoothing for balanced scoring[131][132][134]
  - Factor Evaluation:
    - Factors such as score_by_cat_w3 and score_by_LenCat3 show strong predictive power for short-term and medium-term returns[130][134][151]
    - The combined factor (score_report_llm) performs evenly across multiple metrics, including RankIC, IC win rate, and annualized excess return[151][152][153]

Factor Backtesting Results

- Factor Name: Comprehensive Scoring Factor (score_report_llm)
  - Backtesting Metrics:
    - RankIC: 1.77%
    - IC Win Rate: 66.2%
    - Annualized Excess Return: 13.5%
    - Maximum Drawdown: Controlled within 4% relative to the equal-weighted group[151][152][154]
  - Performance Summary:
    - Strictly monotonic five-group return structure
    - 100% annual win rate against the CSI 800 since 2020
    - Low correlation with traditional factors, indicating its potential as an alternative factor[151][152][156]

Application of Sentiment Density in Stock Selection

- Factor Name: Sentiment Density Factors (profit improvement density, performance surprise density)
  - Backtesting Metrics:
    - Profit Improvement Density (Top30, N=20): Annualized Return: 15.0%, Excess Return: 15.6%, Maximum Drawdown: 27.5%
    - Performance Surprise Density (Top30, N=40): Annualized Return: 14.2%, Excess Return: 14.8%, Maximum Drawdown: 31.1%
  - Performance Summary:
    - Short-term signals (N=20) are more effective for profit improvement density
    - Long-term signals (N=40) are more effective for performance surprise density[96][98][105][106]

Sentiment Emphasis Analysis

- Factor Name: Sentiment Emphasis (Order & Proportion)
  - Key Findings:
    - Positive sentiment that appears earlier in the report correlates with stronger pricing effects, especially for categories such as "performance surprise" and "shareholder behavior"[112][111][108]
    - Text proportion has a positive effect on future returns for categories such as "penetration rate" and "policy-driven factors"[118][114][115]
    - Basic financial categories (e.g., "profit improvement") show weaker signal effectiveness because they appear in most reports[118][114][115]

Summary of Comprehensive Scoring Factor

- Factor Name: Comprehensive Scoring Factor (score_report_llm)
  - Performance Metrics:
    - RankIC: 1.77%
    - IC Win Rate: 66.2%
    - Annualized Excess Return: 13.5%
    - Maximum Drawdown: Controlled within 4% relative to the equal-weighted group
  - Key Advantages:
    - Balanced performance across multiple metrics
    - Strong predictive power for short-term and medium-term returns
    - Low correlation with traditional factors, indicating its potential as an alternative factor[151][152][154][156]
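The four report-scoring formulas in the factor construction process can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the report's stated method: HHI is taken as the sum of squared text-length shares across categories, score_mean is taken as the plain average of the ten category sentiments, and the sample sentiment, length, and weight values are made up. The additional HHI adjustment and power-function smoothing mentioned for variants 2 to 4 are not specified in the summary and are left out.

```python
import math


def hhi(shares):
    """Herfindahl-Hirschman index: sum of squared normalized shares.

    Assumption: concentration is measured over per-category text shares.
    """
    total = sum(shares)
    if total == 0:
        return 0.0
    return sum((s / total) ** 2 for s in shares)


def score_mean_hhi(score_mean, hhi_value):
    """score_mean * 1 / sqrt(0.01 + HHI), as in formula 1."""
    return score_mean / math.sqrt(0.01 + hhi_value)


def score_by_len(cat_sentiment, cat_len):
    """Sum over categories of sentiment * text proportion (formula 2)."""
    return sum(s * l for s, l in zip(cat_sentiment, cat_len))


def score_by_cat(cat_sentiment, cat_w):
    """Sum over categories of sentiment * importance weight (formula 3)."""
    return sum(s * w for s, w in zip(cat_sentiment, cat_w))


def score_by_len_cat(cat_sentiment, cat_len, cat_w):
    """Sum over categories of sentiment * proportion * weight (formula 4)."""
    return sum(s * l * w for s, l, w in zip(cat_sentiment, cat_len, cat_w))


# Hypothetical report with 10 sentiment categories (illustrative values only).
cat_sentiment = [1, -1, 0, 1, 0, 0, 0, 0, 0, 0]   # per-category sentiment
cat_len = [0.5, 0.3, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # text shares
cat_w = [2.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]    # weights

concentration = hhi(cat_len)
# Assumption: score_mean is the simple mean of category sentiments.
score_mean = sum(cat_sentiment) / len(cat_sentiment)

print("HHI:", concentration)
print("score_mean_hhi:", score_mean_hhi(score_mean, concentration))
print("score_by_len:", score_by_len(cat_sentiment, cat_len))
print("score_by_cat:", score_by_cat(cat_sentiment, cat_w))
print("score_by_LenCat:", score_by_len_cat(cat_sentiment, cat_len, cat_w))
```

The 0.01 term inside the square root keeps the multiplier finite when HHI is close to zero. Because the score is divided by the square root, a report whose sentiment is concentrated in few categories (high HHI) is scaled down relative to one whose sentiment is spread across many.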
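For readers unfamiliar with the RankIC and IC win rate figures quoted above, the following sketch shows how such metrics are commonly computed. It assumes the conventional definitions: RankIC as the cross-sectional Spearman rank correlation between factor values and subsequent returns, averaged over periods, and the win rate as the share of periods with a positive RankIC. The summary does not spell out the report's exact definitions, and the sample data are invented.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_ic(factor, forward_returns):
    """Spearman rank correlation for one cross-section of stocks."""
    return pearson(average_ranks(factor), average_ranks(forward_returns))


def summarize(periods):
    """Mean RankIC and share of periods with positive RankIC."""
    ics = [rank_ic(f, r) for f, r in periods]
    mean_ic = sum(ics) / len(ics)
    win_rate = sum(1 for ic in ics if ic > 0) / len(ics)
    return mean_ic, win_rate


# Hypothetical data: three periods, four stocks each (factor, next-period return).
periods = [
    ([0.1, 0.2, 0.3, 0.4], [0.01, 0.02, 0.03, 0.04]),
    ([0.1, 0.2, 0.3, 0.4], [0.04, 0.03, 0.02, 0.01]),
    ([0.4, 0.1, 0.3, 0.2], [0.05, -0.01, 0.02, 0.00]),
]
mean_ic, win_rate = summarize(periods)
print("mean RankIC:", mean_ic)
print("IC win rate:", win_rate)
```

Under these definitions, a mean RankIC of 1.77% together with a 66.2% win rate would describe a signal that is weak in any single period but positive in roughly two periods out of three.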
Quantitative Research Series Report No. 23: Giving Sentiment "Structure": How Large Models Unlock New Value in Research Reports
Huaan Securities·2025-08-11 14:58