Sentiment Analysis

Quantitative Research Report Series No. 23: Giving Sentiment "Structure": How Large Models Mine New Value from Research Reports
Huaan Securities · 2025-08-11 14:58
Quantitative Models and Construction Methods

- **Model Name**: DeepSeek-V3-671B
  - **Model Construction Idea**: Move from "black-box" scoring to structured, interpretable scoring of research reports.
  - **Model Construction Process**: Structured task rules constrain the output to avoid "hallucinations", keeping information extraction controlled and consistent. The model is deployed locally and via cloud API for efficient processing. DeepSeek-V3-671B was selected for its superior output-format compliance, result stability, and batch-processing efficiency[160][157][64]
  - **Model Evaluation**: Demonstrates high accuracy in structured scoring tasks, with stable results across multiple tests[62][64][160]

Quantitative Factors and Construction Methods

- **Factor Name**: Report Scoring Factors
  - **Factor Construction Idea**: Combine multiple dimensions (sentiment categories, text proportion, sentiment density, and category importance weights) to predict future returns.
  - **Factor Construction Process**:
    1. **Simple Weighted and Concentration Adjustment**: $score\_mean\_hhi = score\_mean \cdot \frac{1}{\sqrt{0.01 + HHI}}$, where HHI measures the concentration of sentiment categories in the report[119][121][123]
    2. **Text Proportion Weighted**: $score\_by\_len = \sum_{k=1}^{10} cat\_sentiment_{k} \cdot cat\_len_{k}$, adjusted with HHI and power-function smoothing to avoid overemphasis on single categories[122][123][124]
    3. **Category Importance Weighted**: $score\_by\_cat = \sum_{k=1}^{10} cat\_sentiment_{k} \cdot cat\_w_{k}$, with category weights derived from regression coefficients and significance levels[125][127][128]
    4. **Combined Text Proportion & Category Importance Weighted**: $score\_by\_LenCat = \sum_{k=1}^{10} cat\_sentiment_{k} \cdot cat\_len_{k} \cdot cat\_w_{k}$, adjusted with HHI and power-function smoothing for balanced scoring[131][132][134]
  - **Factor Evaluation**:
    - Factors such as `score_by_cat_w3` and `score_by_LenCat3` show strong predictive power for short-term and medium-term returns[130][134][151]
    - The combined factor (`score_report_llm`) performs evenly across multiple metrics, including RankIC, IC win rate, and annualized excess return[151][152][153]

Factor Backtesting Results

- **Factor Name**: Comprehensive Scoring Factor (`score_report_llm`)
  - **Backtesting Metrics**:
    - RankIC: 1.77%
    - IC Win Rate: 66.2%
    - Annualized Excess Return: 13.5%
    - Maximum Drawdown: kept within 4% relative to the equal-weighted group[151][152][154]
  - **Performance Summary**:
    - Strictly monotonic five-group return structure
    - 100% annual win rate against the CSI 800 since 2020
    - Low correlation with traditional factors, indicating its potential as an alternative factor[151][152][156]

Application of Sentiment Density in Stock Selection

- **Factor Name**: Sentiment Density Factors (`profit improvement density`, `performance surprise density`)
  - **Backtesting Metrics**:
    - Profit Improvement Density (Top30, N=20): Annualized Return: 15.0%, Excess Return: 15.6%, Maximum Drawdown: 27.5%
    - Performance Surprise Density (Top30, N=40): Annualized Return: 14.2%, Excess Return: 14.8%, Maximum Drawdown: 31.1%
  - **Performance Summary**:
    - Short-term signals (N=20) are more effective for profit improvement density
    - Long-term signals (N=40) are more effective for performance surprise density[96][98][105][106]

Sentiment Emphasis Analysis

- **Factor Name**: Sentiment Emphasis (Order & Proportion)
  - **Key Findings**:
    - Positive sentiment appearing earlier in the report correlates with stronger pricing effects, especially for categories like "performance surprise" and "shareholder behavior"[112][111][108]
    - Text proportion positively impacts future returns for categories like "penetration rate" and "policy-driven factors"[118][114][115]
    - Basic financial categories (e.g., "profit improvement") show weaker signal effectiveness because they appear in most reports[118][114][115]

Summary of Comprehensive Scoring Factor

- **Factor Name**: Comprehensive Scoring Factor (`score_report_llm`)
  - **Performance Metrics**:
    - RankIC: 1.77%
    - IC Win Rate: 66.2%
    - Annualized Excess Return: 13.5%
    - Maximum Drawdown: kept within 4% relative to the equal-weighted group
  - **Key Advantages**:
    - Balanced performance across multiple metrics
    - Strong predictive power for short-term and medium-term returns
    - Low correlation with traditional factors, indicating its potential as an alternative factor[151][152][154][156]

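
The four scoring formulas listed under Factor Construction Process can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the report's implementation: the summary does not say how HHI or `score_mean` is computed, so HHI is assumed here to be the sum of squared text-length shares and `score_mean` a plain average of the ten category sentiments. The power-function smoothing is left out because its form is not given. All function names and sample values are hypothetical.

```python
import math


def hhi(shares):
    """Herfindahl-Hirschman index: sum of squared shares (assumed definition)."""
    return sum(s * s for s in shares)


def score_mean_hhi(cat_sentiment, cat_len):
    """score_mean * 1 / sqrt(0.01 + HHI); score_mean is assumed to be a plain mean."""
    score_mean = sum(cat_sentiment) / len(cat_sentiment)
    return score_mean / math.sqrt(0.01 + hhi(cat_len))


def score_by_len(cat_sentiment, cat_len):
    """Sentiment weighted by each category's share of the report text."""
    return sum(s * l for s, l in zip(cat_sentiment, cat_len))


def score_by_cat(cat_sentiment, cat_w):
    """Sentiment weighted by each category's importance weight."""
    return sum(s * w for s, w in zip(cat_sentiment, cat_w))


def score_by_len_cat(cat_sentiment, cat_len, cat_w):
    """Sentiment weighted by both text share and importance weight."""
    return sum(s * l * w for s, l, w in zip(cat_sentiment, cat_len, cat_w))


# Hypothetical report with ten sentiment categories (k = 1..10).
cat_sentiment = [1, 1, -1, 0, 0, 0, 0, 0, 0, 0]      # per-category sentiment
cat_len = [0.5, 0.3, 0.2, 0, 0, 0, 0, 0, 0, 0]       # text shares, sum to 1
cat_w = [2.0, 1.0, 0.5, 1, 1, 1, 1, 1, 1, 1]         # illustrative weights

scores = {
    "score_mean_hhi": score_mean_hhi(cat_sentiment, cat_len),
    "score_by_len": score_by_len(cat_sentiment, cat_len),
    "score_by_cat": score_by_cat(cat_sentiment, cat_w),
    "score_by_LenCat": score_by_len_cat(cat_sentiment, cat_len, cat_w),
}
print(scores)
```

Because the adjustment divides by $\sqrt{0.01 + HHI}$, a report whose sentiment is spread over many categories (low HHI) receives a larger score magnitude than one with the same mean score concentrated in a single category.
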

Are Retail Investor Comments on Guba Stock Forums a Barometer of the Stock Market?
NORTHEAST SECURITIES · 2025-06-25 07:12
Core Insights

- The report investigates whether retail investor comments on stock forums serve as a barometer of market sentiment, focusing on the Shanghai Composite Index [1][10]
- It applies sentiment analysis techniques, including a BERT model and a sentiment-lexicon method, to analyze the emotional tone of investor comments and their potential correlation with market trends [1][11]

Group 1: Investor Sentiment Analysis

- Comments are categorized as "bullish," "bearish," or "neutral." Bearish comments generally outnumber bullish ones, indicating that retail investors tend to express negative sentiment during poor market conditions [2][58]
- In years of significant market fluctuation, the sentiment indicators derived from comments show a logical relationship with the Shanghai Composite Index, although this relationship is not stable across years [2][3]

Group 2: Methodology and Data Processing

- The report uses natural language processing (NLP) techniques to analyze investor comments, highlighting the importance of sentiment analysis in understanding market dynamics [10][11]
- Data is sourced from the Shanghai Composite Index forum on the Eastmoney website. Comments are filtered to keep those that reflect genuine retail investor sentiment, leaving approximately 5 million relevant comments over nearly a decade [34][37]

Group 3: BERT Model Application

- The BERT model classifies the sentiment of comments with an overall accuracy of 88% across the sentiment categories, and the report gives precision and recall for each category [54][53]
- The sentiment scores derived from the BERT model indicate that retail investor sentiment often reacts to current market prices rather than predicting future trends, suggesting reactive rather than proactive investment behavior [3][67]

Group 4: Sentiment Lexicon Analysis

- The sentiment-lexicon method complements the BERT analysis by quantifying emotional tendencies based on predefined financial sentiment words, further confirming the predominance of bearish sentiment among retail investors [69][75]
- The report emphasizes that the sentiment indicators from both methods follow a similar trend, with bearish comments consistently outnumbering bullish ones, particularly during market downturns [79][78]
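
As an illustration of the lexicon approach described in Group 4, the sketch below labels comments by counting hits against a small word list and then aggregates the labels into a single indicator. Everything specific here is an assumption: the word lists are a hypothetical stand-in for the report's financial sentiment lexicon, substring counting replaces proper Chinese word segmentation, and the bull-bear spread is one common way to aggregate labels, not necessarily the indicator the report uses.

```python
from collections import Counter

# Hypothetical mini-lexicon; the report's actual lexicon is not reproduced here.
BULLISH_WORDS = ("上涨", "反弹", "牛市", "抄底")  # rise, rebound, bull market, buy the dip
BEARISH_WORDS = ("下跌", "崩盘", "熊市", "割肉")  # fall, crash, bear market, cut losses


def classify_comment(text):
    """Label a comment by comparing bullish and bearish lexicon hits."""
    bull_hits = sum(text.count(word) for word in BULLISH_WORDS)
    bear_hits = sum(text.count(word) for word in BEARISH_WORDS)
    if bull_hits > bear_hits:
        return "bullish"
    if bear_hits > bull_hits:
        return "bearish"
    return "neutral"


def bull_bear_spread(labels):
    """(bullish - bearish) / (bullish + bearish); 0.0 if no directional labels."""
    counts = Counter(labels)
    directional = counts["bullish"] + counts["bearish"]
    if directional == 0:
        return 0.0
    return (counts["bullish"] - counts["bearish"]) / directional


# Made-up comments for one trading day.
comments = [
    "明天继续下跌,割肉了",  # "more falls tomorrow, cutting my losses"
    "大盘要反弹",            # "the index is about to rebound"
    "今天成交量一般",        # "volume is average today"
    "熊市来了",              # "the bear market is here"
]
labels = [classify_comment(c) for c in comments]
spread = bull_bear_spread(labels)
print(labels, spread)
```

A negative spread means bearish comments outnumber bullish ones on that day, which is the pattern the report finds to be typical, especially during market downturns.
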