Sentiment Analysis

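 Quantitative Research Report Series No. 23: Giving Sentiment "Structure": How Large Language Models Mine New Value from Research Reports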
 Huaan Securities· 2025-08-11 14:58
Quantitative Models and Construction Methods

- **Model Name**: DeepSeek-V3-671B
  **Model Construction Idea**: Move from "black-box" scoring to structured, interpretable scoring of research reports.
  **Model Construction Process**: Structured task rules constrain the model and guard against "hallucinated" outputs, so that information extraction stays controlled and consistent. The model is run through both local deployment and a cloud API for efficient processing. DeepSeek-V3-671B was selected for its output format compliance, result stability, and batch processing efficiency[160][157][64]
  **Model Evaluation**: High accuracy on structured scoring tasks, with stable results across repeated tests[62][64][160]

Quantitative Factors and Construction Methods

- **Factor Name**: Report Scoring Factors
  **Factor Construction Idea**: Combine several dimensions (sentiment category, text proportion, sentiment density, and category importance weight) to predict future returns.
  **Factor Construction Process**:
  1. **Simple Weighted and Concentration Adjustment**:
     Formula: $\text{score\_mean\_hhi} = \text{score\_mean} \cdot \dfrac{1}{\sqrt{0.01 + \text{HHI}}}$
     HHI measures the concentration of sentiment categories in the report[119][121][123]
  2. **Text Proportion Weighted**:
     Formula: $\text{score\_by\_len} = \sum_{k=1}^{10} \text{cat\_sentiment}_{k} \cdot \text{cat\_len}_{k}$
     Adjusted with HHI and power-function smoothing to avoid overweighting a single category[122][123][124]
  3. **Category Importance Weighted**:
     Formula: $\text{score\_by\_cat} = \sum_{k=1}^{10} \text{cat\_sentiment}_{k} \cdot \text{cat\_w}_{k}$
     Category weights are derived from regression coefficients and significance levels[125][127][128]
  4. **Combined Text Proportion & Category Importance Weighted**:
     Formula: $\text{score\_by\_LenCat} = \sum_{k=1}^{10} \text{cat\_sentiment}_{k} \cdot \text{cat\_len}_{k} \cdot \text{cat\_w}_{k}$
     Adjusted with HHI and power-function smoothing for balanced scoring[131][132][134]

  **Factor Evaluation**:
  - Factors such as `score_by_cat_w3` and `score_by_LenCat3` show strong predictive power for short-term and medium-term returns[130][134][151]
  - The combined factor (`score_report_llm`) performs evenly across RankIC, IC win rate, and annualized excess return[151][152][153]

Factor Backtesting Results

- **Factor Name**: Comprehensive Scoring Factor (`score_report_llm`)
  **Backtesting Metrics**:
  - RankIC: 1.77%
  - IC Win Rate: 66.2%
  - Annualized Excess Return: 13.5%
  - Maximum Drawdown: within 4% relative to the equal-weighted group[151][152][154]

  **Performance Summary**:
  - Strictly monotonic returns across the five quantile groups
  - 100% annual win rate against the CSI 800 since 2020
  - Low correlation with traditional factors, indicating potential as an alternative factor[151][152][156]

Application of Sentiment Density in Stock Selection

- **Factor Name**: Sentiment Density Factors (`profit improvement density`, `performance surprise density`)
  **Backtesting Metrics**:
  - Profit Improvement Density (Top30, N=20): Annualized Return: 15.0%, Excess Return: 15.6%, Maximum Drawdown: 27.5%
  - Performance Surprise Density (Top30, N=40): Annualized Return: 14.2%, Excess Return: 14.8%, Maximum Drawdown: 31.1%

  **Performance Summary**:
  - Short-term signals (N=20) are more effective for profit improvement density
  - Long-term signals (N=40) are more effective for performance surprise density[96][98][105][106]

Sentiment Emphasis Analysis

- **Factor Name**: Sentiment Emphasis (Order & Proportion)
  **Key Findings**:
  - Positive sentiment that appears earlier in a report correlates with stronger pricing effects, especially for categories such as "performance surprise" and "shareholder behavior"[112][111][108]
  - A larger text proportion is associated with higher future returns for categories such as "penetration rate" and "policy-driven factors"[118][114][115]
  - Basic financial categories (e.g., "profit improvement") give weaker signals because they appear in most reports[118][114][115]

Summary of Comprehensive Scoring Factor

- **Factor Name**: Comprehensive Scoring Factor (`score_report_llm`)
  **Performance Metrics**:
  - RankIC: 1.77%
  - IC Win Rate: 66.2%
  - Annualized Excess Return: 13.5%
  - Maximum Drawdown: within 4% relative to the equal-weighted group

  **Key Advantages**:
  - Balanced performance across multiple metrics
  - Strong predictive power for short-term and medium-term returns
  - Low correlation with traditional factors, indicating potential as an alternative factor[151][152][154][156]
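
The four weighting schemes listed under Factor Construction Process can be sketched in a few lines of Python. This is a minimal illustration, not the report's implementation. It assumes that `cat_len` holds each category's share of the report text (summing to 1), that HHI is computed over those shares, and that `score_mean` averages only the categories present in the report. The summary does not give the exponent of the power-function smoothing, so that step is left out. The function names `hhi` and `report_scores` are hypothetical.

```python
import math


def hhi(shares):
    """Herfindahl-Hirschman index: the sum of squared shares."""
    return sum(s * s for s in shares)


def report_scores(cat_sentiment, cat_len, cat_w):
    """Compute four report-level scores for a single report.

    cat_sentiment: per-category sentiment scores
    cat_len:       per-category share of report text (assumed to sum to 1)
    cat_w:         per-category importance weights
    """
    if not (len(cat_sentiment) == len(cat_len) == len(cat_w)):
        raise ValueError("all inputs must have the same length")

    # Assumption: score_mean averages only categories that appear in the report.
    present = [s for s, share in zip(cat_sentiment, cat_len) if share > 0]
    score_mean = sum(present) / len(present) if present else 0.0

    # Assumption: concentration is measured over the text shares.
    concentration = hhi(cat_len)

    return {
        # score_mean * 1 / sqrt(0.01 + HHI)
        "score_mean_hhi": score_mean / math.sqrt(0.01 + concentration),
        # sum_k cat_sentiment_k * cat_len_k
        "score_by_len": sum(s * l for s, l in zip(cat_sentiment, cat_len)),
        # sum_k cat_sentiment_k * cat_w_k
        "score_by_cat": sum(s * w for s, w in zip(cat_sentiment, cat_w)),
        # sum_k cat_sentiment_k * cat_len_k * cat_w_k
        "score_by_LenCat": sum(
            s * l * w for s, l, w in zip(cat_sentiment, cat_len, cat_w)
        ),
    }


if __name__ == "__main__":
    # Toy report: only two of the ten categories carry text.
    sentiment = [1.0, 1.0] + [0.0] * 8
    text_share = [0.5, 0.5] + [0.0] * 8
    weights = [2.0, 1.0] + [1.0] * 8
    for name, value in report_scores(sentiment, text_share, weights).items():
        print(f"{name}: {value:.4f}")
```

Under the formula as written, a report whose sentiment is concentrated in a few categories has a higher HHI and therefore a smaller multiplier on `score_mean`; the 0.01 term keeps the denominator away from zero.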
 Are Retail Investor Comments on Stock Forums a Barometer of the Stock Market?
 NORTHEAST SECURITIES· 2025-06-25 07:12
Core Insights

- The report investigates whether retail investor comments on stock forums serve as a barometer of market sentiment, focusing on the Shanghai Composite Index [1][10]
- It applies two sentiment analysis techniques, a BERT model and a sentiment lexicon method, to measure the emotional tone of investor comments and its potential correlation with market trends [1][11]

Group 1: Investor Sentiment Analysis

- Comments are categorized as "bullish," "bearish," or "neutral." Bearish comments generally outnumber bullish ones, indicating that retail investors tend to voice negative sentiment when the market performs poorly [2][58]
- In years with large market swings, the sentiment indicators derived from comments show a logical relationship with the Shanghai Composite Index, but the relationship is not stable from year to year [2][3]

Group 2: Methodology and Data Processing

- The report uses natural language processing (NLP) techniques to analyze investor comments, highlighting the role of sentiment analysis in understanding market dynamics [10][11]
- Data comes from the Shanghai Composite Index forum on the Eastmoney website. Comments are filtered to keep those that reflect genuine retail investor sentiment, leaving approximately 5 million relevant comments spanning nearly a decade [34][37]

Group 3: BERT Model Application

- A BERT model classifies the sentiment of each comment and reaches an overall accuracy of 88% across the sentiment categories, with precision and recall reported for each category [54][53]
- The BERT-derived sentiment scores indicate that retail investor sentiment mostly reacts to current market prices rather than predicting future trends, which suggests reactive rather than forward-looking behavior [3][67]

Group 4: Sentiment Lexicon Analysis

- The sentiment lexicon method complements the BERT analysis by quantifying emotional tendency from a predefined list of financial sentiment words, and it confirms the predominance of bearish sentiment among retail investors [69][75]
- Indicators from both methods follow a similar trend, with bearish comments consistently outnumbering bullish ones, particularly during market downturns [79][78]
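
To make the lexicon approach in Group 4 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the report's method: the two word sets are hypothetical placeholders (the report's financial sentiment lexicon is not reproduced in this summary), matching is naive substring counting with no negation handling, and the daily index `(bullish - bearish) / (bullish + bearish)` is an assumed definition chosen for simplicity.

```python
from collections import defaultdict

# Hypothetical mini-lexicons for illustration only.
BULLISH_WORDS = {"涨", "反弹", "抄底", "牛市"}  # rise, rebound, buy the dip, bull market
BEARISH_WORDS = {"跌", "割肉", "套牢", "熊市"}  # fall, cut losses, trapped, bear market


def classify_comment(text):
    """Label a comment by comparing bullish and bearish lexicon hits."""
    bull = sum(text.count(word) for word in BULLISH_WORDS)
    bear = sum(text.count(word) for word in BEARISH_WORDS)
    if bull > bear:
        return "bullish"
    if bear > bull:
        return "bearish"
    return "neutral"


def daily_sentiment_index(comments):
    """Aggregate (date, text) pairs into a per-day index in [-1, 1].

    Assumed definition: (bullish - bearish) / (bullish + bearish),
    or 0.0 for a day with no bullish or bearish comments.
    """
    counts = defaultdict(lambda: {"bullish": 0, "bearish": 0, "neutral": 0})
    for date, text in comments:
        counts[date][classify_comment(text)] += 1

    index = {}
    for date, c in counts.items():
        polar = c["bullish"] + c["bearish"]
        index[date] = (c["bullish"] - c["bearish"]) / polar if polar else 0.0
    return index


if __name__ == "__main__":
    sample = [
        ("2024-01-02", "今天大跌,割肉了"),
        ("2024-01-02", "明天反弹"),
        ("2024-01-02", "继续跌"),
        ("2024-01-03", "随便看看"),
    ]
    print(daily_sentiment_index(sample))
```

A daily series like this could then be compared with index returns on the same or following days to check whether sentiment leads or merely follows prices, which is the question the report raises.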
