Self-Supervised Learning
DeepTiming: Intraday Information and Similarity-Learning-Driven Market Timing
Minsheng Securities· 2025-07-31 09:02
Quantitative Models and Construction Methods

1. Model Name: Deep Learning Stock Return Prediction Model
   - **Model Construction Idea**: A deep learning framework tailored to the current market environment. It integrates daily and minute-frequency inputs to predict stock returns and generates trading signals from historical rolling thresholds [1][10][22]
   - **Model Construction Process**:
     - **Input Layer**: Combines 51 technical/sentiment daily features, 7 basic daily price-volume indicators, 10 enhanced style factors, and 52 minute-frequency features aggregated to daily frequency [22]
     - **Training Layer**: Uses meta-learning to adapt dynamically to new market data, avoiding overfitting to historical data [14]
     - **Output Layer**: Employs LinSAT neural networks to impose constraints on the output, enforcing objectives such as controlling style and industry exposures [18]
     - **Loss Function**: A multi-period mean squared error (MSE) is used to stabilize predictions for timing strategies [22]
     - **Output Shape**: The multi-period return prediction \( y \) has shape \( (n, 1) \), where \( n \) is the number of stocks [22]
   - **Model Evaluation**: Demonstrates robustness in adapting to market changes and controlling exposures, with significant predictive power for timing strategies [10][22]

2. Model Name: SimStock
   - **Model Construction Idea**: SimStock uses self-supervised learning to predict stock similarities, incorporating both static and dynamic correlations. It leverages contrastive learning to capture time-series information beyond traditional industry and style classifications [2][47][48]
   - **Model Construction Process**:
     - **Input**: Past 40-day price-volume data, Barra style factors, and capital-flow indicators [52]
     - **Positive and Negative Sample Construction**: Positive samples are generated as \( X_{pos} = X + (1-\alpha)X_{rand} \), where \( \alpha = 0.75 \) and \( X_{rand} \) is a random feature sample [52]
     - **Embedding**: An LSTM initializes dynamic attention weights, and CLS tokens aggregate the sequence information into stock attribute vectors [52]
     - **Similarity Calculation**: Stock similarity is measured by the cosine similarity between attribute vectors [52]
   - **Model Evaluation**: Effectively identifies highly similar stocks, mostly within the same industry, but without clear patterns in market capitalization or sub-industry [56]

3. Model Name: Improved GRU Model with SimStock Integration
   - **Model Construction Idea**: Enhances the GRU-based stock return prediction model by initializing its hidden state with SimStock-generated stock attribute vectors, improving stability across different stock types (see the sketch after this item) [57][59]
   - **Model Construction Process**:
     - **Initialization**: SimStock attribute vectors replace the GRU model's initial hidden state [57]
     - **Training**: Retains the same training setup as the baseline GRU model, with adjustments to incorporate the new initialization [59]
   - **Model Evaluation**: Demonstrates improved predictive performance and stability, particularly in timing strategies across diverse stocks [60][63]
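As a concrete illustration of the cosine-similarity step of SimStock and the hidden-state initialization of the improved GRU, here is a minimal PyTorch sketch. It is not the report's implementation; the tensor shapes, the single-layer GRU, and the placeholder random inputs are assumptions for demonstration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed, illustrative shapes (not from the report):
# attr: SimStock-style stock attribute vectors, one per stock
# seq:  per-stock daily feature sequences over a 40-day window
n_stocks, dim, seq_len, n_feat = 500, 64, 40, 10
attr = torch.randn(n_stocks, dim)
seq = torch.randn(n_stocks, seq_len, n_feat)

# 1) Stock similarity as cosine similarity between attribute vectors
attr_norm = F.normalize(attr, dim=-1)
similarity = attr_norm @ attr_norm.T          # (n_stocks, n_stocks)

# 2) GRU return-prediction head whose initial hidden state is the
#    stock attribute vector instead of the default zero vector
gru = nn.GRU(input_size=n_feat, hidden_size=dim, batch_first=True)
head = nn.Linear(dim, 1)

h0 = attr.unsqueeze(0)                        # (num_layers=1, n_stocks, dim)
out, _ = gru(seq, h0)
pred = head(out[:, -1])                       # one predicted return per stock
```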
4. Model Name: Index Timing Model
   - **Model Construction Idea**: Aggregates individual stock signals into index signals using market-cap-weighted predictions, followed by threshold-based signal generation [77]
   - **Model Construction Process**:
     - **Aggregation**: Combines stock return predictions into index return predictions using market-cap weights [77]
     - **Signal Generation**: Uses the 60th percentile of the past year's predictions as the buy threshold and the 40th percentile as the sell threshold (a sketch of this rule appears after the factor backtest results below) [77]
     - **Holding Period**: Maintains positions for at least 5 trading days to reduce turnover [77]
   - **Model Evaluation**: Effective in generating excess returns, particularly in high-volatility sectors [79][82][84]

Model Backtest Results

1. Deep Learning Stock Return Prediction Model
   - **Cumulative Excess Return**: 77% over 5 years [33]
   - **Annualized Return**: 27% [33]
   - **Excess Return vs. Stocks**: 11.3% (pre-cost) [33]

2. SimStock
   - **Cumulative Excess Return**: 109% over 5 years [60]
   - **Annualized Return**: 30% [60]
   - **Excess Return vs. Stocks**: 14.8% (pre-cost) [60]
   - **Daily Win Rate**: 57.4% [60]
   - **Holding Probability**: 45.7% [60]

3. Index Timing Model
   - **HS300**: Annualized Return 5.1%, Excess Return 5.6%, Max Drawdown 7.7% [79]
   - **CSI500**: Annualized Return 12.4%, Excess Return 12.2%, Max Drawdown 7.1% [82]
   - **CSI1000**: Annualized Return 15.1%, Excess Return 14.9%, Max Drawdown 11.3% [84]

4. Sector Timing
   - **Best Sector**: Electric Power Equipment & New Energy, Annualized Return 36%, Excess Return 31.1% [101]

Quantitative Factors and Construction Methods

1. Factor Name: Reinforced Style Factor (PPO Model)
   - **Factor Construction Idea**: Uses PPO reinforcement learning to predict market style preferences, generating more interpretable and robust risk factors than traditional deep learning [12]
   - **Factor Construction Process**:
     - **Input**: Traditional style factors and recent stock price-volume data [12]
     - **Reward Function**: Stability-penalized goodness-of-fit to market returns [12]
     - **Output**: An enhanced style factor representing AI market preferences [12]
   - **Factor Evaluation**: Provides a stable and interpretable representation of market style dynamics [12]

Factor Backtest Results

1. Reinforced Style Factor
   - **RankIC**: Weekly average of 4.5% since 2019 [36]
   - **Annualized Return**: 23.2% for long-only portfolios; Excess Return 18.3% vs. CSI800 [36]
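The aggregation-and-threshold rule of the index timing model (item 4 above) can be sketched in a few lines. This is a hedged illustration rather than the report's code; the function name, the 252-day lookback standing in for "past year", and the array shapes are assumptions.

```python
import numpy as np

def index_timing_signal(stock_preds, mkt_caps, lookback=252,
                        buy_q=0.60, sell_q=0.40, min_hold=5):
    """Illustrative sketch: cap-weighted aggregation of stock return
    predictions into an index prediction, then 60th/40th-percentile
    buy/sell thresholds over the trailing window, with a minimum
    holding period of `min_hold` trading days.

    stock_preds, mkt_caps: arrays of shape (n_days, n_stocks).
    Returns a 0/1 position series of length n_days.
    """
    weights = mkt_caps / mkt_caps.sum(axis=1, keepdims=True)
    index_pred = (stock_preds * weights).sum(axis=1)

    position = np.zeros(len(index_pred), dtype=int)
    held_for = 0
    for t in range(lookback, len(index_pred)):
        window = index_pred[t - lookback:t]
        buy_thr = np.quantile(window, buy_q)
        sell_thr = np.quantile(window, sell_q)
        if position[t - 1] == 1 and held_for < min_hold:
            position[t] = 1                      # enforce the minimum holding period
        elif index_pred[t] >= buy_thr:
            position[t] = 1
        elif index_pred[t] <= sell_thr:
            position[t] = 0
        else:
            position[t] = position[t - 1]        # keep the prior state between thresholds
        held_for = held_for + 1 if (position[t] == 1 and position[t - 1] == 1) else int(position[t] == 1)
    return position
```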
Kaiming He Improves Saining Xie's REPA: Drastically Simplified, Yet Performance Remains Strong
Jiqizhixin (Machine Heart)· 2025-06-12 09:57
Core Viewpoint
- The article discusses the significance of representation learning in generative models, particularly through the introduction of a new method called Dispersive Loss, which integrates self-supervised learning into diffusion-based generative models without requiring additional pre-training or external data sources [6][9][43].

Group 1: Diffusion Models and Representation Learning
- Diffusion models excel at modeling complex data distributions but are largely disconnected from the representation learning field [2].
- The training objectives of diffusion models typically focus on reconstruction tasks, such as denoising, lacking explicit regularization for the learned representations [3].
- Representation learning, particularly self-supervised learning, is crucial for learning general representations applicable to various downstream tasks [4].

Group 2: Introduction of Dispersive Loss
- Dispersive Loss is a flexible and general plug-in regularizer that integrates self-supervised learning into diffusion-based generative models [9].
- The core idea of Dispersive Loss is to introduce a regularization target for the model's internal representations, encouraging them to spread out in the latent space (a hedged sketch of such a repulsion-only term follows this summary) [10][13].
- This method does not require additional layers or parameters, making it a simple and independent approach [15][16].

Group 3: Comparison with Existing Methods
- Dispersive Loss operates without pre-training, external data, or additional model parameters, unlike the REPA method, which relies on pre-trained models [7][41][43].
- The new method demonstrates that representation learning can benefit generative modeling without external information sources [13][43].
- In practice, introducing Dispersive Loss requires only minimal adjustments, such as specifying which intermediate layers to regularize [29].

Group 4: Performance Evaluation
- Experimental results show that Dispersive Loss consistently outperforms corresponding contrastive losses while avoiding the complexities of dual-view sampling [33].
- The method has been tested across various models, including DiT and SiT, showing improvements in all scenarios, particularly in larger models where effective regularization is crucial [36][37].
- Dispersive Loss can also be generalized to one-step diffusion-based generative models, indicating its versatility [44].
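To make the "spread out in latent space" idea concrete, here is a minimal sketch of a repulsion-only regularizer applied to a batch of intermediate activations. It is written in an InfoNCE-like log-mean-exp form purely for illustration; the exact distance function, temperature, layer choice, and loss weight in the paper may differ.

```python
import math
import torch

def dispersive_loss(z, tau=0.5):
    """Repulsion-only regularizer on a batch of intermediate representations
    z of shape (batch, ...): no positive pairs, no extra parameters, no
    second view. The squared-Euclidean distance and temperature `tau` are
    illustrative assumptions.
    """
    z = z.flatten(1)                              # one vector per sample
    dist2 = torch.cdist(z, z, p=2.0) ** 2         # pairwise squared distances
    n = z.shape[0]
    # log-mean-exp of negative distances: decreases as the batch spreads apart
    return torch.logsumexp(-dist2.flatten() / tau, dim=0) - math.log(n * n)

# Assumed usage inside a diffusion training step (names are placeholders):
# loss = denoising_loss + 0.5 * dispersive_loss(intermediate_activations)
```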
AI "Chemical Detective" Rapidly Deciphers Unknown Molecular Structures
Ke Ji Ri Bao· 2025-05-28 23:43
Core Insights
- An international team led by the Czech Technical University has developed an AI molecular decoder named DreaMS, which can rapidly analyze the structure of unknown molecules, with potential applications in drug development and the detection of life in space [1][2]
- The research highlights that known natural molecules represent only a small fraction of what exists; many undiscovered molecules in plants, soil, and extraterrestrial environments may hold keys to new drug formulations and environmental solutions [1]
- DreaMS uses a learning approach that mimics how human infants learn language, allowing it to autonomously build a cognitive framework for interpreting molecular structure without prior chemical rules [1]

Molecular Analysis Capabilities
- DreaMS can estimate the presence of specific molecular fragments or chemical elements, including fluorine, whose detection is crucial for modern pharmaceuticals and pesticides [2]
- The team trained DreaMS on fluorine-containing samples, overcoming a long-standing detection challenge in the field [2]
Institute of Software Proposes a Small-Batch Data Sampling Strategy
Jing Ji Guan Cha Wang· 2025-05-27 07:50
Core Insights
- A research team from the Institute of Software, Chinese Academy of Sciences, proposed a small-batch data sampling strategy that eliminates the interference of unobservable semantic variables with representation learning, enhancing the out-of-distribution generalization ability of self-supervised learning models [1][2]

Group 1: Research Findings
- Out-of-distribution generalization refers to a model's performance on test data whose distribution differs from the training data, which is crucial for remaining effective on unseen data [1]
- The study found that self-supervised learning models are affected by unobservable semantic variables during training, which weakens their out-of-distribution generalization [1]

Group 2: Methodology
- The proposed strategy uses causal-effect estimation techniques to eliminate the confounding effect of the unobservable semantic variables [1]
- By learning a latent variable model, the strategy estimates the posterior probability distribution of the unobservable semantic variables given the "anchor" samples, termed balance scores [1]
- Samples with similar or close balance scores are grouped into the same small-batch dataset, ensuring that within each batch the unobservable semantic variables are conditionally independent of the "anchor" samples (a minimal sketch of this grouping step follows below) [1]

Group 3: Experimental Results
- Extensive experiments on benchmark datasets showed that the sampling strategy improves the performance of mainstream self-supervised learning methods by at least 2% across various evaluation tasks [2]
- In classification on ImageNet100 and ImageNet, both Top-1 and Top-5 accuracy surpassed state-of-the-art self-supervised methods [2]
- In semi-supervised classification, Top-1 and Top-5 accuracy increased by over 3% and 2%, respectively [2]
- The strategy also gave stable gains in average precision for object detection and instance segmentation transfer learning tasks [2]
- Performance improvements exceeded 5% on few-shot transfer learning tasks on Omniglot, miniImageNet, and CIFARFS [2]
- The findings were accepted at ICML 2025, a top-tier academic conference in artificial intelligence [2]
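The batching step in Group 2 can be sketched independently of how the balance scores are estimated. The sketch below simply sorts samples by score and cuts the sorted order into mini-batches; the score estimation itself (the latent variable model) is not shown, and the random scores are placeholders.

```python
import numpy as np

def group_by_balance_score(balance_scores, batch_size):
    """Group samples whose estimated balance scores are close into the same
    mini-batch: sort by score and slice consecutive runs. `balance_scores`
    is assumed to be a 1-D array of per-sample posterior-based scores
    produced elsewhere by a latent variable model.
    """
    order = np.argsort(balance_scores)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# Placeholder usage with random scores standing in for real estimates
scores = np.random.rand(10_000)
batches = group_by_balance_score(scores, batch_size=256)
```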
HKU Professor Yi Ma's Team and Collaborators Open-Source New Work: Reframing the Visual Self-Supervised Learning Paradigm with Coding Rate Regularization, "Less Is More"
QbitAI· 2025-03-08 03:35
Core Viewpoint
- The article discusses the introduction of SimDINO and SimDINOv2, two new visual pre-training models developed by a collaboration of researchers from various institutions, which simplify the training process of the existing DINO and DINOv2 models while enhancing their performance [1][5][12].

Group 1: Model Development
- SimDINO and SimDINOv2 are designed to address the complexities associated with DINO and DINOv2, which are currently leading models in visual pre-training [2][4].
- The new models utilize coding rate regularization to simplify the training process and improve robustness and performance [12][16].
- The core idea is to remove complex empirical design components from the original DINO and DINOv2 training processes, making the models easier to train and implement [12][18].

Group 2: Methodology
- The introduction of coding rate regularization helps prevent representation collapse, which was a significant issue in the original models (a hedged sketch of a coding-rate term appears below) [14][17].
- SimDINO retains the EMA self-distillation scheme and multi-view data augmentation from DINO but modifies the contrastive learning approach to use Euclidean distance or cosine similarity instead of high-dimensional projections [18][19].
- SimDINOv2 further simplifies the iBOT mechanism introduced in DINOv2, enhancing the model's efficiency [19].

Group 3: Experimental Validation
- Extensive experiments on various datasets, including ImageNet-1K, COCO val2017, and ADE20K, demonstrate that SimDINO and SimDINOv2 outperform the DINO series in computational efficiency, training stability, and downstream task performance [22][23].
- In specific evaluations, SimDINO achieved a linear-segmentation mIoU of 33.7 and mAcc of 42.8, while SimDINOv2 reached an mIoU of 36.9 and mAcc of 46.5, showcasing significant improvements over DINO and DINOv2 [30].

Group 4: Theoretical Insights
- The research team proposes a theoretical framework for selecting hyperparameters in SimDINO, focusing on balancing the gradients of the coding rate regularization term and the distance term [33][34].
- This theoretical analysis provides a clearer optimization target and reduces the complexity of hyperparameter tuning, making the training process more straightforward [39].

Group 5: Future Directions
- The research team suggests potential improvements for SimDINO, including exploring self-supervised objectives that do not require self-distillation optimization [43].
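For readers unfamiliar with coding rate regularization, the sketch below shows a common form of the coding-rate term, \( R(Z) = \tfrac{1}{2}\log\det\left(I + \tfrac{d}{n\epsilon^2} Z^\top Z\right) \), and one plausible way it could be combined with a cosine-similarity alignment term between two views. The combination, the weight gamma, and the quantization parameter eps are illustrative assumptions, not the exact SimDINO objective.

```python
import torch
import torch.nn.functional as F

def coding_rate(Z, eps=0.5):
    """Coding-rate term R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z) for a
    batch of embeddings Z of shape (n, d). Maximizing it spreads the batch
    out in embedding space, which is how a term of this kind can counteract
    representation collapse. `eps` is an assumed quantization parameter.
    """
    n, d = Z.shape
    cov = (d / (n * eps ** 2)) * (Z.T @ Z)
    return 0.5 * torch.logdet(torch.eye(d, device=Z.device) + cov)

def simdino_style_loss(z_student, z_teacher, gamma=1.0):
    """Illustrative combination (an assumption, not the paper's exact loss):
    pull the two augmented views together with cosine similarity while
    regularizing with the coding rate of the student batch."""
    align = -F.cosine_similarity(z_student, z_teacher, dim=-1).mean()
    return align - gamma * coding_rate(z_student)
```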