Manifold-Constrained Hyper-Connections (mHC)
Machine Learning Series, Part 1: mHC Improvements to the Barra Machine Learning Factor
NORTHEAST SECURITIES · 2026-01-05 06:41
Quantitative Models and Construction Methods

Model Name: mHC-MLP
- **Model Construction Idea**: The mHC-MLP model introduces manifold-constrained hyper-connections (mHC) into the traditional MLP framework to address issues such as low signal-to-noise ratio, non-stationarity, and extreme tail behavior in financial data. It achieves this by incorporating multi-stream residual channels, gated fan-in/fan-out mappings, and doubly stochastic manifold projections (via Sinkhorn-Knopp) to enhance numerical stability and extrapolation resistance[1][16][22]
- **Model Construction Process**:
  1. **Multi-Stream Residual Channels**: The model expands the single residual channel in traditional ResNet to multiple parallel sub-streams, allowing independent feature representations and dynamic routing between streams[19][20]
  2. **Manifold Constraints**:
     - Residual mixing matrices are constrained to the Birkhoff polytope (doubly stochastic matrices), ensuring non-negativity, row sums of 1, and column sums of 1. This is enforced during training using the Sinkhorn-Knopp algorithm[22][23][54]
     - Fan-in and fan-out mappings are constrained to non-negative values using sigmoid functions, ensuring that output features remain within the convex hull of input features[24]
  3. **Dynamic Routing Mechanism**: The model uses a combination of linear mixing (via residual matrices) and non-linear transformations (via MLP blocks) to balance feature interaction and noise suppression[49][50][51]
  4. **Deep Stacking**: The mHC-MLP extends the network depth to six layers, leveraging the numerical stability provided by manifold constraints to capture higher-order interactions[56][57]
  5. **Initialization and Regularization**: Parameters are initialized with minimal values (e.g., alpha = 0.01) to ensure stable gradient flow during early training stages. Regularization is achieved through manifold constraints rather than traditional dropout or L2 regularization[25][55]
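The doubly stochastic projection in step 2 can be illustrated with a short sketch of the Sinkhorn-Knopp algorithm. This is a generic implementation of the algorithm, not code from the report; the function name, iteration count, and matrix size are assumptions:

```python
import numpy as np

def sinkhorn_knopp(M, n_iters=50):
    """Project a real matrix toward the Birkhoff polytope (non-negative
    entries, every row and column summing to 1) by alternately
    normalizing the rows and columns of its elementwise exponential."""
    P = np.exp(M)  # strictly positive starting point
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # rows sum to 1
        P = P / P.sum(axis=0, keepdims=True)  # columns sum to 1
    return P

# Example: constrain a 4x4 residual mixing matrix
rng = np.random.default_rng(0)
H = sinkhorn_knopp(rng.normal(size=(4, 4)))
assert np.allclose(H.sum(axis=1), 1.0, atol=1e-3)  # rows ≈ 1
assert np.allclose(H.sum(axis=0), 1.0, atol=1e-3)  # columns ≈ 1
```

Because every entry of such a matrix is a convex weight, mixing residual streams with it cannot amplify the aggregate signal, which is the stability property the report attributes to the Birkhoff constraint.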
- **Model Evaluation**: The mHC-MLP model demonstrates improved numerical stability, reduced overfitting, and enhanced robustness against noise. However, it may underperform in short-term, high-volatility scenarios due to its conservative nature[2][75][86]

---

Model Backtesting Results

mHC-MLP Model
- **Cumulative Return**: 49% (compared to 56% for the unconstrained MLP model)[75]
- **t-Statistic**: Not explicitly mentioned for mHC-MLP
- **IC_IR**: Not explicitly mentioned for mHC-MLP
- **Turnover**: Lower than the unconstrained MLP model, indicating better stability[2][75]
- **Maximum Drawdown**: Lower than the unconstrained MLP model, reflecting reduced risk exposure[2][75]

---

Quantitative Factors and Construction Methods

Factor Name: Barra MLP Factor
- **Factor Construction Idea**: The Barra MLP factor leverages neural networks to capture non-linear interactions and complex relationships between Barra style factors and residual stock returns, overcoming the limitations of traditional linear factor models[30][31]
- **Factor Construction Process**:
  1. **Baseline Risk Model**: A long-term risk model is constructed using the Barra CNE6 framework, incorporating one country factor, 31 industry factors, and 15 style factors (e.g., size, beta, momentum, value)[36][37][38]
  2. **Residual Return Extraction**: Stock returns are decomposed into common factor contributions and residual returns via cross-sectional regression. The residual returns serve as the prediction target for the MLP model[40]
  3. **Rolling Training**: The MLP model is trained using rolling windows of 24, 36, and 72 months to balance bias and variance. Features include the 15 style factors, and the target is the next-period residual return[41]
  4. **Multi-Period Signal Synthesis**: Predictions from the three training windows are standardized (Z-score) and combined using equal weighting or IC-based weighting to generate a composite factor[42][43]
  5. **Orthogonalization**: The composite factor is regressed against the 15 style factors to remove linear correlations, ensuring it provides incremental information[44]
  6. **Pure Factor Return Calculation**: The orthogonalized factor is incorporated into an enhanced Barra risk model, and its pure factor return is estimated via cross-sectional regression[45]
- **Factor Evaluation**: The Barra MLP factor effectively captures non-linear alpha signals and demonstrates significant cumulative returns and IC_IR values, validating its utility in quantitative strategies[46]

---

Factor Backtesting Results

Barra MLP Factor
- **Cumulative Return**: Over 15%[46]
- **t-Statistic**: 2.8[46]
- **IC_IR**: 0.45[46]
- **Turnover**: Not explicitly mentioned
- **Maximum Drawdown**: Not explicitly mentioned

---

Composite Model: mHC-Enhanced Barra MLP Factor
- **Model Construction Idea**: The mHC-enhanced Barra MLP factor integrates the mHC architecture into the Barra MLP framework to improve robustness and stability while retaining the ability to capture non-linear interactions[48]
- **Model Construction Process**: The MLP core in the Barra MLP factor is replaced with the mHC-MLP architecture, maintaining the same input features, target variables, and training framework. This modification introduces manifold constraints and dynamic routing to enhance numerical stability and reduce overfitting[48][49][50]
- **Model Evaluation**: While the mHC-enhanced factor demonstrates superior stability and robustness, it may lag in short-term, high-volatility markets due to its conservative design[75][86]

---

Composite Model Backtesting Results

mHC-Enhanced Barra MLP Factor
- **Cumulative Return**: Not explicitly mentioned
- **t-Statistic**: Not explicitly mentioned
- **IC_IR**: Not explicitly mentioned
- **Turnover**: Lower than the original Barra MLP factor[2][75]
- **Maximum Drawdown**: Lower than the original Barra MLP factor[2][75]
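Steps 2 and 5 of the factor construction are both cross-sectional regressions that keep only the part of a return or factor not linearly explained by the style exposures. A minimal sketch with synthetic data (the variable names and dimensions are illustrative, not from the report):

```python
import numpy as np

def residualize(y, X):
    """Return the OLS residual of y regressed on X plus an intercept:
    the component of y linearly unexplained by the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

rng = np.random.default_rng(42)
styles = rng.normal(size=(500, 15))     # 15 Barra style exposures, 500 stocks
raw_factor = styles @ rng.normal(size=15) + rng.normal(size=500)

pure = residualize(raw_factor, styles)  # orthogonalized composite factor
# The residual is numerically uncorrelated with every style factor,
# so it can only contribute incremental information.
max_corr = max(abs(np.corrcoef(pure, styles[:, j])[0, 1]) for j in range(15))
assert max_corr < 1e-6
```

The same `residualize` step stands in for the residual-return extraction of step 2 when `y` is a vector of stock returns rather than a composite factor.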
Tech Themes Kick Off the New Year with a Frenzy! Chinese ADRs Become Soaring "Golden Dragons"
财联社 (Cailian Press) · 2026-01-02 23:37
Market Overview
- On the first trading day of 2026, US stock indices closed with relatively calm fluctuations, with significant inflows into tech stocks and a collective surge in Chinese concept stocks, buoyed by a strong start in the Hong Kong market [1][3]
- The S&P 500 rose 0.19% to 6858.47 points, the Nasdaq Composite fell 0.03% to 23235.63 points, and the Dow Jones Industrial Average gained 0.66% to 48382.39 points [1]

Chinese Tech Stocks Performance
- The Nasdaq China Golden Dragon Index surged 4.38%, its largest single-day gain since May 12 of the previous year [3]
- Notable movers included Baidu, up 15.03%, Alibaba up 6.25%, Tencent ADR up 5.23%, and NetEase up 7.22% [3]

AI Sector Developments
- Investors are eagerly awaiting developments from DeepSeek, which recently published a paper on a new training method called "manifold-constrained hyperconnection" (mHC), seen as a significant breakthrough [3]
- This has fueled speculation about the release of DeepSeek's next-generation flagship model, reminiscent of past AI advancements [3]

Performance of Major Tech Companies
- Major tech companies were mixed, with Nvidia up 1.26% and Apple down 0.31%; Tesla declined for the seventh consecutive day [5]
- The overall performance of the tech giants was lackluster, with Microsoft down 2.21% and Amazon down 1.87% [5]

Stock Movements in Various Sectors
- Thematic stocks saw heavy trading, with Micron Technology up 10.51% and Western Digital up 8.96%, both reaching record highs [6]
- AI energy and storage concept stocks also performed well, with Bloom Energy up 13.58% and NuScale Power up 15.17% [6]

Electric Vehicle Market Update
- Tesla's Q4 delivery data fell short of expectations at 418,227 vehicles, and the company lost its title as global electric-vehicle sales leader to BYD, which sold 2.2567 million vehicles in 2025, up 27.86% from 2024 [7][8]

Berkshire Hathaway Insights
- Warren Buffett expressed confidence in Berkshire Hathaway's long-term prospects, suggesting a high likelihood the company will still exist in a hundred years, and praised his successor Greg Abel's capabilities [9]

Retail Investor Performance
- A report indicated that retail investors at Interactive Brokers achieved an average return of 19.2% in 2025, outperforming the S&P 500's 16.39% [9]
Analysis | Liang Wenfeng's New Year Bombshell: Taking AI from Climbing Stairs to Cruising the Highway
未可知人工智能研究院 (Weikezhi AI Research Institute) · 2026-01-01 16:04
Core Viewpoint
- The article discusses DeepSeek's recent breakthrough in AI architecture with the introduction of the mHC (manifold-constrained hyperconnection) framework, which enhances efficiency and performance in AI models while using fewer resources than traditional methods [2][18]

Group 1: Technical Insights
- The mHC framework represents a significant innovation in AI architecture, allowing for more efficient information flow through models [2][14]
- DeepSeek's approach contrasts with traditional methods by implementing a multi-lane highway model for information processing, which requires strict traffic rules to prevent chaos in the data flow [14][15]
- The new architecture delivers a significant performance improvement at the cost of only a 7% increase in training time on a 27-billion-parameter model [16]

Group 2: Market Implications
- Internationally, DeepSeek's innovative approach poses a challenge to major players like OpenAI and Google, who rely on brute-force scaling of computational power and data [19][20]
- Domestically, competitors such as Kimi and Doubao face pressure as DeepSeek's architectural innovations set a new standard for AI development, shifting investor focus toward companies with genuine technological advantages [23][27]
- The article highlights a shift in valuation logic for AI companies, emphasizing foundational technological innovation over user numbers or funding [27]

Group 3: Strategic Considerations
- DeepSeek's focus on foundational architecture may be a deliberate strategic choice, prioritizing core capabilities before expanding into multimodal applications [28]
- While DeepSeek's focus is narrower than its competitors', the article suggests this could translate into a stronger long-term competitive advantage [28]
Group 4: Lessons for Individuals
- The article emphasizes specialization and efficiency over scale, suggesting that success in AI and other fields comes from deep focus and innovative problem-solving [31][32]
- It also notes that foundational skills and capabilities are crucial for long-term success, much as DeepSeek prioritizes improving its basic model architecture [34]
DeepSeek's Latest Release!
券商中国 (Brokerage China) · 2026-01-01 12:40
Core Viewpoint
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) to address the instability of traditional hyperconnections during large-scale model training while maintaining their significant performance gains [1][3]

Summary by Sections

Research and Development
- The paper notes that recent advances in hyperconnections (HC) have widened the residual stream and diversified connection patterns, extending the residual-connection paradigm that has dominated the past decade. However, these improvements weaken the inherent identity-mapping property of residual connections, leading to severe training instability and limited scalability, along with significant memory-access overhead [3]
- To tackle these challenges, DeepSeek proposes the mHC framework, which projects the HC residual-connection space onto a specific manifold, thereby restoring the identity-mapping property, and integrates strict infrastructure optimizations to preserve operational efficiency [3]

Experimental Results
- Internal large-scale training results indicate that mHC effectively supports scalable training, with an additional time overhead of only 6.7% at an expansion rate of 4 [4]

Conclusion and Future Directions
- The paper concludes that empirical results demonstrate mHC's ability to restore the identity-mapping property, achieving stable large-scale training with better scalability than traditional HC. Importantly, mHC delivers these improvements with negligible computational overhead through efficient infrastructure-level optimizations [6]
- As a generalized extension of the HC paradigm, mHC opens up several important research directions. While this study used a doubly stochastic matrix to ensure stability, the framework is compatible with other manifold constraints designed for specific learning objectives; in-depth research on differentiated geometric constraints may yield new methods that better balance plasticity and stability [6]
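To see why a doubly stochastic mixing matrix helps restore identity-mapping behavior, consider mixing n parallel residual streams with a matrix whose rows and columns each sum to 1: the summed residual signal passes through the layer unchanged. The toy sketch below is illustrative only, not DeepSeek's implementation:

```python
import numpy as np

n_streams, d = 4, 8
rng = np.random.default_rng(1)

# By Birkhoff's theorem, any doubly stochastic matrix is a convex
# combination of permutation matrices; build one explicitly.
P_id = np.eye(n_streams)                 # identity permutation
P_cyc = np.eye(n_streams)[[1, 2, 3, 0]]  # cyclic permutation
H = 0.7 * P_id + 0.3 * P_cyc             # rows and columns each sum to 1

x = rng.normal(size=(n_streams, d))      # n parallel residual streams
y = H @ x                                # linear mixing step of the layer

# Column sums of 1 mean the aggregate residual is exactly conserved,
# so stacking many such layers can neither blow up nor collapse it.
assert np.allclose(x.sum(axis=0), y.sum(axis=0))
```

This conservation is the sense in which the manifold constraint trades some plasticity (arbitrary mixing weights) for guaranteed stability at depth.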
DeepSeek Opens the Year with a New Paper: Proposing the New mHC Architecture, with Liang Wenfeng on the Author List
Xin Lang Cai Jing (Sina Finance) · 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) aimed at addressing the instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyperconnections onto a specific manifold to restore the identity mapping property [6]
- The authors of the paper include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The proposed architecture is expected to contribute to a deeper understanding of topological architecture design and offer promising directions for the evolution of foundational models [6]