Out-of-Distribution Generalization
Mining Activation Functions Like Crypto? DeepMind Builds a "Compute Mining Farm" to Brute-Force the Next-Generation ReLU
机器之心 · 2026-02-07 04:09
Core Insights
- The article traces the evolution of activation functions in neural networks, from traditional choices such as Sigmoid and ReLU to newer ones like GELU and Swish, emphasizing their impact on model performance [1][2]

Group 1: DeepMind's Innovation
- Google DeepMind is reshaping the search for activation functions with a new method called AlphaEvolve, which explores an open-ended space of Python functions rather than relying on predefined search spaces [2][4]
- The research paper "Finding Generalizable Activation Functions" shows how DeepMind's approach discovered new activation functions, including GELUSine and GELU-Sinc-Perturbation, which outperform traditional functions on certain tasks [4][30]

Group 2: Methodology
- AlphaEvolve uses a large language model (LLM) to generate and modify code, allowing a more flexible and expansive search over activation functions [8][11]
- The process relies on a "micro-laboratory" strategy: synthetic data is used to optimize for out-of-distribution (OOD) generalization, avoiding the high cost of searching directly on large datasets such as ImageNet [14][18]

Group 3: Performance of New Functions
- The newly discovered functions performed best on algorithmic reasoning tasks, with GELU-Sinc-Perturbation scoring 0.887 on the CLRS-30 benchmark, surpassing ReLU and GELU [34]
- On visual tasks, GELUSine and GELU-Sinc-Perturbation remained competitive on ImageNet, reaching roughly 74.5% Top-1 accuracy, comparable to GELU [34][35]

Group 4: Insights on Function Design
- The best-performing functions tend to follow a general recipe, a standard activation function combined with a periodic term, suggesting that periodic structure can enhance model generalization (a hedged sketch of this recipe follows the summary) [25][35]
- The study highlights the importance of understanding the inductive biases an activation function introduces, suggesting that periodic elements can help capture complex data structure beyond linear relationships [40][42]
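The summary does not give the exact closed forms of GELUSine or GELU-Sinc-Perturbation, but the reported design recipe, a standard activation plus a small periodic term, is easy to illustrate. Below is a minimal NumPy sketch; the function names and the coefficients a and b are hypothetical stand-ins chosen for illustration, not the published definitions.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU (Hendrycks & Gimpel, 2016)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def gelu_sine(x, a=0.1, b=1.0):
    # Hypothetical instance of the reported recipe:
    # standard activation + periodic perturbation.
    # Coefficients a, b are illustrative, NOT the paper's values.
    return gelu(x) + a * np.sin(b * x)

def gelu_sinc_perturbation(x, a=0.1, b=1.0):
    # Another hypothetical instance, using a sinc-shaped perturbation;
    # the paper's actual GELU-Sinc-Perturbation may differ.
    # np.sinc(t) computes sin(pi*t)/(pi*t), so this equals sin(b*x)/(b*x).
    return gelu(x) + a * np.sinc(b * x / np.pi)
```

The periodic term injects an inductive bias toward repeating structure while the GELU backbone still dominates for large |x|, which is consistent with the article's claim that periodic elements help models capture structure beyond linear relationships.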
AAAI 2026 Highlight! 原力无限 Tackles the "Generalization" Problem in Embodied Intelligence, Defining a New Paradigm for Causal AI
具身智能之心 · 2025-12-23 00:03
Core Insights
- The article emphasizes the importance of "generalization" in robotics, which is crucial for AI to move from laboratory settings to real-world applications [1]
- Traditional AI struggles to generalize because it relies on superficial correlations rather than an understanding of underlying causality [2]

Industry Pain Points
- The primary challenge in embodied intelligence is "out-of-distribution" (OOD) generalization, which prevents robots from adapting to new environments [4]
- For example, if an AI learns a task in one context (a red table), it may fail when the context changes (a blue table) because it latched onto spurious correlations [5][7]

Key Breakthroughs
- Causal inference is introduced as a core technology to enhance AI's logical reasoning, allowing robots to "see through phenomena to the essence" [9]
- The DSAP framework constructs a structured causal graph, distinguishing state-invariant variables (noise) from state-dependent variables (core causal factors) [10]
- Through a disentangled structure-aware proxy, the algorithm mathematically "cuts off" environmental noise from decision-making, teaching robots to focus on the causal core (a hedged sketch of this disentangling idea follows the summary) [13]

Validation and Results
- The research team validated the DSAP algorithm on complex tasks such as Alchemy and robotic manipulation, demonstrating its effectiveness under new environment configurations [16][18]
- Agents using the DSAP algorithm showed markedly greater stability and significantly higher success rates than existing state-of-the-art algorithms in OOD tests [19][21]
- The causal mechanism gives robots preliminary logical reasoning abilities that go beyond pixel-level pattern matching [22]

Collaborative Efforts
- The paper is a successful industry-academia collaboration, combining theoretical innovation with practical validation [24]
- Partnerships with top universities let the company stay at the research frontier while shortening the validation cycle for cutting-edge algorithms [25]
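The summary does not spell out how DSAP's disentangled structure-aware proxy is implemented, but the stated idea, separating state-invariant noise variables from state-dependent causal variables and letting only the latter drive decisions, can be sketched. Everything below (class names, the two-branch split, the decorrelation penalty) is a hypothetical PyTorch illustration under those assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Hypothetical sketch: split an observation into a causal branch
    (state-dependent factors) and a noise branch (state-invariant factors,
    e.g. table color). Only the causal branch feeds the policy."""
    def __init__(self, obs_dim, causal_dim=32, noise_dim=32, n_actions=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.causal_head = nn.Linear(128, causal_dim)   # task-relevant factors
        self.noise_head = nn.Linear(128, noise_dim)     # environment nuisances
        self.policy = nn.Linear(causal_dim, n_actions)  # decisions see only z_c

    def forward(self, obs):
        h = self.backbone(obs)
        z_c, z_n = self.causal_head(h), self.noise_head(h)
        return self.policy(z_c), z_c, z_n

def independence_penalty(z_c, z_n):
    # Illustrative decorrelation loss: push the cross-covariance of the
    # causal and noise codes toward zero so nuisance information cannot
    # leak into the decision pathway. DSAP's actual objective may differ.
    z_c = z_c - z_c.mean(dim=0)
    z_n = z_n - z_n.mean(dim=0)
    cov = z_c.T @ z_n / (z_c.shape[0] - 1)
    return (cov ** 2).mean()
```

Routing only the causal code into the policy's input is one concrete way to read the article's claim that environmental noise is mathematically severed from decision-making.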
Factor Stock Selection Series No. 115: DFQ-Diversify, a Self-Supervised Domain Recognition and Adversarial Decoupling Model for Out-of-Distribution Generalization
Orient Securities · 2025-05-07 07:45
- The DFQ-Diversify model addresses the out-of-distribution generalization problem by introducing a self-supervised domain recognition and adversarial training mechanism, explicitly decoupling the label prediction and domain recognition tasks [2][3][10]
- The training process comprises three core modules, update_d, set_dlabel, and update, which work together through adversarial training to perform domain recognition and label prediction while keeping the two explicitly decoupled (a hedged training-skeleton sketch follows this list) [3][22][23]
- The update_d module handles domain recognition, using a GRU-based feature extractor, a domain bottleneck layer, a domain classifier, and a label adversarial discriminator to improve the accuracy and robustness of domain representations [23][24][25]
- The set_dlabel module updates sample domain labels via inference and clustering optimization, ensuring the domain labels reflect the actual distribution of features in feature space [28][29]
- The update module focuses on label prediction, using the shared GRU feature extractor, a label bottleneck layer, a label classifier, and a domain adversarial discriminator to improve label prediction accuracy and robustness [30][31][32]
- A self-supervised dynamic domain partitioning mechanism lets the model autonomously identify latent domain information, enhancing its flexibility and generalization adaptability [34][36]
- DFQ-Diversify constructs a three-level adversarial training mechanism, comprising inter-module task adversarial updates, intra-module dual-loss adversarial balance, and a gradient reversal layer, to achieve feature decoupling and robust transfer learning [42][43][47]
- Compared with the FactorVAE-pro model, DFQ-Diversify introduces self-supervised learning to dynamically identify latent domains, improving flexibility and generalization ability [50][53]
- The model performs well across multiple stock pools, especially large caps, with significant excess returns in the CSI All Share, CSI 300, and CSI 500 pools [5][6][107]
- Backtests from 2020 to 2025 on the CSI All Share pool show an IC of 12.22%, a rank IC of 14.58%, and an annualized excess return of 32.52% [5][107]
- In the CSI 300 and CSI 500 enhanced strategies, the model achieved IRs of 1.89 and 1.67 and annualized excess returns of 11.27% and 12.19%, respectively [6][172][180]
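The report names the three training modules (update_d, set_dlabel, update) and a gradient reversal layer, but the summary gives no code. The skeleton below is a hypothetical PyTorch reading of that description: a shared GRU extractor with separate bottleneck/classifier/adversary heads per task, where the adversary sits behind a gradient reversal layer. Module internals, dimensions, and head names are assumptions, not Orient Securities' implementation.

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Gradient reversal layer: identity on the forward pass, negated
    gradient on the backward pass, the standard trick behind
    domain-adversarial training (Ganin et al.)."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class DFQDiversifySketch(nn.Module):
    # Hypothetical layout: one shared GRU extractor, then separate
    # bottleneck / classifier / adversary heads for the domain task
    # (update_d) and the label task (update).
    def __init__(self, n_feat, hidden=64, n_domains=5, n_labels=2):
        super().__init__()
        self.gru = nn.GRU(n_feat, hidden, batch_first=True)  # shared extractor
        self.dom_bottleneck = nn.Linear(hidden, hidden)
        self.dom_clf = nn.Linear(hidden, n_domains)   # domain classifier
        self.lbl_adv = nn.Linear(hidden, n_labels)    # label adversary (update_d)
        self.lbl_bottleneck = nn.Linear(hidden, hidden)
        self.lbl_clf = nn.Linear(hidden, n_labels)    # label classifier
        self.dom_adv = nn.Linear(hidden, n_domains)   # domain adversary (update)

    def extract(self, x):
        _, h = self.gru(x)        # x: (batch, seq_len, n_feat)
        return h.squeeze(0)       # (batch, hidden)

    def update_d_heads(self, h):
        # Domain recognition: learn domain structure while the reversed
        # label adversary suppresses label information in this branch.
        z = torch.relu(self.dom_bottleneck(h))
        return self.dom_clf(z), self.lbl_adv(grad_reverse(z))

    def update_heads(self, h):
        # Label prediction: learn labels while the reversed domain
        # adversary suppresses domain information in this branch.
        z = torch.relu(self.lbl_bottleneck(h))
        return self.lbl_clf(z), self.dom_adv(grad_reverse(z))
```

In this reading, set_dlabel would sit between the two head updates, clustering the domain-bottleneck features (for example with k-means) and overwriting each sample's pseudo domain label; the report's actual clustering procedure is not detailed in the summary.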