Google's Alpha Family Lands Another Nature Cover: New SOTA in Genome Prediction, Pinpointing Distal Pathogenic Mutations
Alphabet (US:GOOGL) · 36Kr · 2026-01-29 08:24

Core Insights
- Google DeepMind's new AI model, AlphaGenome, extends AI prediction to the complex realm of the human genome, marking a significant advance in genomic research [1]

Group 1: AlphaGenome's Capabilities
- AlphaGenome can simultaneously predict 11 different gene regulatory processes, capturing complex interactions within genes [3][9]
- The model analyzes intricate gene splicing mechanisms, identifying how a single gene can produce multiple proteins and when errors in this process lead to disease [4]
- It has reconstructed pathogenic mutations related to leukemia, predicting changes in regions up to 8000 base pairs away from the gene [6][19]

Group 2: Performance Metrics
- AlphaGenome achieves state-of-the-art (SOTA) performance across a range of tests, surpassing existing genomic prediction models [8][12]
- In 24 evaluations of genomic track prediction, it achieved SOTA results in 22, demonstrating its precision in capturing the effects of small genetic variations [12]
- Its predictions were validated on rigorous benchmarks, where it outperformed competitors such as Borzoi and Enformer across multiple evaluations [12]

Group 3: Technological Framework
- AlphaGenome uses a hybrid architecture combining a CNN and a Transformer: the CNN extracts local DNA sequence features, while the Transformer captures long-range dependencies [23][30]
- The model's input window has been expanded to 1 million base pairs, enough to cover interactions between distal enhancers and promoters [28]
- Training proceeds in two phases: pre-training with strict cross-validation, followed by distillation to improve generalization and inference efficiency [30]

Group 4: Applications and Implications
- AlphaGenome's ability to predict molecular phenotypes from DNA sequence improves the understanding of non-coding regions, addressing a long-standing challenge in genome-wide association studies (GWAS) [17]
- The model identified the direction of regulatory effect for 49% of GWAS-associated variants, significantly exceeding traditional methods [17]
- Its findings provide actionable insights into the biological functions of non-coding variants, potentially leading to breakthroughs in disease understanding and treatment [23]
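
The summary does not say how a sequence model turns a small genetic variation into a score. A common approach for models of this kind is to predict a track for the reference sequence and for the sequence carrying the variant, then compare the two. The sketch below illustrates that idea only. `toy_track_model`, `apply_snv`, and `variant_effect` are hypothetical names, and the motif-matching "model" is a stand-in for a trained network, not AlphaGenome or its API.

```python
# Toy illustration only: `toy_track_model` is a hypothetical stand-in,
# not AlphaGenome or its API.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def apply_snv(seq, pos, alt):
    """Return a copy of `seq` with the base at 0-based `pos` replaced by `alt`."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position outside the input window")
    if alt not in BASES:
        raise ValueError("alt must be one of A, C, G, T")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_track_model(encoded, motif="TATA"):
    """Hypothetical predictor: fraction of motif bases matched at each window start."""
    kernel = one_hot(motif)
    width = len(kernel)
    track = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        track.append(score / width)
    return track


def variant_effect(seq, pos, alt, model=toy_track_model):
    """Score a variant as the summed difference between the alt and ref tracks."""
    ref_track = model(one_hot(seq))
    alt_track = model(one_hot(apply_snv(seq, pos, alt)))
    return sum(a - r for a, r in zip(alt_track, ref_track))


reference = "GGCGTATAGCGG"
effect = variant_effect(reference, pos=5, alt="C")  # disrupts the TATA motif
print(effect)  # -0.5
```

A negative score here means the variant lowers the toy signal. With a real model, the same reference-versus-alternate comparison would be applied per predicted track and per cell type, over a far longer input window.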