Science: Using an AI Model to Predict Which Promoter Mutations Alter Gene Expression

Core Viewpoint - The article discusses the development of PromoterAI, an AI model by Illumina researchers, which accurately predicts expression-altering mutations in non-coding promoter regions, highlighting its significance in understanding genetic mutations and their impact on human health and rare diseases [3][4][6].

Group 1: PromoterAI Development and Functionality
- PromoterAI is a deep learning model designed to predict the effects of promoter mutations on gene expression by evaluating genomic sequences in promoter regions [6].
- The model was trained at single nucleotide resolution to predict histone modifications, DNA accessibility, transcription factor binding, and gene expression around transcription start sites [6][9].
- The research team constructed a training dataset containing thousands of rare promoter mutations associated with abnormal gene expression across various tissues, controlling for confounding variables [6][9].

Group 2: Research Findings and Implications
- The study found that predicted expression-altering promoter mutations were significantly enriched in clinically relevant genes of rare disease patients, contributing to 6% of the genetic burden associated with rare diseases [4][9].
- Analysis of population allele frequency spectra showed a significant depletion of predicted harmful mutations, indicating natural selection's role in removing deleterious mutations [7].
- PromoterAI's predictions were strongly correlated with protein abundance and quantitative trait measurements, enhancing the understanding of genetic contributions to rare diseases [7][9].

Group 3: Clinical Applications and Future Directions
- The model was applied to undiagnosed rare disease patients in the Genomics England cohort, revealing a specific enrichment of predicted mutations in the promoter regions of Mendelian disease genes [7][9].
- PromoterAI fills a critical gap in genomic interpretation by accurately detecting promoter mutations that affect gene expression, a class of variants often overlooked by current clinical genomic analyses, which focus on coding-region mutations [9].
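
To make the idea of scoring a promoter mutation concrete, here is a minimal sketch of one common approach: run a sequence model on the reference promoter and on the mutated promoter, then take the difference. This is an illustration under stated assumptions, not PromoterAI's actual implementation. The function `toy_expression_model`, its scoring rule, and the example sequence are all hypothetical stand-ins chosen only so the code runs end to end.

```python
# Hypothetical stand-in for a trained sequence-to-expression model.
# It is NOT PromoterAI: it rewards a TATA-box-like motif and GC content
# purely so that the example produces a number.
def toy_expression_model(seq: str) -> float:
    seq = seq.upper()
    score = 0.0
    if "TATAAA" in seq:
        score += 2.0  # arbitrary bonus for an intact TATA-like motif
    score += sum(base in "GC" for base in seq) / len(seq)  # GC fraction
    return score


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the single-nucleotide variant ref>alt at 0-based pos."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(
            f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}"
        )
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq: str, pos: int, ref: str, alt: str,
                   model=toy_expression_model) -> float:
    """Predicted score for the alternate sequence minus the reference.

    Negative values mean the model predicts reduced expression.
    """
    return model(apply_snv(seq, pos, ref, alt)) - model(seq)


# Made-up 20 bp promoter fragment containing a TATA-like motif.
promoter = "GGCGCTATAAAGGCGCCATG"

# A T>G change inside the motif: the toy model predicts lower expression.
effect = variant_effect(promoter, 7, "T", "G")
print(f"predicted effect: {effect:+.2f}")
```

In practice the toy function would be replaced by a trained model that takes a much longer window around the transcription start site. The reference check in `apply_snv` is worth keeping, since coordinate or strand mix-ups are a frequent source of silent errors in variant scoring.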