Apple Flips the Table: Drops AlphaFold's Core Modules and Opens the "Generative AI" Era of Protein Folding
36Kr · 2025-09-27 23:59

Core Insights
- SimpleFold is a novel protein folding model built on a general-purpose Transformer architecture. Unlike traditional models such as AlphaFold2, it does not rely on complex, specialized components such as triangular updates or multiple sequence alignments (MSA) [3][4][10]

Model Architecture
- The SimpleFold architecture consists of three main components: a lightweight atom encoder, a heavy residue backbone, and a lightweight atom decoder, which together balance speed and accuracy [8][10]
- The model uses flow matching, treating generation as a process that evolves over time and integrating an ordinary differential equation (ODE) to refine the output structure progressively [6][10]

Training and Evaluation
- SimpleFold was trained at several scales, with models ranging from 100 million to 3 billion parameters; performance improved as model size increased [11][24]
- The training strategy replicated the same protein across multiple GPUs to improve gradient stability and model performance [12][13]
- Performance was evaluated on two widely recognized benchmarks, CAMEO22 and CASP14, where SimpleFold showed accuracy competitive with leading models [14][19][21]

Performance Metrics
- On CAMEO22, SimpleFold achieved TM-scores and GDT-TS scores comparable to state-of-the-art models, with the 3 billion parameter model reaching a TM-score of 0.837 [15][19]
- SimpleFold consistently outperformed other flow-matching methods, such as ESMFlow, across various metrics, indicating robustness and good generalization [18][22][31]

Structural Generation Capability
- SimpleFold's generative approach lets it model structural distributions, producing not just a single deterministic structure but multiple conformations for the same amino acid sequence [28]
- Its ability to generate structural ensembles was validated against the ATLAS dataset, showing that it captures diverse protein conformations effectively [29][31]

Scalability and Data Utilization
- SimpleFold's scalability was confirmed by experiments showing that larger models performed better with more training resources and data [34][35]
- The model benefits from a growing dataset, with performance improving as the number of unique structures in the training data increased [35]
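
The flow-matching idea described under Model Architecture can be illustrated with a small sketch. This is not SimpleFold's code: the summary does not specify the network, the interpolation path, or the ODE solver, so the linear noise-to-data path, the fixed-step Euler integrator, and every name below (`interpolate`, `target_velocity`, `toy_velocity`, `sample_euler`) are assumptions made for illustration. `toy_velocity` stands in for a trained Transformer and is only exact when the "dataset" is a single structure.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Time derivative of the straight path; the regression target
    a velocity network would be trained to predict."""
    return x1 - x0


def toy_velocity(x, t, x1):
    """Hypothetical stand-in for a trained network: the exact velocity
    field when the data distribution is the single structure x1."""
    return (x1 - x) / (1.0 - t)


def sample_euler(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # largest t used is 1 - dt, so 1 - t never hits zero
        x = x + dt * velocity_fn(x, t)
    return x


rng = np.random.default_rng(0)
target = rng.normal(size=(8, 3))  # made-up coordinates for 8 "atoms"
noise = rng.normal(size=(8, 3))   # starting sample at t=0

final = sample_euler(lambda x, t: toy_velocity(x, t, target), noise)
max_error = float(np.abs(final - target).max())
```

Starting from different noise samples gives different trajectories. With a real learned velocity field over many structures, that is what would allow several conformations to be drawn for one sequence; in this toy setup every trajectory ends at the same single target.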