Core Insights
- NVIDIA's research team, in collaboration with Mila, introduced La-Proteina, a novel atomic-level protein design method that combines explicit backbone modeling with a fixed-size latent representation for each residue. This addresses a central difficulty of explicit side-chain representation: its dimensionality varies with residue type [1][2][4]

Research Highlights
- La-Proteina employs a partially latent flow matching framework for the joint generation of protein sequences and complete atomic-level structures. It achieves state-of-the-art (SOTA) performance in unconditional protein generation, producing diverse, co-designed, and structurally valid proteins with up to 800 residues [4][5][13]
- The model has been applied to both indexed and non-indexed atomic-level motif scaffolding tasks, where it outperforms previous all-atom generators [5][16]

Training Data and Methodology
- Two datasets were used to train the unconditional models: one derived from the AlphaFold Database (AFDB) with approximately 550,000 protein samples, and a custom subset focusing on longer sequences with over 4 million samples [7][8]
- The architecture consists of an encoder, a decoder, and a denoiser, all built on a Transformer core. The encoder maps input proteins to latent variables, and the decoder reconstructs complete proteins from them [10][12]

Experimental Results
- In experiments on unconditional atomic-level protein generation and atomic motif scaffolding, La-Proteina significantly outperformed baseline methods in designability, diversity, and novelty [13][15]
- The model's ability to generate large atomic structures was validated by training on a dataset of approximately 46 million samples; it excels at generating proteins longer than 500 residues [15]

Industry Implications
- La-Proteina represents a significant advance in atomic-level protein design and has attracted attention from both academia and industry, with research teams and companies exploring applications of and improvements to protein generation models [17][18]
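
To make the "partially latent flow matching" idea from Research Highlights more concrete, here is a minimal sketch in Python with NumPy. It is not La-Proteina's implementation. It assumes a linear interpolation path between Gaussian noise and data, one explicit 3D backbone point per residue, and a hypothetical latent size `LATENT_DIM`; the function names and the use of separate time values for the backbone and the latents are illustrative choices, not details taken from the source. The point it shows is that every residue carries a state of the same size, regardless of how many side-chain atoms it has.

```python
import numpy as np

LATENT_DIM = 8  # hypothetical per-residue latent size, not taken from the source


def interpolate(x0, x1, t):
    """Linear flow matching path: x_t = (1 - t) * x0 + t * x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the linear path, the regression target for a denoiser."""
    return x1 - x0


def flow_matching_targets(backbone, latents, t_backbone, t_latent, rng):
    """Build noisy inputs and velocity targets for both parts of the state.

    backbone: (n_residues, 3) explicit coordinates (assumed one point per residue)
    latents:  (n_residues, LATENT_DIM) fixed-size per-residue latents
    """
    noise_b = rng.standard_normal(backbone.shape)
    noise_z = rng.standard_normal(latents.shape)
    return {
        "backbone_t": interpolate(noise_b, backbone, t_backbone),
        "latent_t": interpolate(noise_z, latents, t_latent),
        "backbone_v": target_velocity(noise_b, backbone),
        "latent_v": target_velocity(noise_z, latents),
    }


# Toy data standing in for an encoded protein; the values are random placeholders.
rng = np.random.default_rng(0)
n_residues = 5
backbone = rng.standard_normal((n_residues, 3))
latents = rng.standard_normal((n_residues, LATENT_DIM))
batch = flow_matching_targets(backbone, latents, t_backbone=0.3, t_latent=0.7, rng=rng)
```

In a full system, a denoiser network would be trained to predict `backbone_v` and `latent_v` from the noisy inputs, and a decoder would turn the generated latents back into sequence and side-chain atoms; both are omitted here.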
NVIDIA Achieves Breakthrough in Atomic-Level Protein Design, Generating Proteins of up to 800 Residues with High Accuracy