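The DeepSeek of Biology: Alibaba Cloud Releases the LucaOne Model, the First to Unify the Languages of DNA/RNA and Protein, Capable of Understanding the Central Dogma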
生物世界 (BioWorld) · 2025-06-19 09:44
Core Viewpoint
- The article covers LucaOne, a general-purpose biological foundation model that processes nucleic acid (DNA and RNA) and protein sequences within a single model, which it presents as a significant advance for the life sciences [4][26].
Group 1: Introduction to LucaOne
- LucaOne is described as the world's first foundation model to unify the understanding of nucleic acid and protein sequences, and is likened to a "DeepSeek" for the life sciences [4].
- The model was pre-trained on sequences from 169,861 species and is reported to grasp key biological principles such as the flow of genetic information from DNA through RNA to protein [4][16].
Group 2: Technical Aspects of LucaOne
- The model uses a vocabulary of 39 "characters" to encode nucleotides and amino acids, so the same model can read both nucleic acids and proteins [13].
- It is trained with semi-supervised learning, which integrates known biological annotations to improve its understanding [14].
- LucaOne has 1.8 billion parameters and was trained on 36.95 billion biological sequence "words," which lets it extract deep, general patterns from nucleic acid and protein sequences [16].
Group 3: Performance and Capabilities
- Without being explicitly taught the central dogma of molecular biology, LucaOne outperformed specialized models on tasks that match DNA sequences to protein sequences [18].
- Its embeddings capture the biological meaning of sequences and cluster similar sequences better than those of other models [19].
- It performed strongly across seven challenging bioinformatics tasks, including species classification and protein stability prediction, often with simpler downstream networks than specialized models use [20][24].
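The shared vocabulary mentioned under Group 2 can be illustrated with a minimal sketch. This is not LucaOne's actual 39-symbol table, which the article does not list. The special tokens, the `nt:`/`aa:` prefixes, and the names `build_vocab` and `encode` are all assumptions made for illustration. The point it shows is that the letters A, C, G, and T are valid both as nucleotides and as amino acids, so a unified vocabulary needs some way to keep the two molecule types apart.

```python
# Hypothetical unified vocabulary for nucleic acid and protein sequences.
# NOT LucaOne's real 39-symbol table; every token name here is an assumption.
SPECIAL = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
NUCLEOTIDES = "ACGTUN"                # DNA/RNA bases plus N for an unknown base
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_vocab():
    """Map every token to an integer id, keeping molecule types separate."""
    tokens = list(SPECIAL)
    tokens += [f"nt:{c}" for c in NUCLEOTIDES]
    tokens += [f"aa:{c}" for c in AMINO_ACIDS]
    return {tok: i for i, tok in enumerate(tokens)}


VOCAB = build_vocab()


def encode(seq, kind):
    """Encode a sequence as ids; kind is 'nt' (DNA/RNA) or 'aa' (protein)."""
    if kind not in ("nt", "aa"):
        raise ValueError("kind must be 'nt' or 'aa'")
    unk = VOCAB["[UNK]"]
    ids = [VOCAB["[CLS]"]]
    # Letters outside the chosen alphabet fall back to [UNK].
    ids += [VOCAB.get(f"{kind}:{c}", unk) for c in seq.upper()]
    ids.append(VOCAB["[SEP]"])
    return ids


if __name__ == "__main__":
    # The same four letters get different ids depending on molecule type.
    print(encode("ACGT", "nt"))  # [2, 5, 6, 7, 8, 3]
    print(encode("ACGT", "aa"))  # [2, 11, 12, 16, 27, 3]
```

Prefixing tokens by molecule type is only one way to resolve the overlap. A separate molecule-type input alongside shared letter tokens would be another, and the article does not say which approach LucaOne takes.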
Group 4: Significance and Future Outlook
- LucaOne provides a unified framework for understanding nucleic acids and proteins, the two core molecular carriers of life, and breaks down the barriers between molecule types [26].
- The model illustrates the potential of foundation models in bioinformatics: researchers can build a variety of computational biology tools on top of it efficiently [26].
- It paves the way for deeper and more automated analysis of complex biological systems, such as gene regulatory networks and disease mechanisms [26].
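For readers unfamiliar with the central dogma referenced under Group 3, the rule that LucaOne is said to have picked up without explicit instruction can be written down directly. The sketch below uses the standard genetic code to check whether a DNA coding sequence encodes a given protein. The function names are hypothetical, and the article does not describe how the matching task's data or labels were actually built, so this is only a picture of the underlying biology, not of the benchmark.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(dna):
    """Coding-strand shortcut for transcription: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def encodes(dna, protein):
    """Return True if the DNA sequence translates to exactly this protein."""
    return translate(dna) == protein.upper()


if __name__ == "__main__":
    print(transcribe("ATGGCCTAA"))     # AUGGCCUAA
    print(translate("ATGGCCTAA"))      # MA
    print(encodes("ATGGCCTAA", "MA"))  # True
```

A rule-based check like this only works for clean coding sequences read in the correct frame. Real genes with introns, alternative start sites, or non-standard genetic codes are where a learned model would have to go beyond table lookup.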