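Demonstration Scenario: Engineering High-Yield Biomanufacturing Strains Based on Independently Developed Biomanufacturing Data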
Xin Lang Cai Jing· 2026-01-13 11:33
Core Viewpoint
- The biomanufacturing scientific data sharing service platform was established to address fragmented biological resource data, inconsistent formats, an insufficient supply of high-quality scientific data, and the lack of standardized data support for artificial intelligence applications in biomanufacturing [1][6].

Group 1: Overview of the Scenario
- The platform focuses on "standard guidance, data integration, intelligence-driven development, and industrial transformation," creating a comprehensive data service system that covers biological resource supply, biological component prediction and generation, industrial enzyme design, and strain modification [1][6].

Group 2: Overall Approach
- Diverse biological data sources are integrated, including national strain resource libraries, provincial microbial preservation institutions, and international public databases. Together they form a foundational data resource pool spanning strains, genomes, enzymes, metabolic pathways, literature, and patents [3][8].
- High-quality, standardized datasets are built through data quality control, format standardization, and metadata supplementation. The result is 78 high-quality datasets directly usable for AI training, covering key elements such as industrial strain genomes, functional promoters, terminators, regulatory factors, and various industrial enzymes [3][8].
- A "platform + data + model" integrated support system is being developed. It provides a foundational platform for biomanufacturing research with one-stop data retrieval and sharing, along with AI tools based on protein large language models that support intelligent mining and targeted design of biological components [3][8].

Group 3: Innovative Measures
- The initiative leads the formulation of international ISO standards that standardize the description framework for synthetic biological components, enhancing China's voice in the international biomanufacturing field [4][9].
- A new type of information technology platform for biomanufacturing is being constructed. It integrates physical, data, and research instrument resources to solidify the data foundation for industrial development [4][9].
- An AI discovery paradigm driven by protein large language models is being developed to overcome the limitations of traditional sequence alignment. It enables efficient mining of new global regulatory factors and functional components, and forms a transferable, generalizable intelligent design pathway [4][9].

Group 4: Main Achievements
- Social benefits: AI models such as GR-Discriminator have been developed and applied, marking a breakthrough in the intelligent technology innovation paradigm. The biomanufacturing foundational data resource platform is officially in operation, significantly strengthening foundational support for the industry [5][10].
- Economic benefits: a one-step enzymatic synthesis technology for the chiral intermediate R-amide of Prilocaine, developed through big data mining and AI design, has been implemented at multiple domestic enterprises. It avoids the use of toxic organic solvents, reduces synthesis costs by 50% compared with traditional chemical separation processes, and is expected to generate output value exceeding 1 billion [5][10].