Gengram
DeepSeek-Style "External Brain" Moves into Life Sciences! Chinese Team Releases Gengram to Crack the Cryptic Book of DNA
生物世界 · 2026-01-31 06:00
Core Viewpoint
- The article discusses "Gengram", a module introduced by the Genos team that adds an external memory mechanism to genomic models to improve their efficiency and performance on genomic tasks [2][10].

Group 1: Gengram Module Introduction
- Gengram targets a limitation of existing genomic models: they process DNA only at the single-base level, which is an inefficient way to capture biological function [8][10].
- Using a pre-built hash dictionary of common short sequences, Gengram lets a model retrieve biological knowledge directly, reducing the need for extensive computation [10][12].

Group 2: Performance Improvements
- Models equipped with Gengram have achieved state-of-the-art (SOTA) results, including a 16.1% increase in AUC on splice site recognition tasks [6][18].
- Gengram is a lightweight plug-in with only about 20 million parameters, and it improves model capability without requiring extensive training data [18][21].

Group 3: Biological Insights
- Gengram's design lets the model take the three-dimensional structure of DNA into account while processing a one-dimensional sequence, improving its understanding of biological interactions [14][15].
- Gengram performed best with a window size of 21 base pairs, which corresponds to the spatial arrangement of DNA [13][14].

Group 4: Team and Collaboration
- The Genos team combines BGI's life sciences research expertise with Zhejiang Lab's computational capabilities, a strategic collaboration in the AI for Science domain [20][21].
- The success of Gengram highlights the potential of aligning AI with biological logic to advance the understanding of genomic data [21].
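As a rough illustration of the hash-dictionary lookup described under Group 1, the Python sketch below builds a dictionary of the most frequent k-mers in a toy corpus and then looks up, at every position of a new sequence, the k-mer that ends there. Everything in it is an assumption made for illustration: the function names (`count_kmers`, `build_dictionary`, `lookup`), the value of `k`, and the frequency cut-off are hypothetical, and a plain Python dict stands in for whatever hashing scheme Gengram actually uses.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer in a collection of DNA strings."""
    counts: Counter = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def build_dictionary(sequences: Iterable[str], k: int, max_entries: int) -> Dict[str, int]:
    """Keep the most frequent k-mers and assign each one a memory slot id.

    Hypothetical construction: the summary only says the dictionary is
    pre-built from common short sequences.
    """
    counts = count_kmers(sequences, k)
    return {kmer: slot for slot, (kmer, _) in enumerate(counts.most_common(max_entries))}


def lookup(sequence: str, dictionary: Dict[str, int], k: int) -> List[Optional[int]]:
    """For each position, return the slot id of the k-mer ending there, or None."""
    slots: List[Optional[int]] = []
    for end in range(len(sequence)):
        start = end - k + 1
        if start < 0:
            slots.append(None)  # not enough preceding bases for a full k-mer
        else:
            slots.append(dictionary.get(sequence[start:end + 1]))
    return slots


if __name__ == "__main__":
    corpus = ["GTAAGTGTAAGT", "CCGTAAGTCC"]  # toy data, not real genomic input
    memory_index = build_dictionary(corpus, k=6, max_entries=1)
    print(lookup("AAGTAAGTAA", memory_index, k=6))
```

A position that returns a slot id is one where a model could read a stored vector rather than rebuild the pattern from individual bases; positions that return `None` fall back to ordinary computation.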
16 Days After DeepSeek's Paper Was Published, a Chinese Team Has Already Written the Model's "Biological Dictionary"
机器之心 · 2026-01-31 04:10
Core Insights
- The article introduces Gengram, a genomic module inspired by the Engram technology, which makes genomic models more efficient by looking patterns up in memory rather than relying on computation alone [1][4].

Group 1: Gengram Technology Overview
- Gengram stores common DNA sequences (k-mers) in a hash table and lets models reference this external memory, significantly reducing computational load [3][11].
- The module is lightweight, with approximately 20 million parameters, and integrates into larger models, improving their performance without substantial additional computational cost [15][19].

Group 2: Performance Improvements
- Models using Gengram improved significantly across tasks, including a 16.1% increase in AUC for splice site prediction and a 22.6% increase for epigenetic prediction tasks [17].
- Gengram lets models reach high performance with minimal training data, outperforming models trained on significantly larger datasets [18].

Group 3: Mechanisms and Adaptability
- Gengram features a dynamic gating mechanism that lets the model decide, based on context, when to reference the memory, optimizing resource usage [12][13].
- The module adapts well to different model architectures, improving training efficiency and balancing expert loads in mixture-of-experts (MoE) configurations [19][21].

Group 4: Scientific Insights and Innovations
- Gengram's design allows it to infer biological principles, such as the physical structure of DNA, without prior knowledge, showing its potential for scientific discovery [22][25].
- The 21 base pair window used for local aggregation aligns with the physical properties of DNA: at about 10.5 base pairs per helical turn of B-DNA, 21 base pairs span roughly two turns [23][24].
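The gating and local aggregation described in Groups 3 and 4 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the Genos team's implementation: the helper names (`window_mean`, `gate`, `fuse`), the centred window, the mean as the aggregation rule, and the sigmoid of a scaled dot product as the gate are all hypothetical choices. The summary states only that a context-dependent gate exists and that a 21 base pair window worked best.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def window_mean(retrieved: Sequence[Optional[Vector]], center: int,
                window: int, dim: int) -> Vector:
    """Average the memory vectors found in a window centred on `center`.

    Positions with no dictionary hit (None) are skipped; if the window
    contains no hits at all, a zero vector is returned.
    """
    half = window // 2
    lo = max(0, center - half)
    hi = min(len(retrieved), center + half + 1)
    hits = [v for v in retrieved[lo:hi] if v is not None]
    if not hits:
        return [0.0] * dim
    return [sum(v[j] for v in hits) / len(hits) for j in range(dim)]


def gate(hidden: Vector, memory: Vector) -> float:
    """Scalar gate in (0, 1): sigmoid of a scaled dot product (assumed form)."""
    score = sum(h * m for h, m in zip(hidden, memory)) / math.sqrt(len(hidden))
    return 1.0 / (1.0 + math.exp(-score))


def fuse(hidden_states: Sequence[Vector],
         retrieved: Sequence[Optional[Vector]],
         window: int = 21) -> List[Vector]:
    """Add gated, window-aggregated memory to each position's hidden state."""
    dim = len(hidden_states[0])
    fused: List[Vector] = []
    for i, hidden in enumerate(hidden_states):
        memory = window_mean(retrieved, i, window, dim)
        g = gate(hidden, memory)  # how strongly this position reads memory
        fused.append([h + g * m for h, m in zip(hidden, memory)])
    return fused
```

In this toy version, a position whose context does not match the retrieved memory gets a gate near zero and keeps its hidden state almost unchanged, which is one simple way for a model to decide when to consult memory.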

Group 5: Team Background and Capabilities
- The Genos Team, responsible for Gengram, is a collaboration between Zhejiang Lab and BGI-HangzhouAI, combining expertise in AI and life sciences [33][34].
- The Genos model, which serves as the foundation for Gengram, reportedly surpasses leading industry benchmarks, indicating a strong competitive position in genomic modeling [35].