vConTACT3
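Nature Biotechnology | A generational leap in virus classification tools: how does vConTACT3 surpass its predecessors and reshape the standard for metagenomic analysis?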
Xin Lang Cai Jing · 2025-12-24 09:40
Core Insights
- The article discusses the vast number of viruses on Earth, estimated at approximately 10^31 particles, highlighting the significant gap between this number and the limited knowledge currently available to humanity [1][20]
- The development of vConTACT3 represents a major advancement in virus classification, utilizing machine learning to create a hierarchical framework that allows for precise categorization from genus to order, even across host boundaries [2][21]

Virus Classification Challenges
- Traditional virus classification methods rely heavily on manual curation by experts, which is insufficient given the exponential growth of sequencing capabilities and the vast amount of data generated in metagenomics [20]
- Previous tools like vConTACT and vConTACT 2.0 were limited by their reliance on the ClusterONE algorithm, which was "flat" and could not provide insights beyond the genus level [5][23]

vConTACT3 Innovations
- vConTACT3 employs a hierarchical clustering framework that integrates gene-sharing network topology with adaptive distance optimization, allowing for a more nuanced classification of viruses [5][24]
- Researchers conducted extensive parameter optimization experiments using approximately 20,000 known virus genomes, testing over 60 million parameter combinations to tailor the classification system to different virus and host domains [6][24]

Classification Accuracy
- vConTACT3 demonstrated a high accuracy rate, achieving over 95% consistency with the International Committee on Taxonomy of Viruses (ICTV) classification for 35,545 prokaryotic virus genomes [8][26]
- The tool's accuracy extends to eukaryotic viruses as well, with 100% consistency at the realm level and high rates at the order and family levels [8][26]

Handling Fragmented Data
- The challenge of data fragmentation in metagenomics is addressed by vConTACT3, which effectively classifies over 90% of fragmented virus sequences tested in a simulation experiment [10][27]
- The classification accuracy varies with fragment length, with longer fragments (over 10 kb) achieving a 96.3% classification rate at the genus or subfamily level [11][28]

Discovering New Taxa
- vConTACT3's ability to create new taxonomic units allows it to classify previously uncharacterized virus sequences, demonstrating its potential to expand the known virus taxonomy significantly [12][29]
- In a study using the INPHARED database, vConTACT3 automatically generated numerous new classifications, including 3,113 genera and 192 families, which are being proposed for submission to the ICTV [12][30]

Future of Virus Taxonomy
- The emergence of vConTACT3 signifies a shift from manual classification methods to a more automated, scalable approach, essential for managing the influx of new viral data [17][35]
- The findings suggest that the current 15-level classification system may be overly complex, with data supporting only four key levels: genus, subfamily, family, and order [33][34]
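
To make the contrast between "flat" and hierarchical clustering of a gene-sharing network more concrete, here is a minimal Python sketch. It is not vConTACT3's actual algorithm: the Jaccard similarity, the single-linkage grouping, the fixed thresholds, and the toy genomes and protein-cluster names are all assumptions chosen for illustration, and the adaptive distance optimization described above is not modeled. The sketch only shows how cutting the same network at a loose and a strict similarity level yields nested groups, whereas a flat method returns one level only.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of protein clusters shared by two genomes (illustrative metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_at(genomes, threshold):
    """Single-linkage grouping: link genomes whose similarity >= threshold."""
    parent = {name: name for name in genomes}

    def find(x):
        # Follow parent pointers to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for g1, g2 in combinations(genomes, 2):
        if jaccard(genomes[g1], genomes[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for name in genomes:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical genomes, each described by the protein clusters it encodes.
genomes = {
    "phageA": {"pc1", "pc2", "pc3", "pc4"},
    "phageB": {"pc1", "pc2", "pc3", "pc5"},
    "phageC": {"pc1", "pc2", "pc6", "pc7"},
    "phageD": {"pc8", "pc9"},
}

# Made-up cutoffs: a strict one for a genus-like rank, a looser one for a family-like rank.
thresholds = {"genus": 0.5, "family": 0.3}
hierarchy = {rank: cluster_at(genomes, t) for rank, t in thresholds.items()}

for rank, groups in hierarchy.items():
    print(rank, groups)
```

In this toy network, phageA and phageB share enough protein clusters to group at the strict cutoff, phageC joins them only at the looser cutoff, and phageD stays alone at both levels, so the genus-like groups nest inside the family-like ones.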