AI recognizes 950,000 species at a glance and can even tell males, females, and juveniles apart: 200 million biological images forge a "vision of life" foundation model
量子位· 2025-06-29 05:34
Core Viewpoint
- The BioCLIP 2 model, trained on 214 million biological images, demonstrates superior species recognition performance and emergent biological understanding beyond species classification, achieving significant advances in ecological alignment and intra-species differentiation [1][2][5].

Group 1: Model Development and Data Collection
- The research team collected 214 million biological images from four major platforms, creating the TreeOfLife-200M dataset, which includes 952,000 distinct taxonomic labels, making it the largest and most diverse biological image library to date [2][4].
- The model was scaled from ViT-B to ViT-L, increasing the parameter count to facilitate the emergence of new knowledge [4].

Group 2: Performance Metrics
- BioCLIP 2 achieved an average accuracy of 55.6% in zero-shot species recognition, outperforming the second-best model, SigLIP, by 16.1% [5]. A minimal zero-shot classification sketch appears after this summary.
- In non-species visual tasks, BioCLIP 2 surpassed common visual models such as SigLIP and DINOv2 in habitat recognition, biological attribute identification, new species discovery, and plant disease recognition [8].

Group 3: Emergent Properties
- Two emergent properties were identified:
  1. Ecological alignment: species with similar lifestyles and ecological roles cluster together in feature space, with boundaries becoming clearer as the training scale increases [10][11].
  2. Intra-species differentiation: differences among male, female, and juvenile forms of the same species are distributed orthogonally to inter-species differences, and this orthogonality improves with larger training scales [12][14]. A sketch of how such orthogonality can be probed follows below.

Group 4: Training Scale Impact
- Experiments showed that increasing the training data from 1M to 214M images consistently improved performance on non-species visual tasks and strengthened the orthogonality of intra-species differentiation [15].
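
As a companion to the zero-shot results in Group 2, here is a minimal sketch of CLIP-style zero-shot species classification in Python using the open_clip library. The checkpoint identifier "hf-hub:imageomics/bioclip-2", the label prompts, and the input filename are assumptions for illustration, not details from the article.

```python
# Minimal zero-shot species classification sketch with a CLIP-style model.
# The checkpoint id below is an assumption; substitute the identifier of the
# released BioCLIP 2 weights.
import torch
import open_clip
from PIL import Image

CKPT = "hf-hub:imageomics/bioclip-2"  # assumed checkpoint identifier
model, _, preprocess = open_clip.create_model_and_transforms(CKPT)
tokenizer = open_clip.get_tokenizer(CKPT)
model.eval()

# Candidate taxon prompts (hypothetical; real prompts may use full taxonomic strings).
candidate_labels = [
    "a photo of Panthera leo",
    "a photo of Panthera tigris",
    "a photo of Felis catus",
]

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # hypothetical input
text = tokenizer(candidate_labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image embedding and each label embedding.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(candidate_labels, probs.squeeze(0).tolist())))
```

The predicted species is simply the label with the highest similarity; no task-specific fine-tuning is involved, which is what "zero-shot" refers to here.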
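
The intra-species orthogonality described in Group 3 can be probed with a simple embedding-space check. The sketch below is not the paper's evaluation protocol: it assumes you already have BioCLIP 2 image embeddings grouped by species and subgroup (male/female/juvenile) and reports the mean absolute cosine similarity between intra-species and inter-species difference directions, where values near zero suggest near-orthogonality. All function and variable names are hypothetical.

```python
# Sketch: compare intra-species difference directions (subgroup mean minus species
# mean) with inter-species difference directions (species mean minus species mean).
import numpy as np

def mean_abs_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Mean |cos| between every row of a (n, d) and every row of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.abs(a @ b.T).mean())

def orthogonality_score(embeddings: dict) -> float:
    # embeddings: species -> {subgroup name -> (k, d) array of image embeddings}
    species_means = {s: np.concatenate(list(groups.values())).mean(axis=0)
                     for s, groups in embeddings.items()}

    # Intra-species directions: each subgroup's mean relative to its species mean.
    intra = np.stack([groups[g].mean(axis=0) - species_means[s]
                      for s, groups in embeddings.items() for g in groups])

    # Inter-species directions: pairwise differences between species means.
    names = list(species_means)
    inter = np.stack([species_means[names[i]] - species_means[names[j]]
                      for i in range(len(names)) for j in range(i + 1, len(names))])

    return mean_abs_cosine(intra, inter)

# Toy usage with random vectors (real inputs would be BioCLIP 2 image embeddings).
rng = np.random.default_rng(0)
toy = {s: {g: rng.normal(size=(5, 512)) for g in ("male", "female", "juvenile")}
       for s in ("species_a", "species_b", "species_c")}
print(orthogonality_score(toy))
```

Under this kind of check, the article's claim corresponds to the score shrinking as the training set grows from 1M toward 214M images.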