New Research Reveals How Vision Models Align with the Human Brain

Core Viewpoint - The article examines how closely AI vision models, specifically DINOv3, resemble the human brain in visual processing, and which factors drive this brain-model similarity.

Group 1: Model Characteristics
- DINOv3 is a self-supervised vision Transformer trained on 1.7 billion natural images [7]
- Model size, training data volume, and image type each significantly affect the model's similarity to the human brain [3][4]
- The largest, longest-trained DINOv3 model, trained on human-centric images, achieves the highest brain similarity scores [4]

Group 2: Training and Representation
- Brain-like representations emerge in a specific temporal order: the model aligns first with early sensory cortex, and needs more training data before its representations resemble those of higher brain regions [6]
- As training progresses, DINOv3's learned representations gradually align with those of the human brain [11]
- The representation hierarchy learned by DINOv3 corresponds to the spatial and temporal hierarchies found in the brain [12]

Group 3: Evaluation and Findings
- The study evaluated DINOv3's similarity to human visual representations using fMRI and MEG, focusing on 15 representative regions of interest (ROIs) [10]
- Larger models develop brain-like features more quickly during training, particularly for higher brain regions [17]
- Models trained on human-centric images capture brain signals better than models trained on satellite or cellular images [20]

Group 4: Cortical Characteristics
- The half-rise time of DINOv3's representations correlates strongly with several cortical characteristics: cortical expansion, thickness, intrinsic dynamics, and myelination [21][22][23][25]
- Cortical areas that expand more during development correspond to representations that emerge later in the AI model [21]
- Thicker cortical regions and regions with slower intrinsic dynamics also correspond to longer half-rise times in the model [22][23]
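
To make the "half-rise time" in Group 4 concrete, here is a minimal Python sketch. The summary does not define the term, so the definition used here is an assumption: the training step at which a region's brain similarity score first reaches the midpoint between its initial and final values, with linear interpolation between checkpoints. The function name `half_rise_time` and the two example curves are hypothetical and are not data from the study.

```python
from typing import Sequence


def half_rise_time(steps: Sequence[float], scores: Sequence[float]) -> float:
    """Training step at which `scores` first reaches half of its total rise.

    Assumed definition (not taken from the study): the midpoint between the
    first and last checkpoint scores, located by linear interpolation.
    `steps` must be increasing and aligned with `scores`.
    """
    if len(steps) != len(scores) or len(steps) < 2:
        raise ValueError("need at least two (step, score) pairs of equal length")
    start, final = scores[0], scores[-1]
    if final <= start:
        raise ValueError("score does not rise over training")

    target = start + 0.5 * (final - start)
    # First checkpoint at or above the target; its index is >= 1 because
    # scores[0] == start is strictly below the target.
    i = next(k for k, s in enumerate(scores) if s >= target)
    lo, hi = scores[i - 1], scores[i]
    frac = (target - lo) / (hi - lo)
    return steps[i - 1] + frac * (steps[i] - steps[i - 1])


# Hypothetical similarity curves over five checkpoints (arbitrary units).
checkpoints = [0, 1, 2, 3, 4]
early_region = [0.0, 0.3, 0.4, 0.4, 0.4]   # saturates quickly
higher_region = [0.0, 0.0, 0.1, 0.3, 0.4]  # rises late

if __name__ == "__main__":
    print("early region :", half_rise_time(checkpoints, early_region))
    print("higher region:", half_rise_time(checkpoints, higher_region))
```

Under this assumed definition, a region whose curve saturates quickly gets a short half-rise time and a region whose curve rises late gets a long one. That per-region number is what the summary relates to cortical expansion, thickness, and intrinsic dynamics.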