DINOv3

Tencent Research Institute AI Express (腾讯研究院AI速递) 20250818
腾讯研究院· 2025-08-17 16:01
Generative AI

I. Google open-sources Gemma 3 270M, a 4-attention-head model built for on-device use
1. Google released the lightweight model Gemma 3 270M: a 241MB download with 270 million parameters in total, of which 170 million are embedding parameters and 100 million sit in the Transformer blocks;
2. The model is extremely power-efficient: 25 conversations on a Pixel 9 Pro consumed only 0.75% of the battery, and with INT4 quantization it runs efficiently on resource-constrained devices (see the loading sketch after this digest);
3. It beats same-tier Qwen 2.5 models on the IFEval benchmark, supports efficient instruction following, has surpassed 200 million downloads, and is designed for fine-tuning on specific tasks.
https://mp.weixin.qq.com/s/IH64apP7SmHVCwHKfTGOsQ

II. Meta officially open-sources DINOv3, a general-purpose SOTA-level vision foundation model
1. Meta open-sourced the DINOv3 vision foundation model: trained with self-supervised learning, it surpasses weakly supervised models across the board for the first time and outperforms specialized solutions on multiple dense prediction tasks;
2. The model adopts a novel Gram Anchoring strategy and rotary position embeddings (RoPE), scaling parameters to 7 billion and training data to 1.7 billion images;
3. DINOv3 is open-sourced under a commercial license, ships a family of models at multiple scales (including ViT-B and ViT-L), and includes a backbone trained specifically for satellite imagery ...
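As a rough illustration of point I.2, here is a minimal sketch of loading a ~270M-parameter model with 4-bit weights via the Hugging Face transformers + bitsandbytes stack. The checkpoint id is an assumption, and this generic 4-bit loading path is not the same as Google's own INT4 QAT release; treat it as a sketch, not the official recipe.

```python
# Sketch: running a tiny model with 4-bit quantized weights, assuming
# transformers + bitsandbytes are installed. The checkpoint id below is
# an assumption; Google's INT4 QAT checkpoints are the on-device path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-3-270m-it"  # assumed instruction-tuned checkpoint
quant = BitsAndBytesConfig(
    load_in_4bit=True,                        # 4-bit weight storage
    bnb_4bit_compute_dtype=torch.bfloat16,    # compute in bf16
)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant)

prompt = "Summarize in one sentence: DINOv3 is a self-supervised vision model."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```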
Meta's Bombshell DINOv3: A New Peak for Visual Self-Supervision! The 7B Model Sweeps Multi-Task SOTA
自动驾驶之心· 2025-08-16 16:04
Now, Meta AI Research makes its entrance with its latest work: DINOv3.

Foreword

Remember AlexNet's stunning debut on ImageNet? It ignited the deep-learning wave, but behind it lay the toil of massive manual annotation: tens of millions of images, labeled one by one. Ever since, "data hunger" and "annotation cost" have weighed on the progress of computer vision like two mountains.

Researchers have long chased a dream: could a model, like a human infant, learn powerful visual understanding merely by "observing" the world, shedding all dependence on manual labels? That is the ultimate goal of self-supervised learning (SSL).

The road is studded with milestones (a minimal sketch of the DINO-style objective follows this excerpt):

- MAE (Masked Autoencoders): BERT's counterpart for vision, it learns by "guessing" masked image patches and showed strong potential.
- MoCo/SimCLR: by contrasting different views of the same image, the model learns what "should look similar."
- The DINO series (especially DINOv2): a true breakthrough! It learns not only excellent global image features (for classification, ...

Yet challenges remain: ...
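To make the DINO idea referenced above concrete, here is a minimal sketch of its self-distillation objective: a teacher (an exponential moving average of the student) sees one augmented view, the student sees another, and the student is trained to match the teacher's (centered, sharpened) output distribution. This is an illustration in PyTorch under our own naming, not Meta's implementation.

```python
# Minimal sketch of DINO-style self-distillation (illustrative, not Meta's code).
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, center,
              t_student=0.1, t_teacher=0.04):
    """Cross-entropy between teacher and student distributions.

    student_logits / teacher_logits: (batch, dim) projection-head outputs
    for two augmented views of the same images. `center` is a running mean
    of teacher outputs; centering plus a sharper teacher temperature is
    what prevents collapse to a constant representation.
    """
    teacher_probs = F.softmax((teacher_logits - center) / t_teacher, dim=-1)
    student_logprobs = F.log_softmax(student_logits / t_student, dim=-1)
    return -(teacher_probs * student_logprobs).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    """Teacher weights track an exponential moving average of the student's."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```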
Zuckerberg Open-Sources Again: A 7B Model Reaches Self-Supervised Learning SOTA
量子位· 2025-08-16 02:00
Core Viewpoint
- Meta has released a new open-source visual model, DINOv3, which demonstrates that self-supervised learning models can outperform weakly supervised models across a wide range of tasks [1][3].

Group 1: Model Overview
- DINOv3 is trained entirely without labels, expanding the dataset to 1.7 billion images and the model size to 7 billion parameters, making it well suited to applications where data labeling is scarce or costly [1][6].
- The model excels in label-scarce and cross-domain scenarios, achieving state-of-the-art (SOTA) results on the three core computer vision tasks: classification, detection, and segmentation [3][22].

Group 2: Training Methodology
- DINOv3 is trained in two main phases, beginning with large-scale self-supervised training to learn high-quality visual representations [8].
- A new method called "Gram anchoring" counteracts the degradation of dense feature maps during long training runs, significantly improving local feature quality without compromising global features (a minimal sketch follows this summary) [15][20].
- The training strategy also includes RoPE-box jittering, which improves robustness to variations in resolution, scale, and aspect ratio while preserving training stability [13][14].

Group 3: Performance Metrics
- DINOv3 outperforms its predecessor DINOv2 on various benchmarks, for example 55.9 mIoU on ADE-20k segmentation and 90.4 top-1 accuracy on ImageNet ReaL classification [4].

Group 4: Practical Applications
- DINOv3 has demonstrated strong generalization, such as analyzing satellite imagery to detect tree loss and land-use change, providing significant support for global forest restoration and agricultural management [27][28].
- The model has achieved SOTA results on multiple remote sensing tasks, including semantic geospatial tasks and high-resolution semantic tasks [29].

Group 5: Future Implications
- The DINO series represents Meta's ongoing exploration of self-supervised methods in the visual domain, and DINOv3 marks significant progress in large-scale self-supervised training [30][38].
- DINOv3 is expected to accelerate existing applications and unlock new scenarios across industries including healthcare, environmental monitoring, autonomous driving, retail, and manufacturing [39].
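The summary above describes Gram anchoring as matching pairwise patch similarities against an earlier, non-degraded "Gram teacher" checkpoint, which constrains local feature structure while leaving the features themselves free to improve. The following PyTorch sketch illustrates that idea; the function and variable names are ours, not Meta's.

```python
# Sketch of a Gram-anchoring loss (illustrative, assuming PyTorch).
import torch
import torch.nn.functional as F

def gram_anchoring_loss(student_patches, teacher_patches):
    """student_patches, teacher_patches: (batch, num_patches, dim).

    `teacher_patches` come from an earlier checkpoint whose dense features
    had not yet degraded (the "Gram teacher"). Only the Gram matrices of
    pairwise patch similarities are matched, not the raw features.
    """
    s = F.normalize(student_patches, dim=-1)   # unit-norm patch features
    t = F.normalize(teacher_patches, dim=-1)
    gram_s = s @ s.transpose(1, 2)             # (batch, P, P) similarities
    gram_t = t @ t.transpose(1, 2)
    return (gram_s - gram_t).pow(2).mean()     # Frobenius-style penalty
```

RoPE-box jittering, also mentioned above, can be pictured as randomly rescaling the coordinate box that rotary position embeddings use, so the model sees many effective resolutions and aspect ratios during training. A hedged sketch, with an illustrative jitter range:

```python
import torch

def jittered_rope_coords(h, w, jitter=(0.5, 2.0)):
    """Patch-center coordinates in a randomly rescaled box.

    Coordinates normally live in a fixed box such as [-1, 1]; sampling a
    random scale per step varies the effective resolution/scale/aspect
    ratio the RoPE sees. The range here is an assumption for illustration.
    """
    scale = torch.empty(1).uniform_(*jitter)
    ys = torch.linspace(-1.0, 1.0, h) * scale
    xs = torch.linspace(-1.0, 1.0, w) * scale
    return torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
```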
Devouring 1.7 Billion Images: Meta Open-Sources Its Strongest Behemoth DINOv3, Redefining the CV Ceiling
36Kr · 2025-08-15 07:29
[Intro] No manual annotation required: after devouring 1.7 billion images, Meta has forged an "all-round vision champion" with self-supervised learning! NASA has already sent it to Mars, and the medical, satellite, and autonomous-driving communities are buzzing.

| Task / Benchmark | DINO ViT-B/8 (0.09B) | DINOv2 ViT-g/14 (1.1B) | DINOv3 ViT-7B/16 (7B) | SigLIP 2 ViT-g-opt/16 (1.8B) | PE ViT-G/14 (1.9B) |
| --- | --- | --- | --- | --- | --- |
| Segmentation, ADE-20k (mIoU, higher is better) | 31.8 | 49.5 | 55.9 | 42.7 | 38.9 |
| Depth estimation, NYU (error, lower is better) | 0.537 | 0.372 | 0.309 | 0.494 | 0.436 |
| Video tracking, DAVIS | 68.7 | 76.6 | 83.3 | 62.9 | 49.8 |
| Instance retrieval, Met | 17.1 | 44.6 | 55.4 | 13.9 | 1 ... |
Meta's Vision Foundation Model DINOv3 Returns as King: Self-Supervision Fully Surpasses Weak Supervision for the First Time, Open-Sourced for Commercial Use
机器之心· 2025-08-15 03:29
Core Viewpoint
- The article discusses the advances in computer vision driven by the DINO series of models, emphasizing the field's transition from supervised to self-supervised learning paradigms [2][15][29].

Group 1: DINO Model Evolution
- DINO, DINOv2, and DINOv3 represent successive milestones in self-supervised learning, with DINOv3 achieving state-of-the-art performance across various tasks without any labeled data [2][15][31].
- DINOv3 expands its training dataset to 1.7 billion images and its parameter count to 7 billion, a substantial jump over its predecessors [9][31][36].
- Innovative techniques in DINOv3, notably Gram Anchoring and RoPE, improve the model's ability to generate high-resolution dense features, addressing limitations seen in DINOv2 [18][24][28].

Group 2: Performance Metrics
- DINOv3 outperforms previous models on multiple benchmarks, with 55.9 mIoU on ADE-20k segmentation, a depth-estimation error of 0.309 (lower is better), and 83.3 video-tracking accuracy on DAVIS, showcasing its strength on dense prediction tasks [17][31].
- Its image classification performance is also notable, with 90.4 accuracy on ImageNet ReaL, indicating robustness across applications [17][31].

Group 3: Practical Applications
- DINOv3 is already used in real-world settings, such as analyzing satellite images for environmental monitoring and supporting climate-finance processes [39][40].
- Because it performs well without fine-tuning, the model suits edge applications that must run multiple visual prediction tasks simultaneously (a usage sketch follows below) [34][36].

Group 4: Community Engagement and Accessibility
- Meta has open-sourced DINOv3, providing the complete backbone network and evaluation heads for community use, facilitating further research and development [13][36].
- The model family includes various distilled versions at several scales to cater to different computational budgets, keeping it accessible to researchers and developers [36][37].
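To show what "frozen backbone, no fine-tuning" looks like in practice, here is a hedged sketch of extracting dense patch features from a DINOv3 checkpoint through the Hugging Face transformers API. The checkpoint id is an assumption based on Meta's released model family naming; verify it (and any access requirements) before use.

```python
# Sketch: dense features from a frozen DINOv3 backbone, assuming the
# Hugging Face transformers API. The checkpoint id is an assumption.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

ckpt = "facebook/dinov3-vitb16-pretrain-lvd1689m"  # assumed checkpoint id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt).eval()      # frozen: no fine-tuning

image = Image.open("example.jpg")                   # any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state carries one embedding per patch (plus special tokens);
# the patch grid can feed lightweight heads for segmentation, depth, etc.
patch_features = outputs.last_hidden_state
print(patch_features.shape)
```

One frozen forward pass can thus serve several lightweight task heads at once, which is the edge-deployment pattern the summary describes.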