Training-free denoising algorithm
NeurIPS 2025 Spotlight | The University of Hong Kong proposes a label-free method for enhancing ViT dense representations
机器之心· 2025-11-19 04:07
Core Insights
- The article introduces PH-Reg, a novel method for enhancing Vision Transformers (ViTs) by removing artifacts from their dense features without any data labeling, improving performance on fine-grained downstream tasks [2][6][19].

Group 1: Methodology
- PH-Reg applies a test-time-augmentation denoising strategy to remove artifacts from the teacher model's dense features, yielding a student model that outputs artifact-free dense features [2][11] (a sketch of this augmentation-averaging step follows this summary).
- Its self-distillation framework modifies the student architecture only minimally, restricting updates to specific components so that the core knowledge of the pre-trained ViT is preserved [11][20] (see the self-distillation sketch below).
- The method is plug-and-play: it requires no retraining of the backbone and efficiently removes artifacts from existing pre-trained models such as CLIP and DINOv2 [19][22].

Group 2: Experimental Results
- On semantic segmentation across eight benchmark datasets, PH-Reg outperforms mainstream methods such as MaskCLIP and SCLIP on seven of them, demonstrating its robustness and effectiveness [13][21].
- For the CLIP model, it improves mean Intersection over Union (mIoU) by 5.04% on VOC21 and by 3.64% on ADE20K [21] (the mIoU metric is sketched at the end of this summary).
- Training time is reduced by over 58.9%: about 9,000 minutes in total, versus the 21,908 minutes required by DVT [17][22].

Group 3: Advantages
- PH-Reg's core advantage is its independence from gradient-based neural-field learning, which enables a single-stage distillation process with low storage and compute requirements [22].
- All distillation targets are computed on the fly, so no extra storage is needed, in contrast to DVT, which stores 1.4 TB of neural-field feature data [22].
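To make the test-time-augmentation denoising idea concrete, below is a minimal PyTorch sketch of averaging a frozen teacher's dense features over randomly augmented views and undoing each augmentation before averaging. The helper name `denoised_targets`, the specific augmentations (horizontal flips and small spatial shifts), and the assumption that the teacher returns a `(B, D, h, w)` feature map are illustrative; the paper's exact augmentation set and alignment procedure may differ.

```python
import torch

def denoised_targets(teacher, image, n_views=32, max_shift=16):
    """Average a frozen teacher's dense features over augmented views,
    undoing each augmentation so the mean is aligned with the original image.
    Illustrative sketch only; PH-Reg's exact recipe may differ."""
    B, C, H, W = image.shape
    acc = None
    with torch.no_grad():
        for _ in range(n_views):
            # Random horizontal flip and a small spatial shift.
            flip = torch.rand(1).item() < 0.5
            dy, dx = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
            view = torch.flip(image, dims=[-1]) if flip else image
            view = torch.roll(view, shifts=(dy, dx), dims=(-2, -1))

            feat = teacher(view)  # assumed to return (B, D, h, w) patch features
            h, w = feat.shape[-2:]
            # Undo the shift (scaled to feature resolution), then the flip.
            feat = torch.roll(feat, shifts=(-dy * h // H, -dx * w // W), dims=(-2, -1))
            if flip:
                feat = torch.flip(feat, dims=[-1])

            acc = feat if acc is None else acc + feat
    return acc / n_views  # artifact-reduced dense target
```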
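The self-distillation setup can be sketched in the same spirit: the student starts as a copy of the pre-trained ViT, only a small chosen subset of parameters stays trainable, and the student's dense output regresses the denoised target. The choice of trainable components (hypothetically, parameters whose names contain a normalization keyword) and the cosine-distance loss are assumptions for illustration, not the paper's exact configuration.

```python
import copy
import torch
import torch.nn.functional as F

def build_student(pretrained_vit, trainable_keywords=("norm",)):
    """Copy the pre-trained ViT and freeze everything except a small set of
    components (hypothetically, normalization layers), so the core
    pre-trained representation is preserved."""
    student = copy.deepcopy(pretrained_vit)
    for name, param in student.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
    return student

def distill_step(student, image, target, optimizer):
    """One self-distillation step: the student's dense features regress the
    denoised target via a cosine distance averaged over patches."""
    pred = student(image)  # assumed to return (B, D, h, w) dense features
    loss = 1.0 - F.cosine_similarity(
        pred.flatten(2), target.flatten(2), dim=1
    ).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

An optimizer would then be built only over the parameters left trainable, e.g. `torch.optim.AdamW(p for p in student.parameters() if p.requires_grad)`.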
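For the segmentation numbers, mean Intersection over Union is the standard metric; a small NumPy sketch of how it is computed on flattened prediction and ground-truth label arrays is given below. The `ignore_index=255` convention is an assumption borrowed from common segmentation benchmarks, not something specified in the article.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class intersection-over-union, averaged over classes present in
    either the prediction or the ground truth (flattened label arrays)."""
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```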