Self-Distillation Pre-training

FISHER, the first multi-modal industrial signal foundation model, with open-sourced weights, from Tsinghua, SJTU, and others
机器之心· 2025-07-24 03:19
Core Viewpoint - The article introduces FISHER, the first multi-modal industrial signal foundation model. It was developed by researchers from Tsinghua University, Shanghai Jiao Tong University, Beijing Huakong Zhijia Technology Co., Ltd., and North China Electric Power University, and aims to unify the modeling of heterogeneous industrial signals [1][3][5].
Research Background - As more sensors are installed on industrial equipment, analyzing industrial signals efficiently becomes harder because the signals are highly heterogeneous. The article summarizes this as the M5 problem: multi-modal, multi-sampling rate, multi-scale, multi-task, and few faults [3][4].
Research Motivation - Although industrial signals look different on the surface, their intrinsic features and semantic information are similar, which suggests that a single model can handle heterogeneous industrial signals in a unified way. FISHER exploits these similarities to improve its representation capability [5][7].
FISHER Model Introduction - FISHER handles industrial signals at any sampling rate by using sub-bands as the modeling unit: an entire signal is represented by assembling sub-bands like building blocks. It uses the Short-Time Fourier Transform (STFT) as the input feature, with attention to the high-frequency components that are crucial for fault detection [9][10].
Model Architecture - FISHER consists of a ViT encoder and a CNN decoder and is pre-trained with a "teacher-student" self-distillation method. During pre-training, 80% of the sub-bands are masked; the masked portions are then combined with the unmasked portions to form the output [12][13].
Experimental Results - On the RMIS benchmark, the three FISHER versions outperformed the baseline models by at least 3.91%, 4.34%, and 5.03%, respectively, demonstrating strong generalization. In anomaly detection, FISHER performed slightly below BEATs; in fault diagnosis, it significantly surpassed all baseline models [19][22].
Performance Analysis - The performance curve of FISHER models is consistently higher than that of baseline systems, indicating superior pre-training and scaling effectiveness. The article suggests that data cleaning will be crucial for scaling up the training of signal foundation models [22][23].
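Illustrative Sketch - The sub-band idea from the FISHER Model Introduction can be shown with a minimal NumPy sketch. This is not the released FISHER code. The window duration, hop, sub-band width in bins, function names, and masking granularity are all assumptions made for illustration; the article only states that STFT features are split into sub-bands and that 80% masking is used. The sketch fixes the STFT window in seconds, so the frequency resolution is the same at every sampling rate and a higher sampling rate simply yields more sub-bands of the same width.

```python
import numpy as np


def stft_magnitude(signal, fs, win_sec=0.025, hop_sec=0.010):
    """Log-magnitude STFT using a window of fixed duration in seconds.

    Fixing the duration (not the sample count) keeps the frequency
    resolution at 1 / win_sec Hz for every sampling rate.
    win_sec and hop_sec are assumed values, not taken from the article.
    """
    n_win = int(round(win_sec * fs))
    n_hop = int(round(hop_sec * fs))
    n_frames = 1 + (len(signal) - n_win) // n_hop
    window = np.hanning(n_win)
    frames = np.stack([
        signal[i * n_hop: i * n_hop + n_win] * window
        for i in range(n_frames)
    ])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_win // 2 + 1)
    return np.log(spec + 1e-8)


def split_into_subbands(spec, bins_per_band=32):
    """Cut the frequency axis into equal-width sub-bands.

    Trailing bins that do not fill a whole sub-band are dropped.
    Returns an array of shape (n_bands, n_frames, bins_per_band).
    bins_per_band is an assumed value.
    """
    n_bands = spec.shape[1] // bins_per_band
    trimmed = spec[:, : n_bands * bins_per_band]
    stacked = trimmed.reshape(spec.shape[0], n_bands, bins_per_band)
    return stacked.transpose(1, 0, 2)


def random_mask(n_units, mask_ratio=0.8, seed=0):
    """Boolean mask; True marks units hidden from the student branch.

    Only the 80% ratio comes from the article; what counts as one
    maskable unit is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_masked = int(round(mask_ratio * n_units))
    mask = np.zeros(n_units, dtype=bool)
    mask[rng.choice(n_units, size=n_masked, replace=False)] = True
    return mask


if __name__ == "__main__":
    for fs in (16_000, 48_000):
        t = np.arange(fs) / fs  # one second of signal
        x = np.sin(2 * np.pi * 1000 * t)
        bands = split_into_subbands(stft_magnitude(x, fs))
        mask = random_mask(bands.shape[0] * bands.shape[1])
        print(fs, bands.shape, round(float(mask.mean()), 3))
```

With these assumed settings, a one-second signal sampled at 16 kHz produces 6 sub-bands and the same duration at 48 kHz produces 18, each covering the same bandwidth in Hz. That is the sense in which one model can consume signals at different sampling rates: the unit it sees stays the same, and only the number of units changes.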