ICML 2025 Spotlight | Exploring Image Adversarial Perturbations with Fourier Decomposition, Code Now Open-Sourced
机器之心 (Jiqizhixin) · 2025-05-18 04:25
Core Viewpoint
- The article presents a new approach to adversarial purification in computer vision that works in the frequency domain, separating adversarial perturbations from clean images while preserving semantic information [5][21].

Research Background
- Adversarial examples threaten the safety and robustness of computer vision models, which motivates adversarial purification techniques that restore the original clean image [5].
- Existing adversarial purification methods fall into training-based and diffusion-model-based approaches; the latter generalize better and do not require extensive training data [5][6].

Motivation and Theoretical Analysis
- Successful adversarial purification must eliminate the adversarial perturbation while retaining the semantic information of the original image [9].
- Current strategies that add noise to mask adversarial perturbations often damage the semantic content of the original image excessively [9].
- The study uses Fourier decomposition to analyze how adversarial perturbations are distributed across frequencies, and finds that they predominantly affect high-frequency components, while low-frequency components are more robust [9][12].

Methodology
- A filter retains the low-frequency components of the amplitude spectrum, which are less affected by adversarial perturbations, and allows these components to be replaced with those from the original clean image [14][15].
- The phase spectrum is handled separately: because adversarial perturbations influence the phase at all frequencies, a projection method is used to preserve the integrity of the phase information [16][17].

Experimental Results
- The proposed method improves both standard accuracy and robust accuracy over state-of-the-art (SOTA) methods on datasets such as CIFAR10 and ImageNet [18][19].
- Visualizations indicate that the purified images closely resemble the original clean images, confirming the effectiveness of the proposed approach [20].

Conclusion
- While significant progress has been made in preserving semantic information and removing adversarial perturbations, more effective image decomposition methods and deeper theoretical explanations remain directions for future research [21].
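
The amplitude replacement and phase projection described under Methodology can be illustrated with a minimal NumPy sketch. It is not the authors' released code. The article does not give the filter shape, the projection range, or which frequencies are projected, so the circular mask, `radius`, `delta`, the restriction of the projection to the masked band, and the function names `low_freq_mask` and `guide_spectrum` are all assumptions made for illustration. Images are treated as single-channel 2-D arrays.

```python
import numpy as np

def low_freq_mask(h, w, radius):
    """Boolean mask (fftshift layout) selecting frequencies within `radius` of DC.
    A circular low-pass shape is an assumption of this sketch."""
    yy, xx = np.ogrid[:h, :w]
    return (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2

def guide_spectrum(estimate, reference, radius=4, delta=0.2):
    """Pull the low-frequency spectrum of `estimate` toward `reference`.

    `radius` (filter size) and `delta` (phase tolerance in radians) are
    hypothetical parameters, not values taken from the article.
    """
    h, w = estimate.shape
    f_est = np.fft.fftshift(np.fft.fft2(estimate))
    f_ref = np.fft.fftshift(np.fft.fft2(reference))

    # Fourier decomposition into amplitude and phase spectra.
    amp_est, pha_est = np.abs(f_est), np.angle(f_est)
    amp_ref, pha_ref = np.abs(f_ref), np.angle(f_ref)
    mask = low_freq_mask(h, w, radius)

    # Amplitude: inside the mask, take the reference's low-frequency amplitude.
    amp = np.where(mask, amp_ref, amp_est)

    # Phase: inside the mask, project into [pha_ref - delta, pha_ref + delta].
    # The difference is wrapped to (-pi, pi] before clipping.
    diff = np.angle(np.exp(1j * (pha_est - pha_ref)))
    pha = np.where(mask, pha_ref + np.clip(diff, -delta, delta), pha_est)

    # Recombine amplitude and phase and return to the spatial domain.
    spectrum = amp * np.exp(1j * pha)
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```

For example, `guide_spectrum(noisy, clean)` returns an image whose low-frequency amplitude matches `clean` and whose high-frequency content still comes from `noisy`. Keeping `radius` below half the image size keeps the mask symmetric around DC, so the result stays real-valued.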
CVPR 2025 Oral | DiffFNO: Fourier Neural Operators Empower Diffusion, Opening a New Chapter in Arbitrary-Scale Super-Resolution
机器之心 (Jiqizhixin) · 2025-05-04 04:57
Core Viewpoint
- The article introduces DiffFNO, a method that augments diffusion models with neural operators to deliver high-quality, efficient image super-resolution (SR) at any continuous scaling factor, addressing the limitations of traditional models [2][4].

Group 1: Methodology Overview
- DiffFNO consists of three main components: the Weighted Fourier Neural Operator (WFNO), the Gated Fusion Mechanism, and the Adaptive ODE Solver, which together improve the quality and efficiency of image reconstruction [2][5].
- The WFNO captures global information through frequency-domain convolution and amplifies high-frequency components with learnable frequency weights, improving PSNR by approximately 0.3–0.5 dB on high-magnification tasks [10].
- The Gated Fusion Mechanism integrates a lightweight attention operator (AttnNO) that captures local spatial features, allowing spectral and spatial information to be combined flexibly [12][13].

Group 2: Adaptive ODE Solver
- The Adaptive ODE Solver converts the diffusion model's reverse process from a stochastic differential equation (SDE) into a deterministic ordinary differential equation (ODE), cutting the number of denoising steps from over a thousand to about thirty and speeding up inference [15].
- Image quality is maintained while inference time is roughly halved, from 266 ms to approximately 141 ms, and the method performs even better at larger scaling factors [15].

Group 3: Experimental Validation
- DiffFNO outperforms various state-of-the-art (SOTA) methods by 2–4 dB in PSNR across multiple benchmark datasets, and excels in high-magnification settings such as ×8 and ×12 [17][20].
- The method retains the complete Fourier spectrum, balancing overall image structure against local detail, and uses learnable frequency weights to dynamically adjust the influence of different frequency bands [18].
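
The frequency weighting and gated fusion described under Group 1 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not DiffFNO's implementation: the weights here are a fixed radial ramp standing in for learned parameters, the sigmoid gate is a common convex-combination form that the article does not spell out, and the names `radial_weights`, `weighted_spectral_filter`, `gated_fusion`, and `gain` are hypothetical.

```python
import numpy as np

def radial_weights(h, w, gain):
    """Per-frequency weights for an rfft2 spectrum of an (h, w) image.
    Weight is 1 at DC and grows linearly with radial frequency up to 1 + gain.
    In a trained model these would be learned; the ramp is a stand-in."""
    fy = np.fft.fftfreq(h)[:, None]    # vertical frequencies, shape (h, 1)
    fx = np.fft.rfftfreq(w)[None, :]   # horizontal frequencies, shape (1, w//2 + 1)
    r = np.sqrt(fy ** 2 + fx ** 2)
    return 1.0 + gain * r / r.max()

def weighted_spectral_filter(x, freq_weights):
    """Frequency-domain convolution: scale every Fourier mode of x by its weight.
    The full spectrum is kept; no modes are truncated."""
    spectrum = np.fft.rfft2(x)
    return np.fft.irfft2(spectrum * freq_weights, s=x.shape)

def gated_fusion(spectral, spatial, gate_logits):
    """Blend spectral and spatial features with a sigmoid gate (assumed form)."""
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    return gate * spectral + (1.0 - gate) * spatial

# Usage: amplify high frequencies, then blend with the unfiltered features.
rng = np.random.default_rng(0)
features = rng.random((8, 10))
sharpened = weighted_spectral_filter(features, radial_weights(8, 10, gain=0.5))
fused = gated_fusion(sharpened, features, gate_logits=np.zeros_like(features))
```

With all weights equal to 1 the filter is the identity, and with zero gate logits the fusion is a plain average, which makes both pieces easy to sanity-check.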
Group 4: Conclusion
- DiffFNO offers a new way to reconcile the trade-off between high precision and low computational cost in super-resolution, making it suitable for fields that demand high image quality, such as medical imaging, exploration, and gaming [22].
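
The step-count reduction described under Group 2 comes from integrating a deterministic ODE with step sizes chosen by an error estimate instead of a fixed fine grid. The article does not say which integrator or step-size rule DiffFNO uses, so the sketch below substitutes a generic embedded Euler/Heun pair on a toy scalar ODE. The function `adaptive_heun`, the tolerance `tol`, the initial step `h0`, and the toy drift are all assumptions for illustration.

```python
import math

def adaptive_heun(f, x0, t0, t1, tol=1e-4, h0=0.1):
    """Integrate dx/dt = f(t, x) from t0 to t1 with adaptive step sizes.

    Each trial step computes an Euler (1st-order) and a Heun (2nd-order)
    update; their difference serves as the local error estimate.
    Returns (x at t1, number of accepted steps). Works forward or backward
    in time, since reverse diffusion runs from large t to small t.
    """
    direction = 1.0 if t1 >= t0 else -1.0
    t, x, steps = t0, x0, 0
    h = direction * abs(h0)
    while (t1 - t) * direction > 1e-12:
        if (t + h - t1) * direction > 0:
            h = t1 - t                      # do not step past the end point
        k1 = f(t, x)
        k2 = f(t + h, x + h * k1)
        x_low = x + h * k1                  # Euler update
        x_high = x + 0.5 * h * (k1 + k2)    # Heun update
        err = abs(x_high - x_low)
        if err <= tol:                      # accept the step
            t, x, steps = t + h, x_high, steps + 1
        # Grow or shrink the next step based on the error estimate.
        scale = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        h *= min(2.0, max(0.2, scale))
    return x, steps

# Toy usage: dx/dt = -x on [0, 1]; the exact solution at t = 1 is exp(-1).
x_end, n_steps = adaptive_heun(lambda t, x: -x, 1.0, 0.0, 1.0)
```

Loosening `tol` yields fewer, larger steps at the cost of accuracy, which is the same trade-off an adaptive solver makes when it replaces a long fixed schedule of denoising steps.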