Core Viewpoint
- JarvisIR represents a significant advancement in image restoration technology, utilizing a Visual Language Model (VLM) as a controller to coordinate multiple expert models for robust image recovery under various weather conditions [5][51].

Group 1: Background and Motivation
- The research addresses challenges in visual perception systems affected by adverse weather conditions, proposing JarvisIR as a solution to enhance image recovery capabilities [5].
- Traditional methods struggle with complex real-world scenarios, necessitating a more versatile approach [5].

Group 2: Methodology Overview
- JarvisIR architecture employs a VLM to autonomously plan task sequences and select appropriate expert models for image restoration [9].
- The CleanBench dataset, comprising 150K synthetic and 80K real-world images, is developed to support training and evaluation [12][15].
- The MRRHF alignment algorithm combines supervised fine-tuning and human feedback to improve model generalization and decision stability [9][27].

Group 3: Training Framework
- The training process consists of two phases: supervised fine-tuning (SFT) using synthetic data and MRRHF for real-world data alignment [23][27].
- MRRHF employs a reward modeling approach to assess image quality and guide VLM optimization [28].

Group 4: Experimental Results
- JarvisIR-MRRHF demonstrates superior decision-making capabilities compared to other strategies, achieving a score of 6.21 on the CleanBench-Real validation set [43].
- In image restoration performance, JarvisIR-MRRHF outperforms existing methods across various weather conditions, with an average improvement of 50% in perceptual metrics [47].

Group 5: Technical Highlights
- The integration of VLM as a control center marks a novel application in image restoration, enhancing contextual understanding and task planning [52].
- The collaborative mechanism of expert models allows for tailored responses to different weather-induced image degradations [52].
- The release of the CleanBench dataset fills a critical gap in real-world image restoration data, promoting further research and development in the field [52].
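
The controller-plus-experts idea summarized above can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the JarvisIR implementation: the "image" is reduced to a set of degradation labels, the `plan` function is a rule-based stand-in for the VLM controller, and the expert names in `EXPERTS` are hypothetical placeholders rather than the models the authors use.

```python
from typing import Callable, Dict, List, Set

# Toy "image": the set of degradations still present (an assumption for illustration).
Image = Set[str]


def make_expert(removes: str) -> Callable[[Image], Image]:
    """Build a toy expert that removes exactly one degradation type."""
    def expert(image: Image) -> Image:
        return image - {removes}  # returns a new set, input is not mutated
    return expert


# Hypothetical expert registry; names are placeholders, not JarvisIR's actual models.
EXPERTS: Dict[str, Callable[[Image], Image]] = {
    "low_light_enhance": make_expert("low_light"),
    "derain": make_expert("rain"),
    "dehaze": make_expert("haze"),
    "denoise": make_expert("noise"),
}

# Fixed priority order used by the stand-in planner (an arbitrary choice for this sketch).
PLAN_ORDER = [
    ("low_light", "low_light_enhance"),
    ("rain", "derain"),
    ("haze", "dehaze"),
    ("noise", "denoise"),
]


def plan(image: Image) -> List[str]:
    """Stand-in for the VLM controller: map observed degradations to an ordered expert list."""
    return [expert for degradation, expert in PLAN_ORDER if degradation in image]


def restore(image: Image) -> Image:
    """Run the planned experts one after another and return the result."""
    for name in plan(image):
        image = EXPERTS[name](image)
    return image


if __name__ == "__main__":
    degraded = {"rain", "low_light"}
    print(plan(degraded))
    print(restore(degraded))
```

In the system described in the article, the planning step is performed by a trained VLM rather than a lookup table, and each expert is a full restoration model; the sketch only shows how a planned task sequence drives a chain of specialized models.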
CVPR'25 | Perceptual Performance Up 50%! JarvisIR: A VLM at the Helm, Unfazed by Adverse Weather
具身智能之心·2025-06-21 12:06