Deep Perception-Level Image Understanding: UniPercept Unifies Image Aesthetics, Quality, and Structure & Texture Perception
机器之心· 2026-01-08 02:06
Core Insights
- The article discusses the development of UniPercept, a novel framework for perceptual image understanding that integrates aesthetics, quality, and structure & texture dimensions, addressing the limitations of existing multimodal large language models in understanding visual perception [3][5].

Group 1: Framework Overview
- UniPercept is the first framework to unify three perceptual dimensions: aesthetics, quality, and structure & texture, enhancing the understanding of how images look beyond mere object recognition [3][5].
- The framework includes a hierarchical definition system and a large-scale benchmark dataset called UniPercept-Bench, which allows for comprehensive evaluation of image attributes [5][10].

Group 2: Evaluation System
- UniPercept-Bench features a three-tiered evaluation system comprising 3 domains, 17 categories, and 44 criteria, providing detailed expert-level definitions that surpass previous image evaluation benchmarks [10][11].
- The evaluation dimensions include Image Aesthetics Assessment (IAA), Image Quality Assessment (IQA), and Image Structure & Texture Assessment (ISTA), each focusing on different aspects of image perception [11][12].

Group 3: Model Development
- The model employs domain-adaptive pre-training using a dataset of approximately 800,000 samples, which helps it learn low-level visual features across domains [22].
- Task-aligned reinforcement learning is utilized to enhance the model's perceptual consistency, with specific reward functions designed for visual rating (VR) and visual question answering (VQA) tasks [23][25].

Group 4: Performance Metrics
- UniPercept outperforms existing top models in various tasks, achieving the highest Spearman and Pearson correlation coefficients in aesthetics, quality, and structure assessments [29][30].
- In visual question answering tasks, UniPercept shows a significant accuracy improvement over leading models, particularly in identifying subtle damages in images [31].
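
The Spearman and Pearson correlation coefficients cited under Performance Metrics measure how closely a model's predicted scores track human ratings: Pearson captures linear agreement, while Spearman captures agreement in rank order only. The following is a minimal pure-Python sketch of both metrics, not the article's evaluation code; the function names and the sample scores are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores, for illustration only (not taken from the article).
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
model_scores = [1.2, 1.9, 3.5, 3.9, 9.0]

srcc = spearman(human_scores, model_scores)
plcc = pearson(human_scores, model_scores)
```

In this made-up example the model orders the images exactly as the human raters do, so the Spearman value is at its maximum, while the outlying last prediction pulls the Pearson value below 1. Reporting both, as the article does, separates ranking ability from score calibration.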
Group 5: Applications
- UniPercept demonstrates potential as a reward model for generative models, optimizing image generation by enhancing composition balance, detail sharpness, and structural richness [33][36].
- The framework's multi-dimensional reward signals work synergistically to improve both visual appeal and technical fidelity of generated images [37].
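
To make the idea of a multi-dimensional reward concrete, here is a hedged sketch of one simple way three perceptual scores could be merged into a single scalar and used to pick among generated candidates. The article does not say how UniPercept's signals are actually combined, so the weighted average, the `PerceptualScores` type, the function names, and all numbers below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerceptualScores:
    """Hypothetical per-image scores for the three dimensions named in the article."""
    aesthetics: float  # IAA score
    quality: float     # IQA score
    structure: float   # ISTA score


def combined_reward(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three scores (an assumed combination rule)."""
    w_a, w_q, w_s = weights
    total = w_a + w_q + w_s
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (
        w_a * scores.aesthetics
        + w_q * scores.quality
        + w_s * scores.structure
    )
    return weighted / total


def pick_best(candidates, weights=(1.0, 1.0, 1.0)):
    """Return the name of the candidate with the highest combined reward."""
    return max(candidates, key=lambda name: combined_reward(candidates[name], weights))


# Made-up scores for two generated images, for illustration only.
candidates = {
    "sample_a": PerceptualScores(aesthetics=60.0, quality=80.0, structure=70.0),
    "sample_b": PerceptualScores(aesthetics=75.0, quality=65.0, structure=60.0),
}
best_equal = pick_best(candidates)
best_aesthetic = pick_best(candidates, weights=(3.0, 1.0, 1.0))
```

With equal weights the sketch favors `sample_a`, and with aesthetics weighted more heavily it favors `sample_b`, which shows how the choice of weights steers generation toward visual appeal or technical fidelity.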