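The First Comprehensive Evaluation Framework for Story Visualization Is Here! 80 Story Units, 53 Categories, and a Full Comparison of 20 Technical Approaches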

Core Viewpoint
- Advances in AIGC technology have drawn growing interest to story visualization, which serves as a foundation for narrative generation in film [1][4].

Group 1: Story Visualization Technology
- Story visualization aims to generate a coherent sequence of images from a piece of text or an image [2].
- Its core challenge is keeping characters consistent while building detailed, complex narrative scenes and worldviews [4].
- Recent breakthroughs in diffusion models and autoregressive generation have substantially improved the visualization of long-form stories, but existing evaluation systems lag behind because they use few metrics and cover few dimensions [4][5].

Group 2: ViStoryBench Evaluation Framework
- ViStoryBench has been proposed to put the evaluation of story visualization on a more scientific footing [6].
- Beyond technical implementation, the benchmark emphasizes the integration of artistic expression and narrative logic, giving the industry a reliable evaluation tool [8].
- Its comprehensive assessment system is designed to handle the diverse, multidimensional evaluation standards of the story visualization field [11].

Group 3: Dataset Creation
- The team built a diverse bilingual (Chinese and English) dataset covering a range of story themes and artistic styles [13].
- It contains 80 story units across 53 story categories with 344 distinct characters, balancing narrative structure against visual elements [14].
- It includes both single-protagonist stories and scenes with multiple interacting characters, specifically to test how well models keep characters coherent [14].
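
To make the dataset composition in Group 3 concrete, here is a minimal Python sketch of how a story unit might be represented and summarized. The schema (`Character`, `Shot`, `StoryUnit`, `summarize`) is hypothetical and is not ViStoryBench's actual data format; it only assumes that each unit has a category, a language, a character list with reference images, and a sequence of shots naming the characters who appear in them.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    # Hypothetical fields: a name plus paths to its reference images
    name: str
    reference_images: list[str] = field(default_factory=list)


@dataclass
class Shot:
    # One frame to generate; `characters` lists who appears in it
    description: str
    characters: list[str]


@dataclass
class StoryUnit:
    story_id: str
    category: str
    language: str  # assumed to be "zh" or "en"
    characters: list[Character]
    shots: list[Shot]


def summarize(units: list[StoryUnit]) -> dict:
    """Count units, distinct categories, characters, and multi-character shots."""
    categories = {u.category for u in units}
    # Assumes characters are not shared between story units
    n_characters = sum(len(u.characters) for u in units)
    # Shots with two or more distinct characters stress character coherence
    multi_char_shots = sum(
        1 for u in units for s in u.shots if len(set(s.characters)) > 1
    )
    return {
        "units": len(units),
        "categories": len(categories),
        "characters": n_characters,
        "multi_character_shots": multi_char_shots,
        "languages": sorted({u.language for u in units}),
    }


if __name__ == "__main__":
    u1 = StoryUnit("s1", "fairy tale", "en",
                   [Character("Ann"), Character("Bob")],
                   [Shot("Ann waves", ["Ann"]), Shot("Ann meets Bob", ["Ann", "Bob"])])
    u2 = StoryUnit("s2", "wuxia", "zh", [Character("Li")], [Shot("Li trains", ["Li"])])
    print(summarize([u1, u2]))
```

Applied to the full benchmark, a summary of this kind would be expected to report 80 units, 53 categories, and 344 characters, and the multi-character count shows how much of the data exercises character interaction.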

Group 4: Evaluation Metrics
- A multi-dimensional evaluation framework covers character and style similarity, fine-grained prompt alignment, aesthetic quality, and copy-paste behavior detection [22].
- Like a sharp-eyed inspector, the system locates the characters in each generated image and measures how similar they are to the reference images [24].
- Character similarity is measured along two dimensions: cross-similarity (generated characters versus their reference images) and self-similarity (consistency of the same character across the generated images) [25][27].

Group 5: Experimental Design and Results
- The team systematically evaluated more than 20 technical solutions (18 main methods plus their variants), spanning open-source methods, commercial products, and multimodal large language models [33].
- The results show why comprehensive metrics are needed: every single indicator has significant blind spots, as the Copy-Paste Baseline's scores make especially clear [55].
- The findings provide reference points for optimizing story visualization technology and reinforce the need for a multi-dimensional evaluation system [56].
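
The two character-similarity dimensions in Group 4 can be sketched as follows, assuming each detected character has already been cropped and encoded into an embedding vector by some identity encoder (the detector and encoder are assumptions and are not specified here). Cross-similarity averages the cosine similarity between generated crops and reference images; self-similarity averages it over pairs of generated crops. The `near_copy_rate` function is a hypothetical heuristic, not ViStoryBench's copy-paste metric: it simply flags crops that are almost identical to a reference.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_similarity(generated: list[list[float]], references: list[list[float]]) -> float:
    """Mean similarity between every generated crop and every reference of one character."""
    scores = [cosine(g, r) for g in generated for r in references]
    return sum(scores) / len(scores)


def self_similarity(generated: list[list[float]]):
    """Mean pairwise similarity among generated crops of one character; None if fewer than two."""
    pairs = list(combinations(generated, 2))
    if not pairs:
        return None
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def near_copy_rate(generated: list[list[float]], references: list[list[float]],
                   threshold: float = 0.98) -> float:
    """Hypothetical heuristic: share of generated crops nearly identical to some reference."""
    hits = sum(1 for g in generated if max(cosine(g, r) for r in references) >= threshold)
    return hits / len(generated)


if __name__ == "__main__":
    refs = [[1.0, 0.0]]
    gen = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_similarity(gen, refs), self_similarity(gen), near_copy_rate(gen, refs))
    # prints 0.5 0.0 0.5
```

The toy case hints at the blind spot noted in Group 5: a system that pasted the reference image into every frame would score 1.0 on cross-similarity alone, which is why a copy-paste check and other dimensions are needed alongside it.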