AAAI 2026 Oral | Volcano Engine Multimedia Lab Proposes VQ-Insight, a Large Model for AIGC Video Quality Understanding
机器之心 (Synced) · 2025-11-20 15:13

Core Insights
- The article covers advances by ByteDance's Volcano Engine Multimedia Lab in multimedia technology, focusing on VQ-Insight, a model for assessing the quality of AI-generated video [2][4][19].
- VQ-Insight was accepted at AAAI 2026, underscoring its standing in the artificial intelligence research community [2].

Research and Development
- Together with Peking University, the Volcano Engine Multimedia Lab produced the paper "VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning," which was selected for oral presentation at AAAI 2026 [2][4].
- The lab has won multiple awards in international technical competitions and has published numerous papers in top-tier journals [2].

Methodology
- VQ-Insight uses a progressive visual quality reinforcement learning framework whose phases cover image scoring, task-driven temporal learning, and joint fine-tuning with video generation models [6][19].
- By emphasizing temporal coherence and multi-dimensional quality assessment, the model targets the particular difficulties of evaluating AI-generated content [4][6].

Performance Metrics
- VQ-Insight outperforms state-of-the-art methods on multiple datasets across several tasks, including AIGC video preference comparison and multi-dimensional scoring [10][12][19].
- On the AIGC preference comparison task, VQ-Insight scored 50.80 on VOAScore and 75.71 on VideoReward, results the article presents as evidence of its effectiveness in evaluating video quality [11].

Application and Impact
- Because it supplies accurate preference data for training, the model can be applied directly to optimize video generation models and improve the quality of their outputs [17][19].
- VQ-Insight serves as a plug-and-play reward and preference module for video generation training, contributing to the development of next-generation AIGC video technology [19].
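The reward and preference role described above can be illustrated with a minimal, hypothetical sketch. The article does not describe VQ-Insight's interface, so `Scorer`, `build_preference_pair`, and `preference_agreement` are illustrative names, the scorer is assumed to return one scalar quality score per video, and pairing the best- and worst-scoring candidates is one common way to turn scores into preference data, not necessarily the lab's method. The agreement function shows one way a pairwise preference-comparison benchmark can be scored; the article does not say which metric underlies the 50.80 and 75.71 figures.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a video quality scorer such as VQ-Insight:
# maps a video identifier to a scalar quality score (higher is better).
# The real model's inputs and outputs are not described in the article.
Scorer = Callable[[str], float]


def build_preference_pair(candidates: Sequence[str], scorer: Scorer) -> Tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-scoring candidates.

    Pairs like this are the kind of preference data a reward module can feed
    into preference-based fine-tuning of a video generator (assumed usage).
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=scorer, reverse=True)
    return ranked[0], ranked[-1]


def preference_agreement(pairs: Sequence[Tuple[str, str]], scorer: Scorer) -> float:
    """Fraction of human-labelled (preferred, other) pairs where the scorer agrees.

    Ties count as disagreement.
    """
    if not pairs:
        raise ValueError("no pairs given")
    hits = sum(1 for preferred, other in pairs if scorer(preferred) > scorer(other))
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not model outputs.
    toy_scores = {"clip_a": 0.82, "clip_b": 0.41, "clip_c": 0.67}
    scorer = toy_scores.__getitem__

    chosen, rejected = build_preference_pair(["clip_a", "clip_b", "clip_c"], scorer)
    print(chosen, rejected)  # clip_a clip_b
    print(preference_agreement([("clip_a", "clip_c"), ("clip_b", "clip_c")], scorer))  # 0.5
```

Keeping the scorer as a plain callable is what makes the module "plug-and-play" in this sketch: any model that maps a video to a score can be swapped in without changing the pairing or evaluation code.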