Ultra-High-Definition Video Dataset

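From 1080p to 4K: Zhejiang University open-sources a native ultra-high-definition video generation solution, breaking the resolution ceiling of AI video generation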
QbitAI (量子位) · 2025-07-01 03:51
Core Viewpoint - The UltraVideo dataset, a high-quality open-source UHD-4K video dataset, addresses two limitations of existing video generation models: low resolution and simplistic captions. It enables a significant leap in generated video quality, from "barely watchable" to "cinema-level" [1][2].

Group 1: Dataset Characteristics
- UltraVideo covers more than 100 themes. Each video comes with 9 structured captions plus a summary caption averaging 824 words [2].
- It is the first open-source dataset to offer 4K/8K ultra-high-definition video, a major step toward higher-quality video generation [2].
- The dataset comprises 42,000 short videos (3-10 seconds) and 17,000 long videos (over 10 seconds); 22.4% of the videos are in 8K resolution [9].

Group 2: Methodology and Model Improvements
- The dataset is built through a four-stage filtering process designed to ensure high data quality, and the UltraWan-4K model is obtained by fine-tuning on it [3][19].
- The work targets two main bottlenecks in video generation, the "resolution trap" and the "semantic gap", allowing better control over video parameters [4][5].
- The filtering process includes manual selection of high-quality source videos, filtering based on statistical information, and structured semantic descriptions that improve caption quality [6][7].

Group 3: Performance and Results
- Experiments show that training on UltraVideo significantly improves the aesthetic quality and resolution of generated videos, even with a small number of training samples [13].
- UltraWan-4K outperforms previous models in image quality and temporal stability, although it generates video at a lower frame rate [19].
- The results indicate that high-quality data can effectively break the resolution ceiling of video generation, paving the way for future work on UHD video tasks [21].
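A minimal sketch of the statistical side of this curation, under stated assumptions. The summary does not publish the actual filter thresholds, so the code encodes only what it does state (the 3-10 s short / over-10 s long split and the 4K/8K targets) plus an assumed rule that drops clips below 4K or shorter than 3 seconds. The 4K and 8K checks use the standard UHD pixel dimensions (3840×2160 and 7680×4320). The `Clip` record and every function name are hypothetical, not part of the UltraVideo release.

```python
from dataclasses import dataclass

# Standard UHD pixel dimensions as (long side, short side).
UHD_4K = (3840, 2160)
UHD_8K = (7680, 4320)


@dataclass(frozen=True)
class Clip:
    """Hypothetical per-clip metadata record; field names are illustrative."""
    clip_id: str
    width: int
    height: int
    duration_s: float


def _meets(clip: Clip, target: tuple[int, int]) -> bool:
    # Compare long and short sides separately so portrait clips are handled too.
    long_side = max(clip.width, clip.height)
    short_side = min(clip.width, clip.height)
    return long_side >= target[0] and short_side >= target[1]


def is_uhd(clip: Clip) -> bool:
    """True if the clip is at least 4K UHD."""
    return _meets(clip, UHD_4K)


def is_8k(clip: Clip) -> bool:
    """True if the clip is at least 8K UHD."""
    return _meets(clip, UHD_8K)


def split_by_duration(clips):
    """Keep UHD clips and bucket them into short (3-10 s) and long (>10 s) subsets.

    Assumed statistical filter: clips below 4K or shorter than 3 s are dropped.
    """
    short, long_ = [], []
    for clip in clips:
        if not is_uhd(clip) or clip.duration_s < 3.0:
            continue
        if clip.duration_s <= 10.0:
            short.append(clip)
        else:
            long_.append(clip)
    return short, long_


def share_8k(clips) -> float:
    """Fraction of clips at 8K resolution (0.0 for an empty collection)."""
    clips = list(clips)
    if not clips:
        return 0.0
    return sum(is_8k(c) for c in clips) / len(clips)


if __name__ == "__main__":
    clips = [
        Clip("a", 3840, 2160, 5.0),   # 4K, short
        Clip("b", 7680, 4320, 12.0),  # 8K, long
        Clip("c", 1920, 1080, 6.0),   # 1080p, filtered out
        Clip("d", 2160, 3840, 2.0),   # portrait 4K but under 3 s, filtered out
    ]
    short, long_ = split_by_duration(clips)
    print([c.clip_id for c in short], [c.clip_id for c in long_])  # ['a'] ['b']
    print(share_8k(short + long_))  # 0.5
```

The manual-selection and caption-generation stages are not modeled here; the sketch only shows how resolution and duration metadata could separate clips into the short and long subsets the summary describes.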

Group 4: Future Directions
- The team plans to explore long video generation using the dataset's long-video subset [22].
- UltraVideo and the UltraWan-1K/4K LoRA weights have been fully open-sourced to support further research in the field [22].