LongVGenBench
A New Breakthrough in "Video World Models": AI Generates 5 Minutes of Continuous Video Without the Picture Falling Apart
机器之心· 2025-12-31 09:31
Core Insights
- The article discusses the rise of AI-generated video and the challenge of producing videos that not only look realistic but also obey the laws of the physical world, which is the goal of "video world models" [2].
- LongVie 2 is introduced as a framework for generating high-fidelity, controllable videos up to 5 minutes long, addressing the limitations of existing models [2][6].

Group 1: Challenges in Current Video Models
- Current video world models share a common problem: as generation length increases, controllability, visual fidelity, and temporal consistency all decline [6].
- Quality loss in long video generation is nearly unavoidable, and visual degradation and logical inconsistency have become major bottlenecks [2][12].

Group 2: LongVie 2 Framework
- LongVie 2 uses a three-stage progressive training strategy to improve controllability, stability, and temporal consistency [9][14].
- Stage 1 focuses on dense and sparse multimodal control, using dense signals (such as depth maps) and sparse signals (such as keypoint trajectories) to provide stable, interpretable world constraints [9].
- Stage 2 introduces degradation-aware training, in which the model learns to keep generation stable even when its inputs are imperfect, significantly improving long-term visual fidelity [13].
- Stage 3 adds historical context modeling, explicitly incorporating information from previous segments to smooth the transitions between segments and reduce semantic breaks [14].

Group 3: Performance Metrics
- LongVie 2 shows better controllability than existing methods and reaches state-of-the-art (SOTA) results on multiple metrics [21][29].
- Ablation studies confirm the effectiveness of the three-stage training approach, showing gains in quality, controllability, and temporal consistency across multiple indicators [26].

Group 4: LongVGenBench
- The article introduces LongVGenBench, the first standardized benchmark dataset for controllable long video generation, containing 100 high-resolution videos, each longer than 1 minute [28].
- The benchmark aims to support systematic research and fair evaluation in long video generation [28].
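To make the Stage 2 and Stage 3 ideas above concrete, here is a minimal sketch of how a training example might pair degraded history frames with clean target frames. It is not LongVie 2's code: the summary does not specify the degradation types, their strength, or the context length, so the blur-plus-Gaussian-noise degradation, `max_sigma`, `history_len`, and all function names are hypothetical. The idea it illustrates is that the model learns to continue from imperfect context, which is what it receives at inference time when earlier segments are its own outputs.

```python
import random
from typing import List, Tuple

# A frame is an H x W grayscale image with values in [0, 1] (hypothetical representation).
Frame = List[List[float]]


def box_blur(frame: Frame) -> Frame:
    """3x3 box blur; each pixel averages only the neighbors that lie inside the frame."""
    h, w = len(frame), len(frame[0])
    out: Frame = []
    for i in range(h):
        row = []
        for j in range(w):
            neighbors = [
                frame[ii][jj]
                for ii in range(max(0, i - 1), min(h, i + 2))
                for jj in range(max(0, j - 1), min(w, j + 2))
            ]
            row.append(sum(neighbors) / len(neighbors))
        out.append(row)
    return out


def degrade(frame: Frame, sigma: float, blur: bool, rng: random.Random) -> Frame:
    """Hypothetical degradation: optional blur plus Gaussian noise, clipped to [0, 1]."""
    src = box_blur(frame) if blur else frame
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in src]


def make_training_example(
    clip: List[Frame], history_len: int, rng: random.Random, max_sigma: float = 0.1
) -> Tuple[List[Frame], List[Frame]]:
    """Split a clip into (degraded history frames, clean target frames).

    The history stands in for the previous segment's tail (the assumed Stage 3 context);
    degrading it mimics imperfect, self-generated context at inference (the assumed Stage 2 idea).
    """
    if not 0 < history_len < len(clip):
        raise ValueError("history_len must leave at least one target frame")
    sigma = rng.uniform(0.0, max_sigma)  # degradation strength sampled per example
    blur = rng.random() < 0.5
    history = [degrade(f, sigma, blur, rng) for f in clip[:history_len]]
    target = clip[history_len:]  # targets are left clean
    return history, target


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [[[0.5] * 4 for _ in range(4)] for _ in range(6)]  # six toy 4x4 frames
    history, target = make_training_example(clip, history_len=2, rng=rng)
    print(len(history), len(target))  # 2 4
```

Sampling the degradation strength per example, rather than fixing it, is one common way to expose a model to a range of input quality; whether LongVie 2 does this is not stated in the summary.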
What Sora Couldn't Do, the LongVie Framework Has Solved: SOTA in Ultra-Long Video Generation
机器之心· 2025-08-20 09:47
Core Insights
- The article discusses rapid advances in video generation technology, focusing on the challenge of creating controllable long videos that exceed one minute [2][3].

Group 1: Challenges in Long Video Generation
- Current controllable video generation models suffer from two major problems when generating long videos: temporal inconsistency and visual degradation [8].
- Temporal inconsistency shows up as mismatched details and flickering between frames, while visual degradation shows up as color drift and loss of clarity over time [8].

Group 2: Solutions Proposed by the LongVie Framework
- LongVie addresses temporal inconsistency with two key strategies:
  1. Control Signals Global Normalization, which normalizes control signals over the entire video rather than within each individual segment, improving consistency when segments are stitched together [10].
  2. Unified Noise Initialization, in which all segments share the same initial noise so that their generation distributions are aligned, minimizing drift in appearance and detail between frames [11].
- To combat visual degradation, LongVie uses multimodal fine-grained control, combining dense control signals (such as depth maps) with sparse control signals (such as keypoints), together with a degradation-aware training strategy [16].

Group 3: LongVie Framework Overview
- LongVie normalizes dense and sparse control signals globally, initializes every segment from the same noise, and generates the segments one after another while keeping them coherent [20].
- For its control design, standard ControlNet and several of its variants were compared, and the best-performing variant was adopted for its superior stability and effectiveness [22].

Group 4: LongVie Capabilities and Benchmarking
- LongVie supports a range of long video generation tasks, including video editing, style transfer, and Mesh-to-Video generation [23].
- The article also introduces LongVGenBench, the first standardized benchmark for controllable long video generation. It contains 100 high-resolution videos, each longer than one minute, and aims to promote systematic research and fair evaluation in this field [25].
- Quantitative metrics and user studies indicate that LongVie outperforms existing methods across multiple indicators, achieving state-of-the-art (SOTA) performance [28].
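The two consistency strategies described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not LongVie's implementation: the summary does not say which statistic the global normalization uses, so min-max scaling over flattened depth values is assumed, and `normalize_per_segment`, `normalize_global`, and `unified_noise` are hypothetical helpers. The example shows why per-segment scaling hurts continuity: the same depth at a segment boundary can map to two different values in adjacent segments, while global scaling maps it to one value.

```python
import random
from typing import List

# Each segment's control signal is flattened to a list of depth values (hypothetical layout).
Segment = List[float]


def normalize_per_segment(segments: List[Segment]) -> List[Segment]:
    """Baseline: min-max scale each segment with its own statistics."""
    out = []
    for seg in segments:
        lo, hi = min(seg), max(seg)
        scale = hi - lo
        out.append([(v - lo) / scale if scale > 0 else 0.0 for v in seg])
    return out


def normalize_global(segments: List[Segment]) -> List[Segment]:
    """Assumed global normalization: one min/max computed over the whole video."""
    values = [v for seg in segments for v in seg]
    lo, hi = min(values), max(values)
    scale = hi - lo
    return [[(v - lo) / scale if scale > 0 else 0.0 for v in seg] for seg in segments]


def unified_noise(num_segments: int, size: int, seed: int) -> List[List[float]]:
    """Draw one Gaussian noise vector and give every segment its own copy of it."""
    rng = random.Random(seed)
    base = [rng.gauss(0.0, 1.0) for _ in range(size)]
    return [list(base) for _ in range(num_segments)]


if __name__ == "__main__":
    # Two segments that meet at depth 5.0.
    segs = [[1.0, 3.0, 5.0], [5.0, 7.5, 10.0]]
    print(normalize_per_segment(segs))  # [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
    g = normalize_global(segs)
    print(g[0][2] == g[1][0])  # True: depth 5.0 maps to the same value in both segments
```

With per-segment scaling, depth 5.0 becomes 1.0 at the end of the first segment and 0.0 at the start of the second, a jump the generator would see as a change in scene geometry. Reusing a single noise draw is the simplest reading of "all segments share the same initial noise"; in an actual diffusion pipeline the shared noise would be a latent tensor rather than a flat list.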