LongVie Framework
What Sora Couldn't Do, the LongVie Framework Has Solved: SOTA in Ultra-Long Video Generation
Synced (机器之心) · 2025-08-20 09:47
Core Insights
- The article discusses the rapid advances in video generation technology, focusing on the challenge of producing controllable long videos that exceed one minute in length [2][3].

Group 1: Challenges in Long Video Generation
- Current controllable video generation models run into two major problems when generating long videos: temporal inconsistency and visual degradation [8].
- Temporal inconsistency shows up as mismatched details and flickering between frames. Visual degradation causes color drift and a loss of clarity as the video goes on [8].

Group 2: Solutions Proposed by the LongVie Framework
- LongVie addresses temporal inconsistency with two key strategies:
  1. Global normalization of control signals: control signals are normalized across the entire video rather than within each individual segment, which keeps segments consistent when they are stitched together [10].
  2. Unified noise initialization: all segments share the same initial noise, which aligns their generation distributions and minimizes appearance and detail drift between frames [11].
- To counter visual degradation, LongVie uses multi-modal fine-grained control. It combines dense control signals (such as depth maps) with sparse control signals (such as keypoints) and applies a degradation-aware training strategy [16].

Group 3: LongVie Framework Overview
- LongVie globally normalizes both the dense and the sparse control signals and uses a unified noise initialization for all segments. It then generates the segments one after another while keeping them coherent [20].
- The authors compared the standard ControlNet with several of its variants as the control-injection design and adopted the best-performing variant for its superior stability and effectiveness [22].

Group 4: LongVie Capabilities and Benchmarking
- LongVie supports a range of long video generation tasks, including video editing, style transfer, and Mesh-to-Video generation [23].
- The work also introduces LongVGenBench, the first standardized benchmark for controllable long video generation. It contains 100 high-resolution videos, each longer than one minute, and is intended to support systematic research and fair evaluation in this field [25].
- Both quantitative metrics and user studies indicate that LongVie outperforms existing methods on multiple indicators, achieving state-of-the-art (SOTA) performance [28].
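The first strategy in Group 2 contrasts per-segment normalization of control signals with global normalization. A minimal Python sketch of the difference follows. It assumes min-max scaling and represents each frame's control signal (for example, a depth map) as a single float. Both simplifications are illustrative assumptions rather than LongVie's actual preprocessing, and the function names are hypothetical.

```python
from typing import List


def normalize_per_segment(signal: List[float], seg_len: int) -> List[float]:
    """Baseline: min-max scale each segment using only its own statistics."""
    out: List[float] = []
    for start in range(0, len(signal), seg_len):
        seg = signal[start:start + seg_len]
        lo, hi = min(seg), max(seg)
        span = (hi - lo) or 1.0  # guard against flat segments
        out.extend((v - lo) / span for v in seg)
    return out


def normalize_global(signal: List[float]) -> List[float]:
    """Global: min-max scale every frame using statistics of the whole video."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # guard against a flat signal
    return [(v - lo) / span for v in signal]


depth = [1.0, 2.0, 3.0, 4.0]  # toy per-frame depth values, steadily increasing
print(normalize_per_segment(depth, seg_len=2))  # [0.0, 1.0, 0.0, 1.0]
print(normalize_global(depth))  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
```

The example uses a steadily increasing signal split into two-frame segments. Per-segment scaling resets to 0.0 at the segment boundary, which produces a jump in the control signal exactly where segments are stitched. Global scaling keeps the ramp continuous.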
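The following toy sketch illustrates unified noise initialization (Group 2, strategy 2) together with sequential segment generation (Group 3). Several assumptions apply. `toy_sampler` is a hypothetical placeholder for the diffusion model, and frames are flat lists of floats. Chaining segments by conditioning each one on the last frame of the previous segment is also an assumption about how the sequential generation works; the article does not state it.

```python
import random
from typing import List, Optional

Frame = List[float]  # toy "frame": a flat list of latent values


def init_segment_noise(num_segments: int, seg_len: int, size: int,
                       seed: int, unified: bool = True) -> List[List[Frame]]:
    """Return one noise clip (seg_len frames of `size` values) per segment.

    unified=True draws a single clip and shares it across all segments;
    unified=False draws an independent clip for each segment (the baseline).
    """
    rng = random.Random(seed)

    def draw() -> List[Frame]:
        return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(seg_len)]

    if unified:
        shared = draw()
        return [shared for _ in range(num_segments)]
    return [draw() for _ in range(num_segments)]


def toy_sampler(noise: List[Frame], controls: List[float],
                prev_last: Optional[Frame]) -> List[Frame]:
    """HYPOTHETICAL stand-in for the diffusion sampler (not LongVie's model).

    Each output frame mixes its noise, a scalar control value, and the
    previous segment's last frame (zeros for the first segment).
    """
    anchor = prev_last if prev_last is not None else [0.0] * len(noise[0])
    return [[0.1 * n + c + 0.5 * a for n, a in zip(frame, anchor)]
            for frame, c in zip(noise, controls)]


def generate_long_video(controls: List[float], seg_len: int, size: int,
                        seed: int = 0, unified: bool = True) -> List[Frame]:
    """Generate segments one after another from per-frame control values."""
    num_segments = -(-len(controls) // seg_len)  # ceiling division
    noises = init_segment_noise(num_segments, seg_len, size, seed, unified)
    video: List[Frame] = []
    prev_last: Optional[Frame] = None
    for k in range(num_segments):
        seg_controls = controls[k * seg_len:(k + 1) * seg_len]
        # The last segment may be shorter, so trim its noise clip to match.
        segment = toy_sampler(noises[k][:len(seg_controls)], seg_controls, prev_last)
        video.extend(segment)
        prev_last = segment[-1]  # the next segment is conditioned on this frame
    return video
```

Passing `unified=False` reproduces the baseline in which each segment draws its own noise. With `unified=True`, every segment starts from the identical noise clip. Any differences between segments then come only from the control signals and the conditioning frame.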
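The sketch below offers one plausible reading of the dense-plus-sparse control fusion and the degradation-aware training strategy described in Group 2. Several details are assumptions made for illustration: fusing the two signals by summation, and degrading the dense branch during training by a random scale factor. `fuse_controls`, `degrade_prob`, and `min_scale` are hypothetical names. The article does not specify how LongVie actually fuses or degrades the signals.

```python
import random
from typing import List


def fuse_controls(dense: List[float], sparse: List[float],
                  training: bool, rng: random.Random,
                  degrade_prob: float = 0.3,
                  min_scale: float = 0.2) -> List[float]:
    """Fuse per-position dense and sparse control features by summation.

    HYPOTHETICAL degradation-aware scheme: during training, with probability
    `degrade_prob`, the dense features are attenuated by a random factor in
    [min_scale, 1.0]. At inference both branches pass through unchanged.
    """
    if len(dense) != len(sparse):
        raise ValueError("dense and sparse features must have the same length")
    scale = 1.0
    if training and rng.random() < degrade_prob:
        scale = rng.uniform(min_scale, 1.0)  # weaken the dense branch
    return [scale * d + s for d, s in zip(dense, sparse)]


rng = random.Random(0)
print(fuse_controls([1.0, 2.0], [0.5, 0.0], training=False, rng=rng))  # [1.5, 2.0]
```

The purpose of randomly weakening the dense signal in this sketch is to keep the model from relying entirely on depth maps, so that the sparse keypoint signal still shapes the output. At inference (`training=False`), the features are combined unchanged.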