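Over 30 video frames generated per second, with real-time interaction support! New autoregressive video generation framework sets a new mark for generation efficiency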
量子位 (QbitAI) · 2025-06-12 01:37
Core Viewpoint
- The article covers Next-Frame Diffusion (NFD), a video generation framework developed jointly by Microsoft Research and Peking University that improves both the quality and the efficiency of video generation [1][2].

Group 1: Video Generation Efficiency
- NFD generates video at over 30 frames per second on NVIDIA A100 GPUs while maintaining high quality [1][4].
- The framework combines frame-wise parallel sampling with inter-frame autoregressive generation, which substantially increases generation efficiency [2][18].
- NFD generates video at approximately 0.48 seconds per frame on the A100 GPU [4].

Group 2: Technical Innovations
- NFD uses bidirectional attention within each frame and causal attention across frames, which improves the modeling of temporal dependencies [21][25].
- The architecture includes a tokenizer that converts visual signals into tokens and a diffusion-based Transformer that reduces computational cost by 50% compared to traditional 3D full attention [26][25].
- Training is based on Flow Matching, which simplifies the training of continuous-time consistency models for video data [27][28].

Group 3: Performance Comparison
- NFD outperforms previous autoregressive models on multiple metrics, achieving a Fréchet Video Distance (FVD) of 212 and a Peak Signal-to-Noise Ratio (PSNR) of 16.46 while running at 6.15 frames per second (FPS) [35].
- The accelerated version, NFD+, runs faster still, at 42.46 FPS for the 130M model and 31.14 FPS for the 310M model, while maintaining competitive visual quality [36][37].
- NFD+ retains a PSNR of 16.83 and an FVD of 227, comparable to larger models such as MineWorld [37].
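
The attention pattern described under Group 2 can be illustrated with a small mask. The sketch below is only an illustration under stated assumptions, not the authors' code: `block_causal_mask`, `attended_fraction`, and the toy sizes are hypothetical names and values chosen here. Each token may attend to every token in its own frame and in all earlier frames. Counting the allowed query-key pairs gives a fraction of (T + 1) / (2T) for T frames, which approaches one half as T grows; this is a counting argument consistent with the roughly 50% saving cited above, not a reproduction of the reported measurement.

```python
def block_causal_mask(num_frames, tokens_per_frame):
    """Build a boolean mask; mask[q][k] is True when query token q may attend to key token k.

    Hypothetical helper: tokens are assumed to be laid out frame by frame.
    """
    total = num_frames * tokens_per_frame
    mask = []
    for q in range(total):
        q_frame = q // tokens_per_frame
        # Allowed: any key in the same frame (bidirectional) or an earlier frame (causal).
        row = [(k // tokens_per_frame) <= q_frame for k in range(total)]
        mask.append(row)
    return mask


def attended_fraction(mask):
    """Share of query-key pairs kept by the mask, relative to full attention."""
    total_pairs = len(mask) * len(mask[0])
    allowed_pairs = sum(sum(row) for row in mask)
    return allowed_pairs / total_pairs


# Toy example: 3 frames of 2 tokens each ("x" = allowed, "." = masked).
mask = block_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xx....
# xx....
# xxxx..
# xxxx..
# xxxxxx
# xxxxxx

# With more frames the kept share moves toward one half: (16 + 1) / (2 * 16).
print(attended_fraction(block_causal_mask(num_frames=16, tokens_per_frame=4)))  # 0.53125
```

In a real model the same boolean pattern would be passed to the attention layer as a mask; the pair count is only a rough proxy for compute and ignores everything outside attention.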
Group 4: Future Implications
- The advancements in video generation models, such as NFD, indicate a growing trend towards more flexible and efficient generation paradigms, which could lead to innovative applications in gaming and interactive media [15][35].
- The research highlights the potential for direct interaction between players and models in gaming environments, moving away from traditional game engines [3][15].
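
As a closing illustration of the Flow Matching objective mentioned under Group 2, the sketch below shows its most common textbook form. It rests on assumptions the article does not confirm: a straight-line path between noise and data, a constant target velocity, and plain Euler integration for sampling. All function names and the three-value "latent" are hypothetical stand-ins, and the oracle velocity replaces a trained network.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t = 0) to data x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted_velocity, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    target = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(predicted_velocity, target)) / len(target)


def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        v = velocity_fn(x, step * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
data = [0.5, -1.0, 2.0]  # assumed stand-in for one frame's latent values
noise = [random.gauss(0.0, 1.0) for _ in data]
t = random.random()
x_t = interpolate(noise, data, t)  # the noisy input a model would see at time t


def oracle(x, t):
    """Hypothetical perfect model: returns exactly the target velocity."""
    return target_velocity(noise, data)


print(flow_matching_loss(oracle(x_t, t), noise, data))  # 0.0
sample = euler_sample(noise, oracle, num_steps=4)  # lands on `data` up to rounding
```

With a constant velocity field a handful of Euler steps already reaches the data point; a trained network only approximates this field, so the number of steps it needs in practice is a separate question.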