Core Viewpoint - The article discusses the revolutionary advancements in video generation through the introduction of the Frame-aware Video Diffusion Model (FVDM) and its practical application in the Pusa project, which significantly reduces training costs and enhances video generation capabilities [2][3][37]. Group 1: FVDM and Pusa Project - FVDM introduces a vectorized timestep variable (VTV) that allows each frame to have an independent temporal evolution path, addressing the limitations of traditional scalar timesteps in video generation [2][18]. - The Pusa project, developed in collaboration with Huawei's Hong Kong Research Institute, serves as a direct application and validation of FVDM, exploring a low-cost method for fine-tuning large-scale pre-trained video models [3][37]. - Pusa achieves superior results compared to the official Wan I2V model while reducing training costs by over 200 times (from at least $100,000 to $500) and data requirements by over 2500 times [5][37]. Group 2: Technical Innovations - The Pusa project utilizes non-destructive fine-tuning on pre-trained models like Wan-T2V 14B, allowing for effective video generation without compromising the original model's capabilities [5][29]. - The introduction of a probabilistic timestep sampling training strategy (PTSS) in FVDM enhances convergence speed and improves performance compared to the original model [30][31]. - Pusa's VTV mechanism enables diverse video generation tasks by allowing different frames to have distinct noise perturbation controls, thus facilitating more nuanced video generation [35][36]. Group 3: Community Engagement and Future Prospects - The complete codebase, training datasets, and training code for Pusa have been open-sourced to encourage community contributions and collaboration, aiming to enhance performance and explore new possibilities in video generation [17][37]. - The article emphasizes the potential of Pusa to lead the video generation field into a new era characterized by low costs and high flexibility [36][37].
数据减少超千倍,500 美金就可训练一流视频模型,港城、华为Pusa来了
机器之心·2025-06-19 02:28