Open-source video generation

An 11B model takes the new open-source video generation SOTA! Trained on only 224 GPUs, cutting training costs by 10x
量子位· 2025-03-13 03:28
Core Viewpoint
- Open-Sora 2.0 has been officially released, showcasing significant advancements in video generation technology with a focus on cost efficiency and high performance, rivaling leading closed-source models [1][10][12].

Cost Efficiency
- The training cost for Open-Sora 2.0 is reduced to $200,000, significantly lower than the millions typically required for similar closed-source models [2][3].
- Open-Sora 2.0 achieves a cost reduction of 5-10 times compared to other open-source video models with over 10 billion parameters [13].

Performance Metrics
- Open-Sora 2.0 features an 11 billion parameter scale, achieving performance levels comparable to high-cost models like HunyuanVideo and Step-Video [10].
- The performance gap between Open-Sora 2.0 and the leading closed-source model from OpenAI has narrowed from 4.52% to just 0.69% [12].
- In VBench evaluations, Open-Sora 2.0 surpassed Tencent's HunyuanVideo, establishing a new benchmark for open-source video generation technology [12].

Technical Innovations
- The model architecture includes a 3D autoencoder and a Flow Matching training framework, enhancing video generation quality [15] (a minimal flow matching training-step sketch follows this summary).
- Open-Sora 2.0 employs a high-compression video autoencoder, reducing inference time significantly from nearly 30 minutes to under 3 minutes for generating 768px, 5-second videos [21] (a token-count illustration also follows below).
- The training process incorporates advanced techniques such as strict data filtering, multi-stage screening, and efficient parallel training to optimize resource utilization [16][19].

Community Engagement
- Open-Sora 2.0 is fully open-sourced, including model weights, inference code, and the entire distributed training process, inviting developers to participate [4][14].
- The project has gained substantial academic recognition, with nearly 100 citations in six months, solidifying its position as a leader in the open-source video generation space [14].

Future Directions
- The focus on high-compression video autoencoders is seen as a key direction for reducing video generation costs in the future, with initial experiments showing promising results [25].
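
The Flow Matching framework mentioned under Technical Innovations can be illustrated with a minimal sketch of a single training step, assuming a PyTorch-style setup. The model signature, the latent shape produced by the 3D autoencoder, and the simple straight-line (rectified-flow) interpolation path are assumptions for illustration, not the actual Open-Sora 2.0 implementation.

import torch

def flow_matching_loss(model, x1, cond):
    """Hypothetical flow matching step; x1 is a batch of clean video latents (B, C, T, H, W)."""
    x0 = torch.randn_like(x1)                        # Gaussian noise endpoint
    t = torch.rand(x1.size(0), device=x1.device)     # random time in [0, 1]
    t_ = t.view(-1, 1, 1, 1, 1)
    xt = (1.0 - t_) * x0 + t_ * x1                   # point on the straight-line path
    v_target = x1 - x0                               # constant velocity of that path
    v_pred = model(xt, t, cond)                      # network predicts the velocity field
    return ((v_pred - v_target) ** 2).mean()         # regress prediction onto the target

At inference time, the learned velocity field is integrated from noise toward data with an ODE solver, which is one reason flow matching pairs well with fast sampling.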
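
The speedup attributed to the high-compression video autoencoder can be understood through latent token counts: a DiT-style backbone's attention cost grows roughly quadratically with the number of tokens, so stronger spatial/temporal compression shrinks both compute and memory. The frame rate, patchify size, and compression ratios below are illustrative assumptions only, not Open-Sora 2.0's actual configuration.

def latent_tokens(frames, height, width, t_down, s_down, patch=2):
    """Tokens seen by the backbone after (t_down, s_down) AE compression and patchification."""
    t = frames // t_down
    h = (height // s_down) // patch
    w = (width // s_down) // patch
    return t * h * w

frames, H, W = 120, 768, 768                                    # ~5 s at 24 fps, 768x768 pixels
baseline = latent_tokens(frames, H, W, t_down=4, s_down=8)      # hypothetical 4x8x8 autoencoder
high_comp = latent_tokens(frames, H, W, t_down=4, s_down=32)    # hypothetical 4x32x32 autoencoder
print(baseline, high_comp, baseline / high_comp)                # 69120 4320 16.0

Under these assumed ratios the token count drops by roughly 16x, and since attention cost scales close to quadratically in token count, an order-of-magnitude reduction in per-video inference time is plausible; the ratios actually used by Open-Sora 2.0 may differ.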