Core Viewpoint
- Alibaba has released Wan2.2, the world's first open-source video generation model built on a Mixture-of-Experts (MoE) architecture, featuring cinematic aesthetic control capabilities [3][11].

Group 1: Model Features
- Wan2.2 is the first video diffusion model to introduce the MoE architecture, enlarging model capacity without increasing computational cost [11][12].
- Its training data grows substantially over Wan2.1, with image data up 65.6% and video data up 83.2%, improving the model's generalization in motion expression, semantic understanding, and aesthetic performance [14][15].
- The model incorporates a specially curated aesthetic dataset with fine-grained attributes such as lighting, composition, and color, enabling precise control over cinematic styles and user-customizable aesthetic preferences [16].

Group 2: Technical Innovations
- Wan2.2 features a high-efficiency hybrid TI2V architecture: a 5-billion-parameter model with a 16×16×4 compression ratio, supporting video generation at 720P resolution and 24 fps [18].
- It is one of the fastest models on the market for generating 720P, 24 fps video, serving both industrial and academic needs [19].
- Users can download the model from platforms such as Hugging Face and Alibaba's ModelScope community [20].
Alibaba open-sources again: the world's first MoE video generation model arrives, with cinematic aesthetics one tap away
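The summary notes that the model weights are hosted on Hugging Face. Below is a minimal sketch of fetching them with the `huggingface_hub` client; the repo id `Wan-AI/Wan2.2-TI2V-5B` is an assumption inferred from the naming in this summary (5B-parameter TI2V variant), not something the article confirms, so check the actual organization page before use.

```python
# Minimal sketch for downloading Wan2.2 weights from Hugging Face.
# NOTE: the repo id below is an assumption based on this summary's naming;
# verify the exact id on the Hugging Face hub before running.
from huggingface_hub import snapshot_download

REPO_ID = "Wan-AI/Wan2.2-TI2V-5B"  # assumed repo id


def fetch_weights(local_dir: str = "./wan2.2") -> str:
    """Download the full model snapshot and return the local path."""
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    # Guarded so importing this module does not trigger a large download.
    print(f"Model downloaded to {fetch_weights()}")
```

The guard keeps the download out of import time, which matters here since the full snapshot of a 5B-parameter model is tens of gigabytes.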