Core Insights
- Alibaba has open-sourced Wan2.2, a movie-level video generation model that integrates three cinematic aesthetic elements — light, color, and camera language — and lets users combine over 60 intuitive, controllable parameters to significantly improve video production efficiency [1]

Group 1: Model Features
- Wan2.2 generates 5 seconds of high-definition video in a single pass, and users can iteratively refine a short film across multiple prompts [1]
- The release includes three versions: text-to-video (Wan2.2-T2V-A14B), image-to-video (Wan2.2-I2V-A14B), and unified video generation (Wan2.2-TI2V-5B); the A14B models have 27 billion total parameters with 14 billion active per step [1]
- The A14B models use a mixture-of-experts (MoE) architecture, cutting computational resource consumption by roughly 50% while improving complex motion generation and aesthetic expression [1]

Group 2: Additional Model Release
- A smaller 5-billion-parameter unified video generation model has also been released; it supports both text-to-video and image-to-video generation and can be deployed on consumer-grade graphics cards [2]
- This model features a high-compression-rate 3D VAE architecture with a temporal-spatial compression ratio of up to 4×16×16 and an information compression rate of 64, requiring only 22GB of video memory to generate 5 seconds of video within minutes [2]
- Since February, total downloads of the Tongyi Wanxiang model series have exceeded 5 million, making it one of the most popular video generation model families in the open-source community [2]
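The two compression figures quoted for the 5B model's 3D VAE can be reconciled with quick arithmetic. The sketch below is illustrative only: the 4×16×16 downsampling ratio and the overall information compression rate of 64 come from the announcement, while the latent channel count is derived from them and is an assumption, not stated in the source.

```python
# Back-of-the-envelope check of the compression figures reported for the
# 5B unified model's high-compression 3D VAE.

t_ratio, h_ratio, w_ratio = 4, 16, 16            # temporal x height x width downsampling (from source)
voxel_compression = t_ratio * h_ratio * w_ratio  # pixels folded into each latent position
print(voxel_compression)                         # 1024

rgb_channels = 3
information_compression = 64                     # overall rate stated in the source
# If information compression = rgb_channels * voxel_compression / latent_channels,
# the implied latent channel count (an inference, not from the source) is:
latent_channels = rgb_channels * voxel_compression // information_compression
print(latent_channels)                           # 48
```

The point of the exercise: 4×16×16 alone would imply a 1024× reduction in positions, so the quoted rate of 64 only makes sense once the latent's extra channels are accounted for.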
Alibaba open-sources Tongyi Wanxiang Wan2.2, greatly improving the production efficiency of movie-level visuals