Core Viewpoint
- Alibaba demonstrates its strong capabilities in artificial intelligence with the launch of Wan2.2-S2V, an open-source multi-modal video generation model that lets users create high-quality digital human videos from a static image and an audio input [1][3].

Group 1: Product Features
- The Wan2.2-S2V model can generate videos up to several minutes long, significantly improving video creation efficiency in areas such as digital human live streaming, film post-production, and AI education [2][5].
- The model supports multiple video resolutions, accommodating both vertical short videos and horizontal films, and incorporates control mechanisms such as AdaIN and cross-attention for improved audio synchronization (see the sketch following this summary) [3][5].
- Users upload an image and an audio clip to generate a dynamic video in which the subject can speak, sing, or perform, with facial expressions and lip movements closely synchronized to the audio [3][5].

Group 2: Industry Impact
- Alibaba has been at the forefront of video generation technology, having previously released the Wan2.2 series models, whose MoE architecture set new industry standards [3].
- The introduction of Wan2.2-S2V addresses the growing demand for efficient video creation tools in rapidly evolving sectors such as digital human live streaming and film production [5].
- Continued improvements in the underlying models are expected to drive further innovations and breakthroughs in video generation [5].
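The summary names AdaIN and cross-attention as the audio-control mechanisms but gives no detail. The sketch below is a generic PyTorch illustration of how audio conditioning of this kind is commonly wired: features are instance-normalized and modulated by audio statistics (AdaIN), and video tokens attend to audio tokens (cross-attention). The module names (`AudioAdaIN`, `AudioCrossAttention`), dimensions, and residual fusion are assumptions for illustration only and do not reflect the actual Wan2.2-S2V architecture.

```python
import torch
import torch.nn as nn


class AudioAdaIN(nn.Module):
    """AdaIN-style conditioning: video features are instance-normalized, then
    rescaled and shifted with parameters predicted from an audio embedding.
    Generic sketch, not the Wan2.2-S2V implementation."""

    def __init__(self, feat_channels: int, audio_dim: int):
        super().__init__()
        # Hypothetical projection: audio embedding -> per-channel (scale, shift)
        self.to_scale_shift = nn.Linear(audio_dim, 2 * feat_channels)

    def forward(self, video_feats: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        # video_feats: (B, C, H, W); audio_emb: (B, audio_dim)
        b, c, _, _ = video_feats.shape
        mean = video_feats.mean(dim=(2, 3), keepdim=True)
        std = video_feats.std(dim=(2, 3), keepdim=True) + 1e-5
        scale, shift = self.to_scale_shift(audio_emb).chunk(2, dim=1)
        return (video_feats - mean) / std * (1 + scale.view(b, c, 1, 1)) + shift.view(b, c, 1, 1)


class AudioCrossAttention(nn.Module):
    """Cross-attention: video tokens (queries) attend to audio tokens
    (keys/values), letting lip and expression features track the soundtrack."""

    def __init__(self, feat_dim: int, audio_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            feat_dim, num_heads, kdim=audio_dim, vdim=audio_dim, batch_first=True
        )

    def forward(self, video_tokens: torch.Tensor, audio_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N_video, feat_dim); audio_tokens: (B, N_audio, audio_dim)
        attended, _ = self.attn(video_tokens, audio_tokens, audio_tokens)
        return video_tokens + attended  # residual fusion


if __name__ == "__main__":
    adain = AudioAdaIN(feat_channels=64, audio_dim=128)
    xattn = AudioCrossAttention(feat_dim=64, audio_dim=128)
    feats = torch.randn(2, 64, 16, 16)      # dummy video feature maps
    audio_vec = torch.randn(2, 128)         # dummy pooled audio embedding
    audio_seq = torch.randn(2, 50, 128)     # dummy audio token sequence
    conditioned = adain(feats, audio_vec)                # (2, 64, 16, 16)
    tokens = conditioned.flatten(2).transpose(1, 2)      # (2, 256, 64)
    fused = xattn(tokens, audio_seq)                     # (2, 256, 64)
    print(fused.shape)
```

In practice, the audio embeddings would come from a speech encoder rather than random tensors, and the conditioning layers would be interleaved with the video generator's backbone; the sketch only shows the shape of the two mechanisms named in the summary.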
A new breakthrough from Alibaba Tongyi Wanxiang: a static image plus audio easily generates film-grade digital human videos!