Tongyi Wanxiang Multimodal Video Generation Model "Wan2.2-S2V" Officially Open-Sourced

Core Insights
- The new multimodal video generation model "Wan2.2-S2V" has been officially open-sourced. It can create high-quality digital human videos from a single static image and an audio clip [2].
- The model can generate videos up to several minutes long in a single generation, significantly improving video production efficiency in industries such as digital human live streaming, film production, and AI education [2].
- The model is now available on the official Tongyi Wanxiang website [2].