Multimodal Video Generation
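Alibaba open-sources video generation model Wan2.2-S2V
Renmin Caixun, August 26: On the evening of August 26, Alibaba open-sourced Tongyi Wanxiang Wan2.2-S2V, a multimodal video generation model. From a single static image and an audio clip, it can generate film-quality digital-human videos with natural facial expressions and synchronized lip movements. A single generation can produce video up to minutes long. ...
Multimodal video generation model Tongyi Wanxiang "Wan2.2-S2V" officially open-sourced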
Di Yi Cai Jing· 2025-08-26 13:57
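According to the Tongyi Wanxiang Wan official account, the new multimodal video generation model Tongyi Wanxiang "Wan2.2-S2V" has been officially open-sourced. From a single static image and an audio clip, it can generate film-quality digital-human videos with natural facial expressions, synchronized lip movements, and smooth body motion. A single generation can produce video up to minutes long, substantially improving video creation efficiency in industries such as digital-human livestreaming, film and television production, and AI education. The model is now available on the Tongyi Wanxiang official website. (Source: Di Yi Cai Jing) ...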
Tencent Hunyuan launches new multimodal video generation tool, now open-sourced and live on its official website
Sou Hu Cai Jing· 2025-05-10 14:48
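[Pacific Technology News Flash] On May 9, Tencent Hunyuan officially launched and open-sourced HunyuanCustom, a new multimodal customized video generation tool built on the Hunyuan video generation foundation model (HunyuanVideo). HunyuanCustom's core strength is its multimodal fusion capability: it can process text, image, audio, and video inputs together and turn them into coherent, natural video content. Compared with traditional video generation models, HunyuanCustom offers significantly better generation quality and controllability. HunyuanCustom is also highly extensible. In audio-driven mode, users can upload an image of a person together with a speech audio clip, and the model generates the person speaking, singing, or giving other audio-synchronized performances in any scene, which suits digital-human livestreaming, virtual customer service, educational demonstrations, and similar scenarios. In video-driven mode, HunyuanCustom can naturally replace or insert a person or object from an image into any video clip, for creative placement or scene extension, enabling video restructuring and content enhancement. In addition, HunyuanCustom provides several video generation modes, including single-subject video generation, multi-subject video generation, single-subject video dubbing, and local video editing. Of these, the single-subject generation capability has been open-sourced and is live on the Hunyuan official website, where users can, in the "Model Square ...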
Images provide identity, text defines everything! Tencent open-sources multimodal video customization tool HunyuanCustom
AI Keji Dabenying· 2025-05-09 09:35
Core Viewpoint
- The article discusses the launch of Tencent's HunyuanCustom, a new multi-modal video generation framework that emphasizes customization capabilities as a key measure of system practicality [1][10].

Group 1: Technology Overview
- HunyuanCustom is built on the HunyuanVideo model and supports various input modalities including images, text, audio, and video, enabling high-quality and controllable video generation [1][5].
- The framework addresses the "face-changing" challenge in traditional video generation models by maintaining subject consistency through a combination of image ID enhancement and multi-modal control inputs [3][6].

Group 2: Performance Comparison
- Tencent's team conducted comparative tests of HunyuanCustom against several mainstream video customization methods, evaluating metrics such as face consistency, video-text consistency, semantic similarity, temporal consistency, and overall video quality [8].
- HunyuanCustom achieved a face consistency score of 0.627, outperforming other models, and also scored 0.593 in semantic similarity, indicating its leading position among current open-source solutions [9].

Group 3: System Architecture
- The architecture of HunyuanCustom includes several key modules designed for decoupled control of image, voice, and video modalities, providing flexible interfaces for multi-modal generation [6][11].
- The data construction process incorporates models like Qwen, YOLO, and InsightFace to build a comprehensive labeling system covering various subject types, enhancing the model's generalization and editing flexibility [11].

Group 4: User Experience
- The single subject generation capability of HunyuanCustom is currently available on the official website, with additional features set to be released throughout May [10].
- Users can access the experience through the provided links to the project website and code repository [12].
Tencent Hunyuan releases and open-sources video generation tool HunyuanCustom, with support for subject-consistent generation
news flash· 2025-05-09 04:22
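On May 9, the Tencent Hunyuan team released and open-sourced HunyuanCustom, a new multimodal customized video generation tool. Built on the Hunyuan video generation foundation model (HunyuanVideo), it surpasses existing open-source solutions in subject consistency and is comparable to top closed-source models. HunyuanCustom combines text, image, audio, and video inputs for video generation, making it an intelligent video creation tool with strong controllability and high generation quality. (36Kr) ...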