Tencent Hunyuan Unveils AI Digital Human Technology: One Photo Plus Audio Is Enough to Generate a Singing Video
Feng Huang Wang·2025-05-28 09:23

Core Insights
- Tencent's Hunyuan team and Tencent Music's Tianqin Lab have launched and open-sourced HunyuanVideo-Avatar, a voice-driven digital human model that can generate dynamic video content from a single image and an audio file [1]
- The model integrates Tencent's Hunyuan video large model with MuseV technology, enabling it to recognize environmental information and emotional content in the input image and audio [1]
- HunyuanVideo-Avatar supports a wide range of scene settings and artistic styles, breaking the limitations of traditional digital human technology by offering full-body motion capabilities [1]

Application and Implementation
- HunyuanVideo-Avatar has been deployed in several core products of Tencent Music Entertainment Group, including real-time singing motion in QQ Music and storytelling features in Kugou Music [2]
- The model employs a multimodal diffusion Transformer (MM-DiT) architecture, preserving consistency in character representation while extracting emotional features from both the audio and the image [2]
- The technology has reached industry-leading levels of subject consistency and audio-visual synchronization, matching the performance of mainstream closed-source solutions [2]

User Experience and Future Developments
- The single-subject mode of HunyuanVideo-Avatar can be tried on Tencent's official website, currently supporting audio files of up to 14 seconds [3]
- The open-source release is expected to broaden the adoption of AI video generation technology, providing cost-effective solutions for short-video creation, e-commerce marketing, and advertising [3]
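The article names MM-DiT as the architecture but gives no implementation details. As a rough illustration of the general idea behind multimodal diffusion Transformer blocks, the toy sketch below (not Tencent's code; all function names and dimensions are invented for illustration) concatenates image tokens and audio tokens into one sequence and runs a single joint self-attention pass, so that every image token can attend to every audio token and vice versa:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(img_tokens, audio_tokens, d_head=16, seed=0):
    """Toy joint self-attention over concatenated image and audio tokens.

    Illustrative only: a real MM-DiT block would add timestep conditioning,
    per-modality projections, multi-head attention, and MLP layers.
    """
    rng = np.random.default_rng(seed)
    # Treat both modalities as one token sequence: (N_img + N_audio, d)
    x = np.concatenate([img_tokens, audio_tokens], axis=0)
    d = x.shape[1]
    # Random projection weights stand in for learned parameters
    Wq, Wk, Wv = (rng.standard_normal((d, d_head)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Scaled dot-product attention across the full joint sequence
    attn = softmax(q @ k.T / np.sqrt(d_head))
    return attn @ v

# Example: 4 image tokens and 6 audio tokens, 32-dim embeddings
rng = np.random.default_rng(1)
img = rng.standard_normal((4, 32))
aud = rng.standard_normal((6, 32))
out = joint_attention(img, aud, d_head=16)  # shape (10, 16)
```

Joint attention over a shared token sequence is one plausible way to get the subject consistency the article describes: identity cues from the image tokens directly modulate how audio-driven motion tokens are processed within the same attention pass.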