ICML 2025 | Quwan Develops New Facial Animation Technology: Voice Plus Instructions for Precise Expression Control
机器之心 (Jiqizhixin) · 2025-06-05 04:40
Core Viewpoint
- The article covers Playmate, a framework developed by Guangzhou Quwan Technology that generates high-quality, controllable portrait animation videos from audio input and images [1][3].

Group 1: Technology Overview
- Playmate is a two-stage training framework built on a diffusion model guided by a 3D implicit space, designed to generate high-quality, controllable portrait animation videos [3].
- The framework decouples facial attributes such as expressions, lip movements, and head poses, which allows each to be controlled precisely in the generated videos [3][12].
- Compared with existing methods, Playmate shows significant improvements in video quality, lip-synchronization accuracy, and flexibility of emotional control [3][28].

Group 2: Methodology
- The core idea of Playmate is to use a 3D implicit space to decouple facial attributes and to reach high generation quality through a two-stage training framework [13].
- The first stage constructs a motion decoupling module that derives separate expression, lip-movement, and head-pose signals directly from audio [16].
- The second stage introduces an emotional control module that encodes emotional conditions into the latent space, giving fine-grained control over the emotion shown in the generated videos [16][22].

Group 3: Performance Evaluation
- Playmate has been evaluated on several datasets, including AVSpeech and CelebV-Text, and scores better on metrics such as FID and FVD, which indicates that its generated videos lie closer to the real data distribution [28].
- In qualitative assessments, Playmate produces realistic expressions and natural head movements across different portrait styles, demonstrating its versatility and robustness [28][31].
- The framework can generate dynamic videos that reflect different emotional states from the same audio segment, which highlights its advantage in emotional control [31].

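For context on the FID and FVD figures mentioned above: both are Fréchet distances between Gaussians fitted to feature embeddings of real and generated samples (image features for FID, video features for FVD), and lower values mean the two distributions are closer. The sketch below is a minimal illustration that assumes the features have already been extracted by some pretrained network. The function name and the toy data are illustrative only and are not taken from the Playmate evaluation code.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each argument is an (n_samples, n_features) array of embeddings.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # covariance matrices; clip guards against tiny negative round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Toy data standing in for extracted features (hypothetical, not real results).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
shifted = real + 3.0  # same covariance, mean moved by 3 in each of 4 dimensions

same_score = frechet_distance(real, real)        # approximately 0
shifted_score = frechet_distance(real, shifted)  # approximately 4 * 3**2 = 36
print(same_score, shifted_score)
```

Using eigenvalues instead of an explicit matrix square root keeps the sketch dependent on NumPy alone. Production implementations typically add numerical safeguards for near-singular covariances.
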
Group 4: Future Prospects
- Playmate significantly improves the quality and flexibility of audio-driven portrait animation generation, providing strong technical support for fields such as film production, virtual reality, and interactive media [33].
- Future work may extend the framework to full-body animation generation and incorporate more diverse training data, which is expected to improve its robustness and adaptability [33].