OmniAvatar - filings, earnings calls, financial reports, news

Virtual Humans Everywhere: iFLYTEK Brings AI Service into Real-World Scenarios at MWC26

GlobeNewswire · 2026-03-05 15:58

Core Insights
- iFLYTEK showcased a comprehensive lineup of virtual human technologies at MWC26, drawing significant interest and demonstrating their use in real-world applications [1][12]

Group 1: GuideX and Service Integration
- GuideX is iFLYTEK's intelligent virtual human solution for high-traffic public environments such as airports, where it manages the full passenger service flow [3][4]
- It combines greeting, question answering, check-in assistance, and gate guidance in a single interface, improving operational efficiency [4]
- GuideX supports multimodal interaction, including voice, touch, gesture, and visual recognition, and functions as an intelligent service hub [5]

Group 2: Mobile Digital Human and Dynamic Services
- iFLYTEK introduced the Mobile Digital Human, which combines multimodal interaction with autonomous navigation and suits dynamic environments such as exhibition halls and museums [7]
- It extends virtual human services beyond stationary touchpoints, giving contextual explanations in real time as it moves alongside visitors [7]

Group 3: OmniAvatar and Personalization
- OmniAvatar is a virtual human creation platform that rapidly clones voice and appearance to build customized service avatars [8]
- In collaboration with the China Disabled Persons' Federation, it helps individuals create personalized avatars and synthetic voices; it is also used to create digital twins for media professionals [9]

Group 4: Embodied AI and Real-World Presence
- iFLYTEK Guide01 is an embodied AI service robot whose live demonstrations gave AI a tangible physical presence in real-world environments [10]
- Its combination of flexible mobility and AI perception enhances human-AI interaction [10]

Group 5: Strategic Vision
- iFLYTEK aims to integrate its virtual human technologies into real service scenarios across industries, promoting efficient service delivery and natural human-AI interaction [12]

iFLYTEK (SZ: 002230)

Virtual Human Technologies

Quark and Zhejiang University Open-Source OmniAvatar: One Image Plus One Audio Clip Generates a Long Video

Synced (机器之心) · 2025-07-25 04:29

Core Insights
- OmniAvatar is an audio-driven full-body video generation model that needs only a single image and an audio clip to generate a matching video, with noticeably improved lip-sync detail and smoother full-body motion [1][6]
- Text prompts allow precise control over character pose, emotion, and scene, making the model usable across a range of applications [1][10]

Performance Metrics
- Experimental results indicate that OmniAvatar outperforms existing methods in lip-sync accuracy, facial and upper-body video generation, and text control, balancing video quality, accuracy, and aesthetics [3]
- Compared with other models, OmniAvatar achieved an FID of 67.6 and an FVD of 664 (lower is better for both), indicating superior video generation quality [5]

Technical Innovations
- OmniAvatar is built on Wan2.1-T2V-14B and fine-tuned with LoRA (low-rank adaptation), which integrates audio features while preserving the base model's strong video generation capabilities [8]
- A pixel-level audio embedding strategy injects audio features directly into the model's latent space, producing natural lip movements and coordinated body motion [13]

Long Video Generation
- The model is optimized for long video generation, keeping the character consistent and the video temporally coherent through reference frame embedding and overlapping frames [6][19]
- A reference frame acts as a fixed guide for character identity, while overlapping latent frames keep consecutive segments continuous, so identity stays anchored across long sequences [20]

Future Directions
- OmniAvatar is an initial attempt at multimodal video generation; it has been validated only preliminarily on experimental datasets and has not yet reached product-level application [22]
- Future work will focus on handling complex instructions and multi-character interaction, to extend the model to more scenarios [22]
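The overlapping-frame approach to long video can be sketched as a simple segment-stitching loop. This is a minimal, framework-free sketch under stated assumptions, not OmniAvatar's code: frames are stand-in strings rather than latent tensors, and `generate_long_video`, `generate_segment`, `toy_segment`, `segment_len`, and `overlap` are hypothetical names. Each segment receives the same reference frame (the identity anchor) and the tail of the previous segment (for continuity); only newly generated frames are appended, so overlapping frames are never duplicated.

```python
from typing import Callable, List, Sequence

# Stand-in for a latent frame; a real model would use tensors (assumption).
Frame = str
# Hypothetical generator: (reference, prefix frames, number of new frames) -> new frames
SegmentFn = Callable[[Frame, Sequence[Frame], int], List[Frame]]


def generate_long_video(
    reference: Frame,
    total_frames: int,
    segment_len: int,
    overlap: int,
    generate_segment: SegmentFn,
) -> List[Frame]:
    """Build a long clip from fixed-length segments that share `overlap` frames."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")
    video: List[Frame] = []
    while len(video) < total_frames:
        # Tail of the previous segment conditions the next one (continuity).
        prefix = video[-overlap:] if (video and overlap) else []
        # The segment window holds the prefix plus the new frames.
        n_new = min(segment_len - len(prefix), total_frames - len(video))
        # The same reference frame is passed every time (identity anchor).
        new_frames = generate_segment(reference, prefix, n_new)
        if len(new_frames) != n_new:
            raise RuntimeError("generator returned the wrong number of frames")
        # Overlapping frames are context only, so just the new ones are appended.
        video.extend(new_frames)
    return video


def toy_segment(reference: Frame, prefix: Sequence[Frame], n_new: int) -> List[Frame]:
    """Toy generator (hypothetical): continues the index after the last prefix frame."""
    start = int(prefix[-1].split(":")[1]) + 1 if prefix else 0
    return [f"{reference}:{start + i}" for i in range(n_new)]


if __name__ == "__main__":
    clip = generate_long_video("ref", total_frames=10, segment_len=4,
                               overlap=1, generate_segment=toy_segment)
    print(clip)
```

Conditioning on a short tail instead of the whole history keeps every generation step the same size regardless of clip length, while the unchanging reference frame is the part the source credits with keeping character identity consistent.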

Quark AI Lab and Zhejiang University Jointly Open-Source OmniAvatar: A New Breakthrough in Audio-Driven Full-Body Video Generation

Guan Cha Zhe Wang · 2025-07-25 04:16

Core Insights
- Quark's AI technology team has partnered with Zhejiang University to open-source OmniAvatar, an audio-driven full-body video generation model presented as a major step forward for video generation [1]

Group 1: Technology Advancements
- Where earlier methods were largely limited to audio-driven facial movement, OmniAvatar uses audio to drive full-body motion while allowing precise control [1]
- The model generates a video from a single image and an audio clip, with noticeably improved lip-sync detail and smoother full-body motion [1]
- A pixel-level audio embedding strategy integrates audio features at the pixel level within the model's latent space, producing more natural body movement [2]

Group 2: Challenges and Solutions
- Long video generation has been a persistent challenge for audio-driven video; OmniAvatar addresses it with image embedding and frame overlap, keeping the video coherent and the character's identity consistent [1]
- A balanced fine-tuning strategy based on LoRA adapts the model efficiently without altering its underlying capacity, so it learns audio features while preserving video quality and detail [2]

Group 3: Future Directions
- OmniAvatar is an initial attempt at multimodal video generation; it has shown preliminary validation on experimental datasets but has not yet reached product-level application [2]
- Future work will focus on handling complex instructions and multi-character interaction, to broaden the model's applicability [2]
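LoRA-based fine-tuning keeps the pretrained weights frozen and learns only a low-rank correction added to them. The sketch below shows that generic technique (low-rank adaptation) for a single linear layer in plain Python. It is not OmniAvatar's implementation: the class name `LoRALinear`, the `rank` and `alpha` values, and the initialization scale are illustrative assumptions, since the reports do not say which layers are adapted or with what settings. Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, which is one way LoRA avoids disturbing the base model's existing behavior.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / rank) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight          # pretrained weight: never updated
        self.scale = alpha / rank
        # A (rank x d_in): small random values; B (d_out x rank): zeros,
        # so the low-rank update B @ A is zero before any training.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                # frozen path
        delta = matvec(self.B, matvec(self.A, x))    # trainable low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return sum(map(len, self.A)) + sum(map(len, self.B))


if __name__ == "__main__":
    W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    layer = LoRALinear(W, rank=1, alpha=1.0)
    print(layer.forward([1.0, 0.0, -1.0]))   # equals W x because B is still zero
    print(layer.trainable_params(), "trainable values vs", 6, "frozen")
```

For a d_out × d_in layer, LoRA trains rank × (d_in + d_out) values instead of d_out × d_in; at realistic layer widths and small ranks this is a small fraction of the full weight matrix, and the frozen `weight` is never modified.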