OmniAvatar
iFLYTEK Showcases All-In-One AI Solutions at MWC26, Bringing Private, Customizable AI to Industry
Globenewswire· 2026-03-05 16:25
Core Insights
- iFLYTEK is showcasing its All-In-One AI Solutions at MWC26, which integrates hardware and software for private AI computing and model deployment, aimed at helping governments and enterprises build their own AI capabilities [1][2]
Group 1: Product Features
- The All-In-One AI Solutions provide a fully integrated system that optimizes and automates core business processes through AI, specifically designed for industries with strict security needs such as media, telecommunications, government, and finance [2]
- The system operates on local computing and dual large-model engines, ensuring stable performance and compliance with high security standards by running fully offline [3]
- The solution has demonstrated significant improvements in practical applications, such as an 85% increase in accuracy and a tripling of transcription efficiency for media teams, while also reducing costs and ensuring data security [4]
Group 2: Agent Platform
- iFLYTEK's Agent Platform allows organizations to quickly convert existing workflows into AI-powered applications, enabling teams to build on current processes rather than starting from scratch [7]
- The platform features no-code and low-code tools, facilitating rapid development and deployment of AI applications without complex programming, and supports multimodal interaction for more natural communication [8]
Group 3: Proven Applications
- At MWC26, iFLYTEK is presenting over 30 curated Super Agents, with the platform ecosystem now encompassing more than 1.3 million intelligent agents, indicating significant real-world adoption [9]
- Two flagship applications include OceanDoc, which generates structured professional reports in seconds and is used by over 8 million users globally, and OmniAvatar, which enables low-cost multilingual marketing video creation for organizations [9]
Group 4: Strategic Vision
- iFLYTEK emphasizes that AI creates value when it can be deployed, trusted, and used at scale, combining local AI infrastructure, intelligent agents, and proven applications to enhance enterprise productivity [11]
Virtual Humans Everywhere: iFLYTEK Brings AI Service into Real-World Scenarios at MWC26
Globenewswire· 2026-03-05 15:58
Core Insights
- iFLYTEK showcased a comprehensive lineup of virtual human technologies at MWC26, generating significant interest and demonstrating capabilities in real-world applications [1][12]
Group 1: GuideX and Service Integration
- GuideX is iFLYTEK's intelligent virtual human solution designed for high-traffic public environments, managing the full passenger service flow in settings like airports [3][4]
- The system integrates multiple functions such as greeting, answering questions, check-in assistance, and gate guidance into a single interface, enhancing operational efficiency [4]
- GuideX supports multimodal interaction, including voice, touch, gesture, and visual recognition, functioning as an intelligent service hub [5]
Group 2: Mobile Digital Human and Dynamic Services
- iFLYTEK introduced the Mobile Digital Human, which combines multimodal interaction with autonomous navigation, suitable for dynamic environments like exhibition halls and museums [7]
- This system extends virtual human services beyond stationary touchpoints, providing contextual explanations in real time as it moves alongside visitors [7]
Group 3: OmniAvatar and Personalization
- OmniAvatar is a virtual human creation platform that allows for rapid cloning of voice and appearance, enabling customized service avatars [8]
- In collaboration with the China Disabled Persons' Federation, it assists individuals in creating personalized avatars and synthetic voices, as well as digital twins for media professionals [9]
Group 4: Embodied AI and Real-World Presence
- iFLYTEK Guide01 is an embodied AI service robot that showcases lively demonstrations, providing a tangible physical presence in real-world environments [10]
- The integration of flexible mobility and AI perception capabilities enhances the interaction between humans and AI [10]
Group 5: Strategic Vision
- iFLYTEK aims to integrate its virtual human technologies into real service scenarios across various industries, promoting efficient service delivery and natural human-AI interaction [12]
Quark and Zhejiang University Open-Source OmniAvatar: One Image Plus One Audio Clip Is Enough to Generate Long Videos
机器之心· 2025-07-25 04:29
Core Insights
- OmniAvatar is an audio-driven full-body video generation model that needs only one image and one audio clip as input to create the corresponding video, significantly improving lip-sync detail and the fluidity of full-body movements [1][6]
- The model allows precise control over character poses, emotions, and scenes through text prompts, which makes it usable across a range of applications [1][10]
Performance Metrics
- Experimental results indicate that OmniAvatar outperforms existing methods in lip-sync accuracy, facial and upper-body video generation, and text control, achieving a balance among video quality, accuracy, and aesthetics [3]
- Compared with other models, OmniAvatar achieved an FID score of 67.6 and an FVD score of 664 (lower is better for both metrics), indicating superior performance in video generation tasks [5]
Technical Innovations
- OmniAvatar is based on the Wan2.1-T2V-14B model and uses LoRA for fine-tuning, integrating audio features while preserving the base model's strong video generation capabilities [8]
- The model employs a pixel-level audio embedding strategy that integrates audio features directly into the model's latent space, ensuring natural lip movements and coordinated body actions [13]
Long Video Generation
- The model has been optimized for long video generation, maintaining character consistency and temporal coherence through reference frame embedding and overlapping frame strategies [6][19]
- A reference frame serves as a fixed guide that anchors character identity across long sequences, while a latent overlapping strategy keeps consecutive segments seamlessly continuous [20]
Future Directions
- OmniAvatar represents an initial attempt at multi-modal video generation, with preliminary validation on experimental datasets, but it has not yet reached product-level application [22]
- Future development will focus on handling complex instructions and multi-character interactions to extend the model's applicability to more scenarios [22]
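The long-video strategy summarized above (a fixed reference frame plus overlapping latents between consecutive segments) can be illustrated with a minimal scheduling sketch. This is not OmniAvatar's implementation: the function names `plan_segments`, `generate_long_video`, and `toy_generator`, the `(tag, index)` frame stand-in, and the segment length and overlap values are all assumptions made for illustration. It only shows how each segment could be conditioned on the same reference and on the tail of the previous segment.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a latent frame: (identity tag, local index).
Frame = Tuple[str, int]


def plan_segments(total_frames: int, segment_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, end exclusive. Every segment after the
    first begins `overlap` frames before the previous segment ends."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")
    if total_frames <= 0:
        return []
    stride = segment_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, total_frames)
        segments.append((start, end))
        if end == total_frames:
            break
        start += stride
    return segments


def generate_long_video(
    total_frames: int,
    segment_len: int,
    overlap: int,
    reference: str,
    generate_segment: Callable[[str, Sequence[Frame], int], List[Frame]],
) -> List[Frame]:
    """Generate segment by segment, always passing the same reference and
    the already-generated overlapping frames as conditioning."""
    video: List[Frame] = []
    for start, end in plan_segments(total_frames, segment_len, overlap):
        prefix = video[start:]  # frames shared with the previous segment
        segment = generate_segment(reference, prefix, end - start)
        video.extend(segment[len(prefix):])  # keep only the newly generated frames
    return video


def toy_generator(reference: str, prefix: Sequence[Frame], length: int) -> List[Frame]:
    """Toy model: copies the overlapping prefix and tags new frames with the reference."""
    new_frames = [(reference, i) for i in range(len(prefix), length)]
    return list(prefix) + new_frames


if __name__ == "__main__":
    print(plan_segments(10, 4, 1))
    print(len(generate_long_video(10, 4, 1, "ref", toy_generator)))
```

In this sketch the reference is passed unchanged to every segment, which mirrors the idea of a fixed identity anchor, while the overlapping prefix plays the role of the shared latents that keep segment boundaries continuous.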
Quark AI Lab and Zhejiang University Jointly Open-Source OmniAvatar: A New Breakthrough in Audio-Driven Full-Body Video Generation
Guan Cha Zhe Wang· 2025-07-25 04:16
Core Insights
- Quark AI Technology Team has partnered with Zhejiang University to open-source OmniAvatar, an audio-driven full-body video generation model that promises revolutionary changes in the video generation field [1]
Group 1: Technology Advancements
- OmniAvatar overcomes a traditional limitation by letting audio drive full-body motion rather than facial movements alone, while still allowing precise control [1]
- The model generates a video from a single image and an audio clip, significantly improving lip-sync detail and the fluidity of full-body movements [1]
- OmniAvatar incorporates a pixel-based audio embedding strategy that integrates audio features at the pixel level within the model's latent space, resulting in more natural body movements [2]
Group 2: Challenges and Solutions
- Long video generation has been a challenge in audio-driven video creation; OmniAvatar addresses it with image embedding strategies and frame overlap techniques that keep the video coherent and the character identity consistent [1]
- A balanced fine-tuning strategy based on LoRA is proposed to adapt the model efficiently without altering its underlying capacity, so that it learns audio features while maintaining video quality and detail [2]
Group 3: Future Directions
- OmniAvatar represents an initial attempt at multi-modal video generation, having shown preliminary validation on experimental datasets but not yet reaching product-level application [2]
- Future work will focus on handling complex instructions and multi-character interactions to broaden the model's applicability across scenarios [2]
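The LoRA-based fine-tuning mentioned above can be sketched with the standard low-rank update: the base weight stays frozen and only two small matrices are trained. The class name `LoRALinear`, the helper `matmul`, and the `rank` and `alpha` values are hypothetical, and the zero initialization of `B` and the `alpha / rank` scaling follow the general LoRA formulation rather than any configuration confirmed for OmniAvatar.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a tiny illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.d_out, self.d_in = len(weight), len(weight[0])
        self.rank = rank
        self.weight = weight  # frozen base weight, never updated
        # A starts small and random, B starts at zero, so the update is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]
        self.scaling = alpha / rank

    def effective_weight(self) -> Matrix:
        delta = matmul(self.B, self.A)  # shape: d_out x d_in
        return [
            [w + self.scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameter_count(self) -> int:
        return self.rank * (self.d_in + self.d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=1.0)
    print(layer.forward([1.0, 1.0]))
    print(layer.trainable_parameter_count())
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, which is one way to read the claim that the model learns audio features without disturbing the base model's capabilities. The trainable parameter count grows with `rank * (d_in + d_out)` rather than `d_in * d_out`.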
Pop Mart's Wang Ning Responds to Hunger-Marketing Controversy; Musk Warns of Tough Quarters Ahead for Tesla
Group 1: Company Developments
- Pop Mart's founder Wang Ning addressed the controversy over "hunger marketing," stating that the company is increasing production capacity to meet the demand for LABUBU, aiming to sell 10 million units monthly, with production capacity doubling this month compared to last [2]
- Tesla's stock dropped 8.9%, with a market value loss of approximately 684.3 billion RMB, following a Q2 report showing a 12% year-over-year revenue decline and a 20.7% drop in net profit. CEO Elon Musk warned of challenging quarters ahead due to changes in electric vehicle tax credits and tariffs [2]
- TikTok's revenue is projected to reach $23 billion in 2024, a 42.8% year-over-year increase, making it the fourth largest social media app globally. Despite a slowdown in profit growth for ByteDance, TikTok's overseas business revenue grew by 63%, accounting for a record 25% of the company's total revenue [5]
- SenseTime's "1+X" structure adjustment has led to six ecosystem companies raising approximately 1.8 billion RMB, with a total equity value of around 10 billion RMB [7]
- JD.com is in talks to acquire German electronics retailer Ceconomy AG, valued at approximately €2.2 billion (about $2.6 billion), with a potential offer of €4.60 per share, representing a 23% premium over the recent closing price [7]
Group 2: Market Trends and Predictions
- According to a recent survey, NVIDIA's Blackwell architecture GPUs are expected to account for 80% of the company's high-end GPU shipments this year, as the server market stabilizes and ODMs focus on AI server development [8]
- AMD's CEO Lisa Su indicated that chips produced at TSMC's Arizona facility are 5% to 20% more expensive than those made in Taiwan, highlighting the cost challenges and supply chain resilience in the semiconductor industry [9]
- IBM's Q2 software revenue fell short of market expectations, leading to a stock price drop of over 9% [10]
- Alphabet, Google's parent company, reported Q2 revenue of $96.428 billion, a 14% year-over-year increase, with net profit rising 19% to $28.196 billion [11]
Group 3: Innovations and New Products
- Quark Technology and Zhejiang University have jointly open-sourced OmniAvatar, an audio-driven full-body video generation model that enhances lip-sync and motion fluidity based on a single image and audio input [16]
- A new consumer-grade exoskeleton robot, VIATRIX, was launched by Aoshark Intelligent, designed to assist users in conserving energy and enhancing performance during various activities [17]
Audio-Driven Full-Body Video Generation Model: Quark and Zhejiang University Jointly Open-Source OmniAvatar
news flash· 2025-07-25 01:27
Core Insights
- The article highlights the launch of OmniAvatar, an innovative audio-driven full-body video generation model developed by Quark Technology Team in collaboration with Zhejiang University [1]
Group 1: Product Features
- OmniAvatar requires only a single image and an audio clip to generate corresponding videos, significantly enhancing lip-sync detail and the fluidity of full-body movements [1]
- The model allows for precise control over character poses, emotions, and scenes through the use of prompt words [1]