OmniAvatar
Quark and Zhejiang University open-source OmniAvatar: one image plus one audio clip is enough to generate a long video
机器之心 (Synced) · 2025-07-25 04:29
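Recently, the Quark technical team and Zhejiang University jointly open-sourced OmniAvatar, an innovative audio-driven full-body video generation model. Given only one image and one audio clip as input, OmniAvatar generates a matching video, with markedly better lip-sync detail and smoother full-body motion for the person on screen. Prompts can also be used to control the character's pose, emotion, scene, and similar elements more precisely.

OmniAvatar is open source:
Model: https://huggingface.co/OmniAvatar/OmniAvatar-14B
Code: https://github.com/Omni-Avatar/OmniAvatar
Arxiv: https://arxiv.org/abs/2506.18866
Project Page: https://omni-avatar.github.io/

Below are selected examples of OmniAvatar in podcast, singing, interaction, and dynamic-background scenarios.

Experiments show that OmniAvatar leads on several dimensions, including lip sync, facial and half-body video generation, and text control, and strikes a better balance among video quality, accuracy, and aesthetics.

| Methods | FID | FVD | Sync-C | Sync- ...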
Quark AI Lab and Zhejiang University jointly open-source OmniAvatar: a new breakthrough in audio-driven full-body video generation
Guan Cha Zhe Wang · 2025-07-25 04:16
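Recently, Quark's AI technical team and Zhejiang University joined forces to open-source an innovative result ...

The breakthrough is not limited to the product level; OmniAvatar's technical innovations also deserve attention. The team proposed a pixel-wise audio embedding strategy that injects audio features directly into the model's latent space at the pixel level, producing more coordinated and natural body movements that match the audio. It also uses a multi-hierarchical audio embedding strategy that embeds audio information at different stages of the DiT blocks, so that the model keeps independent learning paths at each level.

To avoid the problems caused by full training and by fine-tuning only specific layers, the team further proposed a LoRA-based balanced fine-tuning strategy. LoRA adapts the model efficiently, letting it learn audio features without changing the underlying model's capacity, and so preserves both video quality and detail.

Continuous generation of long videos is another difficulty, and a key challenge, in audio-driven video generation. OmniAvatar addresses it with a reference-image embedding strategy and frame overlapping, which keep the video coherent and the character's identity consistent.

OmniAvatar is the team's initial attempt at multimodal video generation. It has been preliminarily validated on experimental datasets but has not yet reached product-level quality. Going forward, the team plans to explore complex instruction handling, multi-character interaction, and related areas to extend the model to more scenarios.

This article is a Guancha.cn (观察者网) exclusive and may not be reproduced without authorization.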
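The techniques summarized above can be sketched in plain Python. This is a minimal, hypothetical illustration built only from the descriptions in these reports, not OmniAvatar's actual code. All function names, the `[frames][height][width][channels]` latent layout, the assumption that audio features already have the latent's channel count, the `alpha / r` LoRA scaling convention, the toy `blocks` standing in for DiT blocks, and the `generate_segment` model call are assumptions.

```python
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]
Frame = List[List[Vector]]  # one latent frame: [height][width][channels]


def add_audio_pixelwise(latent: List[Frame], audio: List[Vector]) -> List[Frame]:
    """Pixel-wise audio embedding (hypothetical reading of the report).

    audio[t] is assumed to already have the latent's channel count C, so it is
    added to every spatial position (h, w) of latent frame t.
    """
    assert len(latent) == len(audio), "expected one audio vector per latent frame"
    return [
        [[[z + a for z, a in zip(pixel, audio_t)] for pixel in row] for row in frame]
        for frame, audio_t in zip(latent, audio)
    ]


def run_blocks(
    latent: List[Frame],
    blocks: List[Callable[[List[Frame]], List[Frame]]],
    audio_per_block: Dict[int, List[Vector]],
) -> List[Frame]:
    """Multi-hierarchical embedding: inject audio before several blocks, not only at the input."""
    for i, block in enumerate(blocks):
        if i in audio_per_block:
            latent = add_audio_pixelwise(latent, audio_per_block[i])
        latent = block(latent)  # stand-in for one DiT block
    return latent


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def lora_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Effective LoRA weight W + (alpha / r) * B @ A.

    W (out x in) stays frozen; only A (r x in) and B (out x r) would be trained.
    """
    scale = alpha / len(a)  # len(a) is the rank r
    delta = matmul(b, a)
    return [[wi + scale * di for wi, di in zip(w_row, d_row)] for w_row, d_row in zip(w, delta)]


def generate_long_video(
    generate_segment: Callable[[object, list], list],
    reference: object,
    n_segments: int,
    overlap: int,
) -> list:
    """Chain segments with frame overlap.

    Each call gets the same reference image (identity anchor) plus the last
    `overlap` frames of the previous segment; the returned segment is assumed to
    start with those prefix frames, which are dropped when concatenating.
    """
    video: list = []
    prefix: list = []
    for _ in range(n_segments):
        segment = generate_segment(reference, prefix)  # hypothetical model call
        video.extend(segment[len(prefix):])  # skip frames that repeat the prefix
        prefix = segment[-overlap:] if overlap > 0 else []
    return video


def toy_segment(reference: object, prefix: list, length: int = 5) -> list:
    """Toy stand-in for a video model: frames are integers that continue the prefix."""
    start = prefix[-1] + 1 if prefix else 0
    return prefix + list(range(start, start + length - len(prefix)))


if __name__ == "__main__":
    print(lora_weight([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], alpha=2.0))
    print(generate_long_video(toy_segment, "ref.png", n_segments=3, overlap=2))
```

Running the file prints `[[3.0, 2.0], [0.0, 1.0]]` for the rank-1 LoRA update and the frame indices 0 through 10 for the long video: three 5-frame segments with a 2-frame overlap yield 5 + 3 + 3 = 11 unique frames. Keeping `W` frozen and training only `A` and `B` is what lets a LoRA adapter learn audio conditioning without altering the base model, which matches the motivation given in the report.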
Pop Mart's Wang Ning responds to hunger-marketing controversy; Musk warns of tough quarters ahead for Tesla
Group 1: Company Developments
- Pop Mart's founder Wang Ning addressed the controversy over "hunger marketing," stating that the company is increasing production capacity to meet demand for LABUBU, aiming to sell 10 million units monthly, with production capacity this month double that of last month [2]
- Tesla's stock dropped 8.9%, a market value loss of approximately 684.3 billion RMB, after a Q2 report showing a 12% year-over-year revenue decline and a 20.7% drop in net profit. CEO Elon Musk warned of challenging quarters ahead due to changes in electric vehicle tax credits and tariffs [2]
- TikTok's revenue is projected to reach $23 billion in 2024, a 42.8% year-over-year increase, making it the fourth-largest social media app globally. Although ByteDance's profit growth has slowed, TikTok's overseas business revenue grew by 63%, accounting for a record 25% of the company's total revenue [5]
- SenseTime's "1+X" structural adjustment has led six ecosystem companies to raise approximately 1.8 billion RMB, with a total equity value of around 10 billion RMB [7]
- JD.com is in talks to acquire German electronics retailer Ceconomy AG, valued at approximately €2.2 billion (about $2.6 billion), with a potential offer of €4.60 per share, a 23% premium over the recent closing price [7]

Group 2: Market Trends and Predictions
- According to a recent survey, NVIDIA's Blackwell-architecture GPUs are expected to account for 80% of the company's high-end GPU shipments this year, as the server market stabilizes and ODMs focus on AI server development [8]
- AMD CEO Lisa Su indicated that chips produced at TSMC's Arizona facility are 5% to 20% more expensive than those made in Taiwan, highlighting the trade-off between cost and supply chain resilience in the semiconductor industry [9]
- IBM's Q2 software revenue fell short of market expectations, sending its stock price down more than 9% [10]
- Alphabet, Google's parent company, reported Q2 revenue of $96.428 billion, a 14% year-over-year increase, with net profit rising 19% to $28.196 billion [11]

Group 3: Innovations and New Products
- Quark's technical team and Zhejiang University have jointly open-sourced OmniAvatar, an audio-driven full-body video generation model that improves lip sync and motion fluidity, working from a single image and an audio clip [16]
- Aoshark Intelligent launched VIATRIX, a consumer-grade exoskeleton robot designed to help users conserve energy and improve performance during various activities [17]
Audio-driven full-body video generation model: Quark and Zhejiang University jointly open-source OmniAvatar
news flash · 2025-07-25 01:27
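Reporters learned on the 25th from Quark, part of Alibaba, that the Quark technical team and Zhejiang University have jointly open-sourced OmniAvatar, an innovative audio-driven full-body video generation model. Given only one image and one audio clip, it generates a matching video, with markedly better lip-sync detail and smoother full-body motion for the person on screen. Prompts can also be used to control the character's pose, emotion, scene, and similar elements more precisely. ...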