MIDAS
From "Lip-Syncing" to "Performing": The Newly Upgraded Kling AI Digital Human Makes Its Technology Public
机器之心 (Jiqizhixin) · 2025-09-15 12:19
Core Viewpoint - The article discusses advances by Kuaishou's Kling team in establishing a new paradigm for digital human generation. Its Kling-Avatar project produces expressive, natural performances in long videos, moving beyond simple lip-syncing to full-body expression and emotional engagement [2][31].
Group 1: Technology and Framework
- Kling-Avatar uses a two-stage generative framework driven by a multimodal large language model, which turns audio, visual, and textual inputs into a coherent storyline for video generation [6][10].
- A multimodal director module organizes the inputs into a structured narrative: it extracts speech content and emotional trajectories from the audio, identifies human features and scene elements from the image, and folds the user's text prompt into actions and emotional expressions [8][10].
- The system first generates a blueprint video that sets the overall rhythm, style, and key expression nodes; the blueprint is then used to create high-quality sub-segment videos [12][28].
Group 2: Data and Training
- The Kling team collected thousands of hours of high-quality video from sources including speeches and dialogues, to train multiple expert models that assess video quality across several dimensions [14].
- A benchmark of 375 samples, each combining a reference image, audio, and a text prompt, was created to evaluate digital human video generation methods, providing a challenging test of multimodal instruction following [14][23].
Group 3: Performance and Results
- In a comparative evaluation against advanced products such as OmniHuman-1 and HeyGen, Kling-Avatar scored higher on overall effectiveness, lip-sync accuracy, visual quality, control response, and identity consistency [16][24].
- Generated lip movements were closely synchronized with the audio, and facial expressions adapted naturally to vocal variation, even on complex phonetic sounds [25][26].
- Kling-Avatar generates long videos efficiently: it can produce multiple segments in parallel from a single blueprint video while maintaining quality and coherence throughout [28].
Group 4: Future Directions
- The Kling team plans to keep exploring high-resolution video generation, fine-grained motion control, and complex multi-turn instruction understanding, with the goal of giving digital humans a genuine and captivating presence [31].
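The parallel generation of sub-segments from a single blueprint video can be pictured with a small sketch. This is only an illustration under stated assumptions, not the Kling team's implementation: the fixed-length split, the `Segment` type, and the `render_segment` placeholder are all hypothetical, and a real system would call a video generator where the placeholder returns a label.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A slice of the blueprint timeline (hypothetical type)."""
    index: int
    start_s: float
    end_s: float


def split_blueprint(duration_s: float, segment_s: float) -> list[Segment]:
    """Cut the blueprint timeline into consecutive segments of at most segment_s seconds."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("durations must be positive")
    segments: list[Segment] = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append(Segment(len(segments), start, end))
        start = end
    return segments


def render_segment(segment: Segment) -> str:
    """Hypothetical stand-in for the per-segment video generator."""
    return f"clip[{segment.start_s:.1f}-{segment.end_s:.1f}s]"


def render_long_video(duration_s: float, segment_s: float, workers: int = 4) -> list[str]:
    """Render all segments concurrently and return them in timeline order."""
    segments = split_blueprint(duration_s, segment_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so clips stay in timeline order.
        return list(pool.map(render_segment, segments))


if __name__ == "__main__":
    print(render_long_video(12.0, 5.0))
```

Because every segment depends only on the shared blueprint and not on its neighbours' outputs, the work can be spread across workers and the results simply concatenated in order.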
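Kuaishou's Kling Team Proposes MIDAS: 64× Compression Ratio, Latency Below 500 ms, as Multimodal Interactive Digital Human Framework Achieves a New Breakthrough in Interactive Generation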
机器之心 (Jiqizhixin) · 2025-09-13 08:54
Core Viewpoint - The article discusses the rapid development of digital human video generation and highlights the MIDAS framework introduced by Kuaishou's Kling Team, which addresses major challenges in real-time generation, multimodal control, and long-term consistency in digital human interaction [2][16].
Group 1: MIDAS Framework Overview
- MIDAS (Multimodal Interactive Digital-human Synthesis) combines autoregressive video generation with a lightweight diffusion denoising head to achieve real-time, smooth digital human video synthesis under multimodal conditions [2][5].
- The system has three core advantages: a high compression ratio, low latency, and efficient denoising, which make it suitable for real-time interactive applications [4][14].
Group 2: Technical Innovations
- The framework uses an autoencoder with a 64× compression ratio that reduces each frame to at most 60 tokens, significantly lowering the computational load [4][8].
- MIDAS supports several input signals, including audio, pose, and text, through a unified multimodal condition projector that encodes the different modalities into a shared latent space [5][12].
- The architecture pairs a Qwen2.5-3B autoregressive backbone with a diffusion head based on a PixArt-α/MLP structure, keeping generated outputs coherent while minimizing computational delay [12][16].
Group 3: Training and Data
- A large-scale multimodal dialogue dataset of approximately 20,000 hours was built to train the model, covering single-person and two-person dialogue scenarios across multiple languages and styles [10][12].
- The training strategy includes controllable noise injection to mitigate exposure bias at inference time, improving the model's performance [12].
Group 4: Application Scenarios
- MIDAS can generate real-time two-person dialogue, synchronizing lip movements, expressions, and listening postures with the audio streams [13].
- The model performs cross-language singing synthesis without explicit language identifiers, maintaining lip-sync across Chinese, Japanese, and English songs for videos up to 4 minutes long [13][14].
- MIDAS shows potential as an interactive world model: it responds to directional control signals in environments such as Minecraft while demonstrating scene consistency and memory [13][14].
Group 5: Future Directions
- The team plans to explore higher resolution and more complex interaction logic, and aims to deploy the system in real product environments [17].
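To get a feel for what "at most 60 tokens per frame" means for the autoregressive backbone, the following back-of-the-envelope sketch computes an upper bound on the number of frame tokens in a clip. The 60-token cap and the 4-minute duration come from the summary above; the frame rate of 25 fps is an assumption made purely for illustration, since the summary does not state one, and `max_video_tokens` is a hypothetical helper.

```python
import math

MAX_TOKENS_PER_FRAME = 60  # upper bound per frame, as stated in the summary
ASSUMED_FPS = 25.0         # assumption for illustration; not given in the source


def max_video_tokens(duration_s: float, fps: float = ASSUMED_FPS) -> int:
    """Upper bound on frame tokens for a clip of the given length."""
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration must be non-negative and fps positive")
    frames = math.ceil(duration_s * fps)
    return frames * MAX_TOKENS_PER_FRAME


if __name__ == "__main__":
    # One second of video at the assumed frame rate.
    print(max_video_tokens(1))       # 1500
    # A 4-minute singing clip at the assumed frame rate.
    print(max_video_tokens(4 * 60))  # 360000
```

Under this assumed frame rate, a 4-minute clip stays at or below 360,000 frame tokens; a different frame rate scales the bound proportionally.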
X @Crypto Rover
Crypto Rover· 2025-09-10 12:37
Market Performance
- MIDAS is live for trading on WEEX [1].
- MIDAS's initial valuation was 1.5 million (150万), and its current valuation is 3.5 million (350万) [1].
- The post calls the listing "super bullish" [1].
X @Poloniex Exchange
Poloniex Exchange· 2025-09-04 13:35
🎉 Poloniex 1.2M $MIDAS Giveaway! 💰
1️⃣ Follow @midasonbase & @Poloniex
2️⃣ Like, RT & Tag 3 Friends
3️⃣ Join TG https://t.co/tIOz6Lzvfz & https://t.co/WHcXBg1Mww
4️⃣ Fill up form here https://t.co/vso3fQhlod
⏰ Sep 10 end
🎁 3 winners
#Airdrop #Giveaway https://t.co/CnDM1iR0f6 ...
X @Poloniex Exchange
Poloniex Exchange· 2025-09-03 07:36
🚀 Poloniex New Listing $MIDAS @MidasOnBase
✅ Deposit open on Sep 3rd, 10:00 (UTC)
✅ Full trading on Sep 3rd, 14:00 (UTC)
Details: https://t.co/KVIrvaFx67 https://t.co/sOv1pQYYKh ...
X @Crypto Rover
Crypto Rover· 2025-08-30 14:31
$MIDAS is doing really great 🚀
Already 2x’d from my initial call!
Looks like it’s ready to send higher any moment here
CA: 0xB8d59c7B33054BEdA610B2b2D38EA38694cdfaBd https://t.co/X8fZkB8kmq
Crypto Rover (@rovercrc):
$MIDAS meme is going crazy lately!
This thing has been absolutely skyrocketing since launch and it could be up only from here.
YOU ARE SO EARLY
Exchanges in the pipeline, fully locked $400k+ liquidity pool at a 2.5m MC. Send it quickly higher!
There are some HUGE players https://t.co/EeWLyUquJd ...