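Goodbye to "audio-visual mismatch" and "character breakdown"! AutoMV: the first open-source, full-song MV generation agent that understands lyrics and stays on the beat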
量子位 (QbitAI) · 2025-12-29 06:37
Core Viewpoint
- The article introduces AutoMV, a training-free multi-agent system that automatically generates coherent, synchronized music videos (MVs), addressing the difficulties existing AI video generation models have with full-length MVs [2][25].

Group 1: Challenges in Current AI Video Generation
- Existing AI video generation models struggle to produce full-length MVs, while conventional production is costly (approximately $10,000) and slow (dozens of hours) for independent musicians [3].
- Three main challenges are identified:
  1. Duration Limitations: Most models can only generate short clips and cannot cover an entire song [4].
  2. Audio-Visual Disconnection: Generated visuals often ignore the music's beats, structure, and lyrical meaning [5].
  3. Inconsistency: In longer videos, characters may change appearance and scenes lack narrative coherence [6].

Group 2: Introduction of AutoMV
- AutoMV is a multi-agent collaborative system that simulates the human filmmaking process and is designed to overcome the challenges above [7].
- The system operates in four main stages: music preprocessing, scriptwriting and directing, video generation, and verification [9][11].

Group 3: AutoMV Workflow
- The system dissects the music with professional tools to extract vocals, instrumentals, lyrics, timestamps, song structure, and emotional analysis [12].
- Gemini acts as the screenwriter, while Doubao serves as the director, generating the prompts and keyframes used for video creation [13][14].
- A distinctive verification step uses a Verifier Agent that checks the generated video for coherence, richness, and lip-sync accuracy [15].

Group 4: Advantages of AutoMV
- AutoMV reduces production cost to approximately $15 while achieving quality close to professional standards [9].
- It demonstrates better character consistency, action diversity, and narrative alignment with lyrical themes than existing commercial products [18][20].
- The system has been evaluated on the M2V Benchmark, which includes 30 diverse songs and 12 detailed evaluation criteria [20][23].

Group 5: Future Prospects
- AutoMV offers an open-source, training-free framework that addresses key issues in long-form music video generation, giving independent musicians a low-cost creative tool [25].
- Generating a complete MV currently takes around 30 minutes, but this could improve as the underlying video generation models evolve [25].
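
The four-stage workflow described under Group 3 (music preprocessing, scriptwriting and directing, video generation, verification) can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not AutoMV's actual code: the names `Segment`, `Clip`, `preprocess`, `write_script`, `generate_and_verify`, and `make_mv` are hypothetical, the `render` and `verify` callables stand in for the video model and the Verifier Agent, and the retry-on-rejection loop with `max_attempts` is an assumption about how a failed check might be handled.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    lyrics: str
    section: str   # e.g. "verse" or "chorus"


@dataclass
class Clip:
    segment: Segment
    prompt: str
    attempts: int   # how many renders were tried for this segment
    approved: bool  # whether the last render passed verification


def preprocess(lines: List[Tuple[float, float, str, str]]) -> List[Segment]:
    """Stage 1 stand-in: wrap already-timestamped lyric lines as segments.
    A real system would run source separation and lyric transcription here."""
    return [Segment(s, e, text, section) for s, e, text, section in lines]


def write_script(segments: List[Segment]) -> List[str]:
    """Stage 2 stand-in: one shot prompt per segment.
    In the article this role is played by LLM agents (screenwriter, director)."""
    return [f"[{seg.section}] shot illustrating: {seg.lyrics}" for seg in segments]


def generate_and_verify(
    segments: List[Segment],
    prompts: List[str],
    render: Callable[[str, float], Any],
    verify: Callable[[Any, Segment], bool],
    max_attempts: int = 3,
) -> List[Clip]:
    """Stages 3 and 4: render each shot, then re-render if the verifier rejects it
    (the retry policy is an assumption made for this sketch)."""
    clips = []
    for seg, prompt in zip(segments, prompts):
        attempts, approved = 0, False
        while attempts < max_attempts and not approved:
            attempts += 1
            video = render(prompt, seg.end - seg.start)
            approved = verify(video, seg)
        clips.append(Clip(seg, prompt, attempts, approved))
    return clips


def make_mv(lines, render, verify, max_attempts: int = 3) -> List[Clip]:
    """Run the four stages in order and return one Clip record per segment."""
    segments = preprocess(lines)
    prompts = write_script(segments)
    return generate_and_verify(segments, prompts, render, verify, max_attempts)
```

To try it, pass `make_mv` a list of `(start, end, lyrics, section)` tuples plus any two callables: a `render(prompt, duration)` that returns some clip object and a `verify(video, segment)` that returns `True` or `False`. Keeping verification as a separate callable mirrors the article's point that checking is its own stage rather than part of generation.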