DualCamCtrl
Camera Motion Error Cut by 40%! DualCamCtrl: Giving Video Generation a "Depth Camera" So Camera Moves Are More "Obedient"
机器之心 (Jiqizhixin) · 2025-12-21 04:21
Core Insights
- The article argues that current video generation models have a limited grasp of scene geometry, which weakens their camera motion control, and that video generation needs explicit geometric understanding [3][4].
- It introduces DualCamCtrl, a new end-to-end geometry-aware diffusion framework that addresses these limitations by generating RGB and depth sequences jointly [3][4][28].

Model Architecture
- DualCamCtrl uses a dual-branch diffusion framework: one branch generates RGB representations and the other generates depth representations. The two modalities are fused through the proposed Semantic Guided Mutual Alignment (SIGMA) mechanism [9][11].
- This design lets the RGB and depth outputs evolve independently with minimal interference between them, while keeping geometric guidance coherent throughout the video generation process [9][12].

Training Strategy
- The model is trained in two stages: the first focuses on decoupled multi-modal representation learning, and the second on cross-modal fusion modeling [11][21].
- The decoupled stage learns appearance and geometric representations independently. The fusion stage then jointly optimizes both branches so that RGB and depth information complement each other [21][22].

Experimental Results
- DualCamCtrl significantly outperforms existing state-of-the-art methods in both quantitative and qualitative analyses, reducing camera motion error by over 40% [4][23][28].
- In the quantitative analysis, DualCamCtrl achieves lower FVD and FID scores than the other methods, indicating higher quality in the generated videos [27][28].

Conclusion
- DualCamCtrl represents a significant advance in camera-controlled video generation, offering a new technical approach that improves geometric perception and consistency in generated videos [28].
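
The dual-branch design and the two-stage training schedule described above can be illustrated with a small toy sketch. It is not the DualCamCtrl implementation: the summary does not describe how SIGMA works internally, so the linear blend below is only a stand-in for "each branch receives information from the other", and every name here (`DualBranchStep`, `fusion_weight`, `stage_for_step`, the 0.9 update factor, the 0.25 weight) is hypothetical. The only point it shows is that with fusion switched off (stage one) the branches do not influence each other, while with fusion switched on (stage two) they do.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class DualBranchStep:
    """Toy stand-in for one update of a two-branch (RGB + depth) model.

    fusion_weight = 0.0 mimics the decoupled stage (no cross-branch exchange);
    fusion_weight > 0.0 mimics the fusion stage. This is an assumption made
    for illustration, not the SIGMA mechanism itself.
    """

    fusion_weight: float

    def __call__(self, rgb: Vector, depth: Vector) -> Tuple[Vector, Vector]:
        # Each branch first updates its own features independently
        # (the 0.9 factor is an arbitrary placeholder for a real network).
        rgb_own = [0.9 * x for x in rgb]
        depth_own = [0.9 * x for x in depth]

        # Mutual exchange: each branch blends in the other branch's features.
        w = self.fusion_weight
        rgb_out = [(1 - w) * r + w * d for r, d in zip(rgb_own, depth_own)]
        depth_out = [(1 - w) * d + w * r for r, d in zip(rgb_own, depth_own)]
        return rgb_out, depth_out


def stage_for_step(step: int, stage1_steps: int) -> str:
    """Return which training stage a given step index belongs to."""
    return "decoupled" if step < stage1_steps else "fusion"


def fusion_weight_for_stage(stage: str, fused_weight: float = 0.25) -> float:
    """Fusion is disabled in the decoupled stage and enabled afterwards."""
    return 0.0 if stage == "decoupled" else fused_weight


if __name__ == "__main__":
    rgb, depth = [1.0, 2.0], [5.0, 5.0]
    for step in range(4):
        stage = stage_for_step(step, stage1_steps=2)
        model_step = DualBranchStep(fusion_weight_for_stage(stage))
        rgb, depth = model_step(rgb, depth)
        print(step, stage, rgb, depth)
```

Running the script prints the stage used at each step along with the toy RGB and depth features. In a real system the two branches would be diffusion networks and the exchange would be learned, so the sketch should be read only as a picture of the schedule, not of the model.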