CVPR 2026 | Can a 1B model work as a multi-shot director? MultiShotMaster, an open-source release from Dalian University of Technology & Kuaishou Keling
机器之心 · 2026-03-06 04:31
Core Viewpoint
- The article covers MultiShotMaster, a highly controllable multi-shot video generation framework that enables director-level shot scheduling and coherent storytelling at a model size of roughly 1 billion parameters, marking a significant step for the field from traditional single-shot generation toward multi-shot capability [2][23].

Group 1: Product Development
- MultiShotMaster was developed jointly by Dalian University of Technology, the Kuaishou Keling team, and The Chinese University of Hong Kong; the first author is a third-year PhD student focusing on video generation [1].
- The framework won the AAAI CVM Workshop competition, which evaluated entries on knowledge consistency, camera-movement consistency, and cross-shot ID consistency [5].

Group 2: Technical Innovations
- The framework modifies the traditional single-shot video generation architecture to support multi-shot generation: each shot is encoded with a 3DVAE, and a temporal attention mechanism integrates the shots (a minimal sketch of this pipeline follows this summary) [7].
- MultiShotMaster introduces a multi-shot narrative RoPE and a spatiotemporal position-aware RoPE, enabling precise control over shot boundaries, character consistency, and motion trajectories without adding parameters (see the second sketch below) [12][23].

Group 3: Performance Metrics
- In quantitative comparisons, MultiShotMaster outperformed existing state-of-the-art multi-shot video generation models on inter-shot consistency, narrative coherence, and reference-image consistency [17][21].
- With reference images, the model achieved a Text Alignment score of 0.227 and an Inter-Shot Consistency score of 0.702, indicating its effectiveness at maintaining narrative flow and visual coherence [21].

Group 4: Future Implications
- The automated multi-shot data annotation pipeline and the open-source model are expected to give the community strong support for further research, potentially advancing AI video creation into a new phase of more coherent narratives and greater expressive freedom [24].
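For readers who want to picture the architecture change described in Group 2, the sketch below illustrates the general pattern in PyTorch: each shot is encoded to latent tokens independently (by a 3DVAE, per the article), the per-shot token sequences are concatenated along the temporal axis, and a single attention pass spans the concatenated sequence so tokens in one shot can attend to every other shot. This is a minimal illustration of that pattern under assumed shapes and names (`encode_shots`, `joint_attention`, and `w_qkv` are all hypothetical), not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def encode_shots(shots, vae_3d):
    """Encode each shot clip independently; vae_3d is any callable that maps
    a video clip to a (B, T_i, D) sequence of latent tokens (hypothetical)."""
    return [vae_3d(clip) for clip in shots]

def joint_attention(shot_latents, w_qkv):
    """Concatenate per-shot latents along time and attend across shots."""
    x = torch.cat(shot_latents, dim=1)           # (B, sum_i T_i, D)
    q, k, v = torch.chunk(x @ w_qkv, 3, dim=-1)  # one shared QKV projection
    # Full attention over the concatenated sequence is what lets tokens in one
    # shot reference tokens in every other shot, supporting cross-shot consistency.
    return F.scaled_dot_product_attention(q, k, v)

# Toy usage: two shots already encoded to 8 and 12 latent tokens, feature dim 64.
latents = [torch.randn(1, 8, 64), torch.randn(1, 12, 64)]
w_qkv = torch.randn(64, 3 * 64) / 64 ** 0.5
print(joint_attention(latents, w_qkv).shape)  # torch.Size([1, 20, 64])
```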
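The second innovation, the RoPE variants, is a positional re-indexing rather than a new layer, which is consistent with the article's point that the controls add no parameters. The sketch below shows one plausible way such an indexing could work, under my own assumptions (the gap-based offset scheme and all function names are hypothetical, not the paper's formulation): frames keep their within-shot temporal position, but each shot's indices are shifted by a per-shot offset plus a gap, so shot boundaries become visible to attention through rotary phase alone.

```python
import torch

def rope_angles(positions, dim, base=10000.0):
    """Standard RoPE angle table: (L,) integer positions -> (L, dim/2) angles."""
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    return positions.float()[:, None] * inv_freq[None, :]

def multi_shot_positions(shot_lengths, gap=16):
    """Assign temporal positions with an inter-shot gap marking boundaries
    (assumed scheme: each shot is offset by the lengths of prior shots plus gaps)."""
    pos, offset = [], 0
    for length in shot_lengths:
        pos.append(torch.arange(length) + offset)
        offset += length + gap  # the gap separates shots in rotary phase
    return torch.cat(pos)

def apply_rope(x, positions):
    """Rotate token features x of shape (L, dim) by their positions' angles."""
    ang = rope_angles(positions, x.shape[-1])
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Toy usage: two shots of 8 and 12 tokens, feature dim 64.
pos = multi_shot_positions([8, 12], gap=16)
x = torch.randn(pos.numel(), 64)
print(apply_rope(x, pos).shape)  # torch.Size([20, 64])
```

Because the change lives entirely in the integer position indices fed to a standard rotary embedding, no weights are added or modified, which is the property the article highlights.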