CVPR 2026 | From "Single Frame" to "Storyboard": STAGE Redefines AI Film Narrative
机器之心· 2026-03-22 01:17
Core Insights
- The article traces the evolution of AI video generation from simple animations to complex storytelling, emphasizing the importance of narrative coherence [3][6]
- The proposed framework, STAGE, generates structured narratives through a "start-end frame pair" approach, which strengthens the storytelling capability of AI [8][12]

Technical Framework
- STAGE introduces a new workflow that predicts both the starting and ending frame of each shot, yielding a more coherent narrative structure [8][10]
- The core model, STEP2, acts as an AI director, translating scripts into executable visual storyboards [10][12]

Key Features of STAGE
- A multi-shot memory pack maintains character identity across shots, keeping characters visually consistent throughout the sequence [13]
- A dual-encoding strategy ensures smooth transitions within individual shots, preventing abrupt movements [14]
- A two-stage training scheme mimics film-school training: the model first learns basic shot language, then is refined on human-selected examples of good transitions [15]

Data Foundation
- A large-scale dataset, ConStoryBoard, was created to train the model, consisting of 100,000 high-quality multi-shot segments with detailed annotations [17]

Experimental Results
- STAGE was compared against state-of-the-art multi-shot generation methods and demonstrated superior coherence and a better grasp of cinematic storytelling [19][22]
- The results indicate that structured narrative control, rather than mere pixel manipulation, is essential for the future of multi-shot video generation [24]

Significance and Outlook
- The article concludes that as AI learns to create films rather than just animations, a new era of AI-assisted storytelling will emerge, making filmmaking accessible to everyone [25]
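The "AI director" idea described in the summary, where a script becomes a chain of shots, each defined by a start/end frame pair, with a memory of character appearances reused across shots for consistency, can be illustrated with a small data-flow sketch. Everything below is an assumption for illustration only: the class names, the `plan_storyboard` helper, and the rule that each shot's start frame is seeded by the previous shot's end frame are not taken from STEP2's actual interface, which the article does not publish.

```python
from dataclasses import dataclass, field

@dataclass
class Shot:
    index: int
    start_frame: str  # text description of the shot's first frame
    end_frame: str    # text description of the shot's last frame

@dataclass
class MemoryPack:
    # character name -> canonical appearance, reused across shots
    # (a stand-in for the article's "multi-shot memory pack")
    characters: dict = field(default_factory=dict)

def plan_storyboard(script_beats, memory):
    """Toy 'AI director': turn script beats into start/end frame pairs,
    injecting remembered character descriptions for consistency."""
    shots = []
    prev_end = None
    for i, beat in enumerate(script_beats):
        desc = beat["action"]
        # Inject the canonical look of every remembered character,
        # so each shot describes the character the same way.
        for name, look in memory.characters.items():
            if name in desc:
                desc = desc.replace(name, f"{name} ({look})")
        # The previous shot's end frame seeds this shot's start frame,
        # mimicking the start-end pair chaining described above.
        start = prev_end or f"Establishing: {desc}"
        end = f"Closing: {desc}"
        shots.append(Shot(i, start, end))
        prev_end = end
    return shots

memory = MemoryPack(characters={"Ava": "red coat, short dark hair"})
beats = [{"action": "Ava enters the station"},
         {"action": "Ava boards the train"}]
board = plan_storyboard(beats, memory)
# Shots chain: board[1].start_frame == board[0].end_frame
```

The point of the sketch is the invariant: because adjacent shots share a frame description and characters are always rendered from the same memory entry, coherence is enforced structurally rather than left to pixel-level generation.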