Core Insights
- ByteDance officially launched Seedance 2.0, a next-generation video creation model built on a unified multimodal audio-video generation architecture that accepts text, image, audio, and video inputs [1][2]
- Compared with version 1.5, Seedance 2.0 significantly improves generation quality, particularly in complex interaction and motion scenarios, with better physical accuracy, realism, and controllability [1]

Group 1
- The model is more usable in complex scenes, reaching state-of-the-art (SOTA) performance in multi-agent interaction and complex motion scenarios thanks to its strong motion stability and physical fidelity [1]
- Multimodal capabilities are substantially strengthened: users can input up to 9 images, 3 videos, 3 audio clips, and natural-language instructions simultaneously, going beyond the traditional boundaries of video generation [1]
- Controllability of video generation is greatly improved, with stronger instruction adherence and consistency, letting users manage the entire video creation process with ease [1]

Group 2
- The model supports high-quality multi-shot audio-video output of up to 15 seconds and features dual-channel audio, achieving highly realistic audiovisual effects [2]
- Integrated reference and editing capabilities can significantly reduce content production costs across sectors including film, advertising, e-commerce, and gaming [2]
Seedance 2.0 Officially Released
Ge Long Hui·2026-02-12 06:28