Core Insights
- Tencent's HunyuanVideo 1.5, a lightweight video generation model based on the Diffusion Transformer (DiT) architecture, has been officially released and open-sourced, featuring 8.3 billion parameters and the capability to generate 5-10 seconds of high-definition video [1][4].

Group 1: Model Capabilities
- HunyuanVideo 1.5 supports both Chinese and English input for text-to-video and image-to-video generation, showcasing high consistency between images and videos [4].
- The model demonstrates strong instruction comprehension and adherence, allowing for diverse scene implementations, including camera movements, smooth actions, realistic characters, and emotional expressions [4].
- It supports various styles such as realism, animation, and block-based visuals, and can generate Chinese and English text within videos [4].

Group 2: Video Quality
- The model can natively generate 5-10 seconds of high-definition video at 480p and 720p, with the option to enhance quality to 1080p cinematic level through a super-resolution model [4].

Group 3: Performance Comparison
- In the T2V (Text-to-Video) task, HunyuanVideo outperformed several comparison models, achieving a win rate of +17.12% against Wan2.2 and +12.6% against Kling2.1 [6].
- In the I2V (Image-to-Video) task, HunyuanVideo also showed competitive results, with a win rate of +12.65% against Wan2.2 and +9.72% against Kling2.1 [6].
Tencent Yuanbao Launches Video Generation Capability