In Response to Sora 2, Google Releases New AI Video Model Veo 3.1: Precise, Controllable Video Generation

Core Insights
- Google has launched its next-generation AI video generation model, Veo 3.1, which significantly improves narrative control, audio integration, and visual realism in AI-generated videos [1][14]
- The new model expands what is possible for both individual creators using the Flow application and enterprise users seeking scalable, customizable video solutions [1][2]

Narrative and Audio Control Enhancements
- Veo 3.1 improves the handling of dialogue, ambient sound, and other audio elements, bringing native audio generation to three core features of the Flow platform: "Frames to Video," "Ingredients to Video," and "Extend" [2]
- This integration gives better control over emotional tone and narrative pacing, streamlining the production of professional content such as training materials and marketing videos [2]

Multi-Modal Input Architecture
- The model accepts several forms of input, including text, images, and video clips, with enhanced control over the output [3]
- Users can now supply up to three reference images to precisely control the visual style of the output, enabling fine adjustments to meet brand standards [3]

Cross-Platform Deployment Strategy
- Veo 3.1 is available through multiple channels: Flow for general users and the Gemini API for developers [4]
- It includes frame interpolation for seamless transitions and scene extension for intelligently lengthening a video [4]

Professional Output Specifications
- The model outputs 720p and 1080p video at a stable 24 frames per second, and its extension features allow videos of up to 148 seconds [6]
- It keeps visual elements consistent when users upload product images or style references, which is particularly valuable for the retail and advertising sectors [6]

Early User Feedback
- Feedback on Veo 3.1 is mixed: some users expressed disappointment when comparing it with OpenAI's Sora 2, while acknowledging Google's strengths in reference image support and scene extension tools [7][11]
- Some users noted limitations such as the lack of customizable voice options and a single generation being capped at 8 seconds [8][11]

Market Competition and Technological Evolution
- Competition in AI video generation is intensifying, with Google and OpenAI vying for leadership in technology innovation and creative ecosystems [14]
- The emergence of OpenAI's Sora has shifted the competitive dynamics, raising user expectations for authenticity, voice control, and generation length [11][14]