Multimodal Video Generation
[Morning Report] Shanghai, Shenzhen, and Beijing Stock Exchanges: Optimizing a Package of Refinancing Measures; Amap Ride-Hailing Summoned for Regulatory Talks
Cailian Press · 2026-02-09 23:12
3. Foreign Ministry spokesperson Lin Jian held a regular press conference yesterday. Asked whether China has been invited to the first leaders' meeting of the "Peace Council" initiated by US President Trump, scheduled for the 19th in Washington, Lin Jian said the ministry had already answered questions about the "Peace Council" and had nothing new to add.

Industry News
1. The office of the inter-ministerial joint conference on coordinated regulation of new transport business formats summoned Amap Ride-Hailing for talks. The talks pointed to prominent problems including inadequate management of partner ride-hailing platforms, suppressed fares, and improper emergency handling, and required Amap Ride-Hailing to immediately implement the requirements, reflect deeply, take targeted measures, ensure full rectification, and effectively safeguard the lawful rights and interests of drivers.

Morning Report Highlights
1. Xi Jinping: The key to building a modern socialist power lies in self-reliance and self-strengthening in science and technology.
4. US stocks: the Nasdaq closed up nearly 1%; precious metals rebounded for a second straight day.
5. Cuihua Jewelry announced it is under CSRC investigation for suspected information-disclosure violations.

Macro News
1. On the morning of the 9th, General Secretary Xi Jinping inspected science and technology innovation work at the national IT-innovation park in Beijing's Yizhuang, viewing exhibits of AI, robotics, and other innovations and speaking with researchers and representatives of technology companies. Xi said the key to building a modern socialist power lies in self-reliance and self-strengthening in science and technology; China should fully leverage its ability to concentrate resources on major undertakings, pool high-quality resources for joint breakthroughs, accelerate the resolution of key shortcomings, and achieve its strategic goals ...
Dongfang Securities: Video Generation Enters the Era of Precise Control; Creative Democratization Accelerates Penetration on Both B- and C-Ends
Zhitong Finance · 2026-02-09 02:24
Core Viewpoint
- The report from Dongfang Securities emphasizes vertical multi-modal AI application opportunities, highlighting that technological breakthroughs and cost optimization will accelerate industry trends, leading to user growth, increased payment penetration, and enhanced commercialization [1]

Group 1: Industry Trends
- Since the beginning of the year, the domestic multi-modal video generation sector has seen accelerated model iteration, significantly narrowing the technological gap with overseas counterparts [2]
- The most notable change is the introduction of intelligent storyboarding, which lowers the entry barrier for users, while a unified multi-modal architecture makes the expression of creative intent more efficient and flexible [2]
- The firm predicts substantial progress in both B-end and C-end expansion by 2026, with a focus on observing AI penetration in the content sector [2]

Group 2: Technological Advancements
- Accelerated model development among domestic video generation companies has brought significant improvements in foundational attributes such as physical realism, motion fluidity, and instruction adherence [3]
- Recent model releases have enhanced storyboard functions and audio-visual synchronization, closing earlier gaps in functionality [3]
- Competition in video generation now resembles the state of large language models (LLMs) in 2025: companies have reached a high capability baseline, suggesting that future differentiation will depend on specific application scenarios [3]

Group 3: User Accessibility
- The video generation sector has entered a "dashboard era" characterized by precision and control, with recent models supporting multi-modal input architectures [4]
- Generation duration has become more usable, increasing to approximately 15 seconds per generation, further lowering the creative barrier for both B-end and C-end users [4]
- The models now allow detailed editing of generated content, enabling quick adjustments and more efficient creative expression [4]

Group 4: Investment Recommendations
- Relevant investment targets include Alphabet Inc. (GOOGL.US), Kuaishou-W (01024), MINIMAX-WP (00100), and Meitu Inc. (01357) [4]
Kunlun Wanwei's All-New SkyReels Officially Relaunches
Securities Daily Online · 2025-11-04 07:41
Core Insights
- Kunlun Wanwei has launched its AI video creation platform SkyReels, now available on both web and mobile, aimed at enabling global users to create professional-level content easily [1]

Group 1: Product Features
- SkyReels' core positioning is one-stop, multi-modal creation: it integrates top global AI models such as Google Veo 3.1 and Sora 2, offering image generation, video generation, digital humans, and music generation [1][2]
- The newly launched SkyReels V3 is based on Kunlun Wanwei's self-developed model family, a series of multi-modal video generation models pre-trained and fine-tuned with a Multi-modal In-Context Learning framework [1][2]
- The platform introduces an "Agentic Copilot" mode, a dual-core intelligent system supporting multi-modal input and output that serves both immediate creative needs and in-depth professional tasks [2]

Group 2: Technological Advancements
- SkyReels V3 is the first in the industry to support multi-person, multi-turn digital-human dialogue, allowing precise control over each character's speaking timing and rhythm and making multi-character interactions flow naturally [2]
- The digital-human functionality supports scenarios including film-grade dialogue, e-commerce dual-host broadcasts, and game-asset creation, marking a significant advance in audio-driven video generation [3]
- Visual and audio generation models are expected to keep improving in effectiveness and controllability, while content generation costs are anticipated to fall [3]
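The per-character timing control described above implies some representation of non-overlapping speech turns on a shared timeline. The sketch below is purely illustrative and is not SkyReels' actual API; the `DialogueTurn` structure and `validate_timeline` check are hypothetical names showing the kind of constraint such control entails:

```python
from dataclasses import dataclass

@dataclass
class DialogueTurn:
    """One character's speech segment on a shared timeline (times in seconds).

    Hypothetical structure -- not SkyReels' real data model."""
    character: str
    start: float
    end: float

def validate_timeline(turns: list) -> bool:
    """Check that no two turns overlap, so each character speaks in its own
    slot -- the constraint that per-character timing control implies."""
    ordered = sorted(turns, key=lambda t: t.start)
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))

# A two-host exchange with back-to-back, non-overlapping turns
turns = [
    DialogueTurn("host_a", 0.0, 3.5),
    DialogueTurn("host_b", 3.5, 6.0),
    DialogueTurn("host_a", 6.0, 8.2),
]
ok = validate_timeline(turns)  # True: turns are back-to-back
```

A real system would additionally attach per-turn audio and drive lip sync from it; this sketch only captures the scheduling constraint.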
Guotai Haitong | Media: Sora 2 Officially Released, Accelerating AI Video Development
Core Insights
- OpenAI has officially launched its latest video generation model, Sora 2, along with the Sora App, which quickly topped Apple's US "Top Free Apps" chart [1]
- Sora 2 brings significant advances in video realism, audio synchronization, and fine-grained control, supporting immersive content generation of up to 10 seconds, with the Pro version extending to 15 seconds at higher resolution [1]
- The Sora App aims to redefine social interaction and content creation, positioning itself as a co-creation platform rather than a content-consumption platform [1]

Group 1: Technological Advancements
- Multi-modal video generation is evolving toward global (end-to-end) generation, effectively reducing costs and increasing efficiency in content production, particularly in animation [2]
- Unlike retrieval-based or local generation, multi-modal video generation takes text, images, and video as prompts, showcasing the capacity of large models for comprehensive generation [2]
- Continuous updates and iteration of domestic and international multi-modal large models are improving stability, controllability, richness, and generation duration [2]

Group 2: Content Innovation
- The release of Sora 2 is expected to reshape IP value, with PGC production capacity unlocked through innovations such as short dramas and interactive dramas [2]
- OpenAI's CEO announced two key changes for Sora 2: character rights holders can control how their characters are used in derivative creations, and potential monetization models are being explored [2]
- The Sora App is positioned for diverse applications in entertainment, social media, e-commerce marketing, and education, demonstrating significant value in creative videos and brand advertising [2]

Group 3: Investment Opportunities
- The report identifies four categories of companies that may benefit from these developments: platform and model companies, IP resource companies, content innovation companies, and other multi-application companies [3]
Alibaba Open-Sources the Wan2.2-S2V Model: Film-Grade Digital-Human Video from a Static Image and Audio
Sohu Finance · 2025-08-27 15:54
Core Insights
- Alibaba has launched its latest multimodal video generation model, Wan2.2-S2V, which has drawn significant industry attention for its advanced capabilities [1]
- The model generates high-quality digital-human videos from just a static image and an audio clip, achieving natural facial expressions and synchronized lip movements [1]
- Wan2.2-S2V supports various image types and can produce videos lasting up to several minutes, an industry-leading feature [1]

User Experience
- The model can be tried on platforms such as Hugging Face and the ModelScope community, with direct downloads and trials also available on the official website [1]
- Users can upload images of different subjects, including humans, cartoon characters, and animals, and the model will animate them to speak, sing, or perform according to the provided audio [1]

Technical Innovations
- Wan2.2-S2V integrates multiple innovative technologies, combining text-guided global motion control with audio-driven fine-grained local motion to enable efficient video generation in complex scenarios [3]
- The model employs AdaIN and cross-attention mechanisms for more accurate, dynamic audio control, and uses hierarchical frame compression to sustain high-quality long-video generation [3]
- Alibaba's team trained the model on a dataset of over 600,000 audio-video segments, using mixed parallel training to maximize performance [3]

Performance Metrics
- Wan2.2-S2V achieved the best results among comparable models on key metrics such as video quality, expression realism, and identity consistency [4]
- Since February of this year, the company has open-sourced several video generation models, with downloads exceeding 20 million, making them among the most popular in the open-source community [4]
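The AdaIN mechanism mentioned above can be illustrated with a minimal sketch. This is not Alibaba's implementation, only a generic NumPy illustration of adaptive instance normalization: content features are normalized per channel, then re-scaled and shifted by externally supplied statistics, which in an audio-driven model would (hypothetically) come from an audio encoder:

```python
import numpy as np

def adain(content: np.ndarray, style_mean: np.ndarray, style_std: np.ndarray,
          eps: float = 1e-5) -> np.ndarray:
    """Adaptive Instance Normalization sketch.

    content: (channels, length) feature map. Each channel is normalized to
    zero mean / unit variance, then re-modulated by the conditioning stats
    (imagined here as audio-derived; the pairing with audio is an assumption).
    """
    mean = content.mean(axis=1, keepdims=True)
    std = content.std(axis=1, keepdims=True)
    normalized = (content - mean) / (std + eps)
    return normalized * style_std[:, None] + style_mean[:, None]

# Toy example: 2 feature channels, 4 positions
feat = np.array([[1.0, 2.0, 3.0, 4.0],
                 [0.0, 0.0, 1.0, 1.0]])
out = adain(feat, style_mean=np.array([5.0, -1.0]), style_std=np.array([2.0, 0.5]))
# After AdaIN, each output channel's mean/std match the conditioning stats
```

The design intuition: normalization erases the content features' own statistics, so whatever signal supplies the new mean and standard deviation (here, audio) directly steers the feature distribution.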
Alibaba Open-Sources Video Generation Model Wan2.2-S2V
Group 1
- The core point of the article is that Alibaba has launched a multimodal video generation model called Wan2.2-S2V, which creates high-quality digital-human videos from a single static image and an audio clip [1]
- The model can generate videos up to several minutes long in a single pass [1]
Multimodal Video Generation Model Tongyi Wanxiang "Wan2.2-S2V" Officially Open-Sourced
Yicai · 2025-08-26 13:57
Core Insights
- The new multimodal video generation model "Wan2.2-S2V" has been officially open-sourced, enabling the creation of high-quality digital-human videos from a single static image and an audio clip [2]
- The model can generate videos up to several minutes long in a single pass, significantly improving video-creation efficiency in industries such as digital-human livestreaming, film production, and AI education [2]
- The model is now available on the official Tongyi Wanxiang website [2]
Tencent Hunyuan Launches New Multimodal Video Generation Tool, Now Open-Sourced and Live on the Official Website
Sohu Finance · 2025-05-10 14:48
Core Insights
- Tencent has officially launched and open-sourced Hunyuan Custom, a new multimodal customized video generation tool built on the Hunyuan Video model [1]

Group 1: Product Features
- Hunyuan Custom offers strong multimodal fusion, processing text, images, audio, and video to produce coherent, natural video content, with markedly better generation quality and control than traditional models [3]
- The tool provides several generation modes: single-subject video generation, multi-subject video generation, single-subject video dubbing, and local video editing; single-subject generation is already available to users [3]
- Users can upload an image of a target person or object plus a text description to generate videos with different actions, outfits, and scenes, addressing the character-consistency and scene-transition limitations of traditional models [3]

Group 2: Application Scenarios
- Hunyuan Custom is highly extensible: users can upload images and audio to create synchronized performances for scenarios such as digital-human broadcasting, virtual customer service, and educational presentations [4]
- The video-driven mode naturally replaces or inserts characters or objects from images into any video segment, enabling creative embedding and scene expansion for video reconstruction and content enhancement [4]
The Image Provides Identity, Text Defines Everything! Tencent Open-Sources Multimodal Video Customization Tool HunyuanCustom
AI科技大本营· 2025-05-09 09:35
Core Viewpoint
- The article discusses the launch of Tencent's HunyuanCustom, a new multi-modal video generation framework that treats customization capability as a key measure of a system's practicality [1][10]

Group 1: Technology Overview
- HunyuanCustom is built on the HunyuanVideo model and supports input modalities including images, text, audio, and video, enabling high-quality, controllable video generation [1][5]
- The framework addresses the "face-changing" problem of traditional video generation models, maintaining subject consistency by combining image-ID enhancement with multi-modal control inputs [3][6]

Group 2: Performance Comparison
- Tencent's team ran comparative tests of HunyuanCustom against several mainstream video-customization methods, evaluating face consistency, video-text consistency, semantic similarity, temporal consistency, and overall video quality [8]
- HunyuanCustom achieved a face-consistency score of 0.627, outperforming the other models, and scored 0.593 in semantic similarity, indicating a leading position among current open-source solutions [9]

Group 3: System Architecture
- The architecture includes several key modules for decoupled control of the image, voice, and video modalities, providing flexible interfaces for multi-modal generation [6][11]
- The data-construction pipeline incorporates models such as Qwen, YOLO, and InsightFace to build a comprehensive labeling system covering various subject types, improving the model's generalization and editing flexibility [11]

Group 4: User Experience
- HunyuanCustom's single-subject generation is currently available on the official website, with additional features to be released throughout May [10]
- Users can access the experience via the project website and code repository links [12]
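Consistency scores like the 0.627 face-consistency figure above are commonly computed as average cosine similarity between a reference identity embedding and per-frame embeddings from a face-recognition model (the article does not specify the exact protocol, so treat this as one plausible reading, not Tencent's published evaluation code):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def face_consistency(ref_embedding: np.ndarray, frame_embeddings: list) -> float:
    """Average cosine similarity between a reference face embedding and
    embeddings extracted from each generated frame. Higher means the
    subject's identity drifts less across the video."""
    sims = [cosine_similarity(ref_embedding, f) for f in frame_embeddings]
    return float(np.mean(sims))

# Toy 3-d embeddings: one frame identical to the reference, one orthogonal
ref = np.array([1.0, 0.0, 0.0])
frames = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
score = face_consistency(ref, frames)  # (1.0 + 0.0) / 2 = 0.5
```

In practice the embeddings would come from a model such as InsightFace (which the article names in the data pipeline), with real embeddings being high-dimensional rather than 3-d.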
Tencent Hunyuan Releases and Open-Sources Video Generation Tool HunyuanCustom, Supporting Subject-Consistent Generation
news flash· 2025-05-09 04:22
Core Insights
- Tencent's Hunyuan team has launched and open-sourced HunyuanCustom, a new multimodal customized video generation tool based on the HunyuanVideo model [1]
- HunyuanCustom surpasses existing open-source solutions in subject consistency and is comparable to top proprietary models [1]
- The tool integrates capabilities for generating videos from multimodal inputs including text, images, audio, and video, offering high control and quality for intelligent video creation [1]