Video Generation Models

Alibaba's open-source "Sora" tops the leaderboard on launch, runs on an RTX 4070, and is free for commercial use
量子位· 2025-02-26 03:51
Core Viewpoint
- The article discusses the release of Alibaba's video generation model Wan 2.1, which outperforms competitors in the VBench ranking and introduces significant advancements in video generation technology [2][8].

Group 1: Model Performance
- Wan 2.1 features 14 billion parameters and excels in generating complex motion details, such as synchronizing five individuals dancing hip-hop [2][3].
- The model has successfully addressed the challenge of generating legible text within generated visuals, a previously difficult task [4].
- The model is available in two versions: a 14B version supporting 720P resolution and a smaller 1.3B version supporting 480P resolution, with the latter being more accessible for personal use [5][20].

Group 2: Computational Efficiency
- The computational efficiency of Wan 2.1 is highlighted, with detailed performance metrics provided for various GPU configurations [7].
- The 1.3B version requires just over 8GB of VRAM on a 4090 GPU, while the 14B version has higher memory demands [5][20].
- The model employs innovative techniques such as a 3D variational autoencoder and a diffusion transformer architecture to enhance performance and reduce memory usage [21][24].

Group 3: Technical Innovations
- Wan 2.1 utilizes a T5 encoder for multi-language text encoding and incorporates cross-attention mechanisms within its transformer blocks [22].
- The model's design includes a feature caching mechanism in convolution modules to improve spatiotemporal compression [24].
- The implementation of distributed strategies for model training and inference aims to enhance efficiency and reduce latency during video generation [29][30].

Group 4: User Accessibility
- Wan 2.1 is open-source under the Apache 2.0 license, allowing for free commercial use [8].
- Users can access the model through Alibaba's platform, with options for both rapid and professional versions, although high demand may lead to longer wait times [10].
- The model's capabilities have inspired users to create diverse content, showcasing its versatility [11][19].
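The cross-attention conditioning described in Group 3 can be illustrated with a minimal sketch: each spatiotemporal latent token of the video acts as a query against the T5 text embeddings, so the text prompt steers every transformer block. This is a generic single-head cross-attention in NumPy, not Wan 2.1's actual implementation; all dimensions and weight matrices here are illustrative placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent, text, W_q, W_k, W_v):
    """Video-latent tokens (queries) attend over text embeddings (keys/values)."""
    q = latent @ W_q                            # (n_latent, d) queries
    k = text @ W_k                              # (n_text, d) keys
    v = text @ W_v                              # (n_text, d) values
    scores = q @ k.T / np.sqrt(q.shape[-1])     # scaled dot-product scores
    return softmax(scores) @ v                  # (n_latent, d) text-conditioned output

rng = np.random.default_rng(0)
d_latent, d_text, d = 64, 48, 32                # illustrative sizes, not Wan 2.1's
latent = rng.normal(size=(100, d_latent))       # 100 spatiotemporal latent tokens
text = rng.normal(size=(12, d_text))            # 12 text-embedding tokens (e.g. from T5)
W_q = rng.normal(size=(d_latent, d))
W_k = rng.normal(size=(d_text, d))
W_v = rng.normal(size=(d_text, d))
out = cross_attention(latent, text, W_q, W_k, W_v)
print(out.shape)  # (100, 32)
```

Because every latent token attends over the same small set of text tokens, the cost of conditioning grows with the number of video tokens but not with video length times prompt length squared, which is one reason cross-attention is the standard way to inject text guidance into diffusion transformers.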
LatePost exclusive: Ant Group invests in video generation model company 爱诗科技 (Aishi Technology); Nayuki investor joins Chayan Yuese
晚点LatePost· 2024-04-23 11:12
Companies covered in this issue: Ant Group, Chayan Yuese, and 爱诗科技 (Aishi Technology).

Ant Group invests in video generation model company 爱诗科技, founded by former ByteDance visual technology head Wang Changhu. After OpenAI released Sora this February, investors' views on video generation models grew more divided: some believe OpenAI has already crushed everyone else and the startup opportunity is gone; the other camp argues that Sora proves the video generation roadmap is clear and its results reproducible, which actually opens the door for more companies.

The optimists have voted with their money. LatePost has exclusively learned that Ant Group recently became the sole investor in the A2 round of Chinese video generation model company 爱诗科技, a round exceeding 100 million RMB.

A source close to Ant says that beyond building and deploying its own large models, Ant continues to track forward-looking exploration across the industry. Around the core technologies and ecosystem of large-model capabilities, industrial applications, and AI compute, it has successively invested in large-model startups such as 智谱 AI (Zhipu AI) and 月之暗面 (Moonshot AI), as well as the multimodal-focused 生数科技 (Shengshu Technology).

爱诗科技 was founded in April 2023 and currently has a team of about 30 people. Founder and CEO Wang Changhu previously led ByteDance's visual technology team and has experience in video understanding, data processing, content safety, and video generation.

爱诗科技 builds both video generation models and video generation products aimed at content creators and everyday users. The company says that since June 2023 it has been experimenting with Di ...