Text-to-Video

Let AI Image Generation Correct Itself: Randomly Dropping Modules Boosts Output Quality and Banishes Plastic-Looking Rejects
量子位· 2025-08-23 05:06
梦晨, reporting from 凹非寺 — 量子位 | WeChat official account QbitAI

Can AI image and video generation now "rescue itself"?! While many people are still tearing their hair out tuning CFG (classifier-free guidance) parameters only to end up with a pile of plastic-looking rejects, a research team from Tsinghua University, Alibaba AMAP (Amap), and the Institute of Automation of the Chinese Academy of Sciences has introduced a new method, S²-Guidance (Stochastic Self-Guidance).

Its core idea is to dynamically construct "weak" sub-networks through Stochastic Block-Dropping, i.e. randomly dropping network modules, so that the generation process can correct itself. This not only teaches the model to actively steer around its own failure modes; more importantly, it avoids the tedious, model-specific parameter tuning that similar methods require, making it genuinely plug-and-play and clearly effective.

S²-Guidance markedly improves the quality and coherence of results on both text-to-image and text-to-video tasks. Specifically:

1. The CFG bottleneck: distorted results and a lack of generality
In the world of diffusion models, CFG (Classifier-Free Guidance) is the standard recipe for improving generation quality and text alignment. But its "linear extrapolation" nature means that at high guidance scales it easily produces oversaturation, distortion, and similar artifacts. To address this, the academic community's earlier approach was to introduce a ...
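To make the mechanism more concrete, here is a minimal, self-contained Python sketch of classifier-free guidance combined with a stochastically weakened copy of the same denoiser, in the spirit described above. The toy denoiser, the `drop_prob` value, and the exact way the weak prediction is folded in are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Toy sketch of classifier-free guidance (CFG) plus a correction from a
# stochastically weakened sub-network. Illustrative only: the denoiser,
# drop probability, and combination rule are assumptions, not the paper's code.
import torch
import torch.nn as nn


class ToyDenoiser(nn.Module):
    """Stand-in for a diffusion denoiser built from a stack of residual blocks."""

    def __init__(self, dim: int = 64, n_blocks: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.SiLU()) for _ in range(n_blocks)]
        )
        self.cond_proj = nn.Linear(dim, dim)  # injects the text condition

    def forward(self, x, cond=None, drop_prob: float = 0.0):
        h = x if cond is None else x + self.cond_proj(cond)
        for block in self.blocks:
            # drop_prob > 0 randomly skips blocks, yielding a "weak" sub-network
            if drop_prob > 0.0 and torch.rand(()).item() < drop_prob:
                continue
            h = h + block(h)  # residual update
        return h


def guided_prediction(model, x, cond, cfg_scale=7.5, self_scale=1.0, drop_prob=0.3):
    """Combine standard CFG with a correction from a randomly thinned sub-network."""
    eps_uncond = model(x, cond=None)
    eps_cond = model(x, cond=cond)
    # Standard CFG: linear extrapolation away from the unconditional prediction.
    eps_cfg = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
    # "Weak" prediction obtained from the SAME model with blocks dropped.
    eps_weak = model(x, cond=cond, drop_prob=drop_prob)
    # Push the sample away from what the weakened sub-network would do.
    return eps_cfg + self_scale * (eps_cond - eps_weak)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyDenoiser()
    x = torch.randn(2, 64)     # noisy latents
    cond = torch.randn(2, 64)  # text-conditioning embedding
    print(guided_prediction(model, x, cond).shape)  # torch.Size([2, 64])
```

The point the sketch tries to capture is that the "weak" branch comes from the model itself via random block-dropping, so no separately trained guide network or model-specific tuning is needed.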
"Text-to-Video" Goes Viral: What Are Its Commercial Prospects?
Zhong Guo Qing Nian Bao· 2025-07-29 23:02
Group 1
- The core viewpoint of the articles highlights the rapid advancements and commercialization of AI technologies, particularly in video generation, which are transforming creative industries and enhancing productivity for content creators [1][3][2]
- DeepSeek, a representative of Chinese AI technology, has gained attention for its ability to generate videos through AI models, showcasing the potential for widespread creative expression [1][3]
- KuaLing AI, launched by Kuaishou, has achieved significant commercial success, with monthly revenue exceeding 100 million yuan in April and May 2023, and a user base surpassing 45 million since its launch [3][1]

Group 2
- Huace Film & TV has initiated AI-driven model development, launching self-developed models like "Youfeng" and "Guose," indicating a trend of AI integration across the short drama production industry [2]
- The P-end subscription model, primarily targeting professional users such as self-media video creators and advertising professionals, contributes nearly 70% of KuaLing AI's revenue, reflecting a strong demand for AI video generation tools [3][1]
- The global video generation model has produced over 300 million videos in the past six months, demonstrating the extensive impact of AI on content creation [1][3]
The State of China's Multimodal Large Model Industry in 2025: Image, Video, Audio, 3D, and Other Modalities Will Eventually Be Connected and Fused [Charts]
Qian Zhan Wang· 2025-06-01 05:09
Core Insights
- The exploration of multimodal large models is making gradual progress, with a focus on breakthroughs in visual modalities, aiming for an "Any-to-Any" model that requires successful pathways across various modalities [1]
- The industry is currently concentrating on enhancing perception and generation models in image, video, and 3D modalities, with the goal of achieving cross-modal integration and sharing [1]

Multimodal Large Models in Image
- Prior to the rise of LLMs in 2023, the industry had already established a solid foundation in image understanding and generation, resulting in models like CLIP, Stable Diffusion, and GAN, which led to applications such as Midjourney and DALL·E [2]
- The industry is actively exploring the integration of Transformer models into image-related tasks, with significant outcomes including GLIP, SAM, and GPT-V [2]

Multimodal Large Models in Video
- Video generation is being approached by transferring image generation models to video, utilizing image data for training and aligning temporal dimensions to achieve text-to-video results (see the sketch after this list) [5]
- Recent advancements include models like VideoLDM and Sora, which demonstrate significant breakthroughs in video generation using the Diffusion Transformer architecture [5]

Multimodal Large Models in 3D
- The generation of 3D models is being explored by extending 2D image generation methods, with key models such as 3D GAN, MeshDiffusion, and Instant3D emerging in the industry [8][9]
- 3D data representation includes various formats like meshes, point clouds, and NeRF, with NeRF being a critical technology for 3D data representation [9]

Multimodal Large Models in Audio
- AI technologies related to audio have matured, with recent applications of Transformer models enhancing audio understanding and generation, exemplified by projects like Whisper large-v3 and VALL-E [11]
- The evolution of speech technology is categorized into three stages, with a focus on enhancing generalization capabilities across multiple languages and tasks [11]
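As a rough illustration of what "aligning temporal dimensions" on top of an image backbone can look like, the sketch below adds a temporal self-attention layer that mixes information across frames while leaving per-frame spatial features untouched. The tensor layout, module names, and dimensions are assumptions chosen for readability; they are not taken from VideoLDM, Sora, or any specific codebase.

```python
# Illustrative sketch: a temporal attention layer over the frame axis, the common
# recipe for adapting an image diffusion backbone to video. Shapes and names are
# assumptions for this example, not from any particular model.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Self-attention applied along the frame axis of a video feature map."""

    def __init__(self, channels: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        # Fold spatial positions into the batch so attention runs over frames only.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        q = self.norm(tokens)
        attended, _ = self.attn(q, q, q)
        tokens = tokens + attended  # residual keeps per-frame (image) behaviour intact
        return tokens.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)


if __name__ == "__main__":
    clip = torch.randn(1, 8, 32, 16, 16)  # 8-frame clip of spatial features
    layer = TemporalAttention(channels=32)
    print(layer(clip).shape)              # torch.Size([1, 8, 32, 16, 16])
```

In practice such temporal layers are typically inserted between the existing spatial layers of a pretrained image model, so the image prior is reused and only the cross-frame mixing has to be learned from video data.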
钛媒体科股早知道: Humanoid Robots and the Low-Altitude Economy Stay Hot, Driving Up Market Demand for These Products
Tai Mei Ti APP· 2025-03-27 00:16
Must-read item 3: Humanoid robots and the low-altitude economy remain hot, driving up market demand for these products

According to reports, high-performance NdFeB (neodymium-iron-boron) permanent-magnet materials are the core material of robot servo motors. As emerging industries such as humanoid robots and the low-altitude economy stay hot, demand in the rare-earth magnet market is rising accordingly.

Empowerment by large AI models will lift related industries such as autonomous driving, humanoid robots, consumer electronics, and the low-altitude economy, and will also expand demand for rare-earth permanent-magnet products. Zhongtai Securities points out that humanoid robots open up long-term growth potential. Tesla released the first-generation Optimus in 2022 and has kept iterating in recent years; mass production is expected to begin in 2025, with several thousand humanoid robots produced that year and a planned tenfold increase to 50,000-100,000 units in 2026, while domestic manufacturers continue to step up their investment in humanoid robots over the same period. At 2-4 kg of NdFeB per humanoid robot, a long-run volume of 100 million units would translate into 200,000-400,000 tonnes of magnet demand (a quick check of this multiplication appears after this digest), roughly equivalent to recreating today's rare-earth permanent-magnet market, leaving ample long-term headroom.

Must-read item 4: This important chemical feedstock is in short supply; long-running raw-material price increases are compounded by expected freight-cost rises

Must-read item 1: A world first! Chinese scientists have developed a battery-powered wearable brain-computer interface device

According to media reports, the Institute of Automation of the Chinese Academy of Sciences says that its Beijing Key Laboratory of Brainnetome and Brain-Computer Interface recently ...
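The magnet-demand figure quoted above is simply the report's two assumptions multiplied together; here is a minimal check using the report's own inputs (2-4 kg per robot, a hypothetical long-run base of 100 million units):

```python
# Quick check of the NdFeB demand estimate cited in the digest above.
# Both inputs (2-4 kg per robot, 100 million robots) are the report's assumptions.
kg_per_robot_low, kg_per_robot_high = 2, 4
robots = 100_000_000

low_tonnes = kg_per_robot_low * robots / 1000    # kg -> tonnes
high_tonnes = kg_per_robot_high * robots / 1000

print(f"{low_tonnes:,.0f} to {high_tonnes:,.0f} tonnes")  # 200,000 to 400,000 tonnes
```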
Event Registration: We've Brought Together the Authors of LCM, InstantID, and AnimateDiff for a Sharing Session
42章经· 2024-05-26 14:35
42章经 AI Private Board Session: Text-to-Image and Text-to-Video, from Research to Application

Speakers
· 骆思勉 — Master's, Institute for Interdisciplinary Information Sciences, Tsinghua University; research focus on multimodal generation, diffusion models, and consistency models; representative works include LCM, LCM-LoRA, and Diff-Foley
· 王浩帆 — Master's from CMU, member of the InstantX team; research focus on consistent generation; representative works include InstantStyle, InstantID, and Score-CAM
· 杨策元 — PhD from The Chinese University of Hong Kong; research focus on video generation

LCM, InstantID, and AnimateDiff are three pieces of research of enormous global significance and influence; over the past year they have arguably delivered some of the biggest breakthroughs, and some of the most deployable results, in text-to-image and text-to-video, and a great many founders are already building on their outputs in practice. This time, we have brought the authors of all three works together for the first time, and have also invited the well-known AI product manager Hidecloud to moderate the panel; we look forward to discussing the latest research and real-world applications in text-to-image and text-to-video with dozens of AI founders.

Time
Beijing time: 6/01 (Saturday) 13:00-14:00
US Pacific time: 5/31 (Friday) 22:00-23:00

Format
Online (the meeting link will be sent one-on-one) ...