Video Generation
AI Q&A That "Films" the Answer for You! From Kuaishou Kling & City University of Hong Kong
量子位· 2025-11-22 03:07
Core Insights
- The article introduces VANS, an AI model that answers with a generated video instead of a traditional text response, aiming to bridge the gap between understanding and execution in tasks [3][4][5].

Group 1: Concept and Motivation
- The motivation is that video inherently conveys dynamic information about the physical world that language struggles to describe accurately [5].
- "Next event prediction" has traditionally focused on text-based answers, whereas VANS proposes a new task paradigm in which the model generates a video as the response [8][9].

Group 2: Model Structure and Functionality
- VANS consists of a visual language model (VLM) and a video diffusion model (VDM), optimized through a joint strategy called Joint-GRPO that improves collaboration between the two models [19][24].
- The workflow has two main steps. In perception and reasoning, the input video is encoded and analyzed. In conditional generation, the model creates a video based on the generated text title and visual features [20].

Group 3: Optimization Process
- Optimization is divided into two phases: first, improving the VLM so that it produces titles that can be represented visually; second, refining the VDM so that the generated video aligns semantically with the title and with the context of the input video [25][28].
- Joint-GRPO acts as a director, ensuring that the "thinker" (VLM) and the "artist" (VDM) work in harmony and improve their outputs through mutual feedback [34][36].

Group 4: Applications and Impact
- VANS has two significant applications: procedural teaching, where it provides customized instructional videos based on user input, and multi-future prediction, which allows creative exploration of various hypothetical scenarios [37][41].
- The model significantly outperforms existing models on benchmark metrics such as ROUGE-L and CLIP-T, indicating its effectiveness in both semantic fidelity and video quality [46][47].

Group 5: Experimental Results
- Comprehensive evaluations show that VANS excels in procedural teaching and future prediction tasks, improving event prediction accuracy nearly threefold compared with the best existing models [44][46].
- Qualitative results highlight VANS's ability to accurately visualize fine-grained actions, reflecting its semantic understanding and visual generation capabilities [50][53].

Conclusion
- The Video-as-Answer research moves video generation beyond entertainment toward practical applications, enabling more intuitive interaction with machines and knowledge [55][56].
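The two-step workflow described under Group 2 can be sketched roughly as follows. This is a minimal illustration, not the VANS implementation: `answer_with_video`, `toy_vlm`, `toy_vdm`, and the `Frame` type are hypothetical names invented here, and the toy functions stand in for real models only so that the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]  # hypothetical stand-in for an encoded video frame


@dataclass
class VideoAnswer:
    title: str           # text produced by the reasoning step
    frames: List[Frame]  # frames produced by the generation step


def answer_with_video(
    input_frames: List[Frame],
    question: str,
    vlm: Callable[[List[Frame], str], str],
    vdm: Callable[[str, List[Frame]], List[Frame]],
) -> VideoAnswer:
    """Chain the two steps: the VLM reasons to a title, the VDM renders it."""
    # Step 1: perception and reasoning over the input video and the question.
    title = vlm(input_frames, question)
    # Step 2: conditional generation from the title plus the visual context.
    frames = vdm(title, input_frames)
    return VideoAnswer(title=title, frames=frames)


# Toy stand-ins (assumptions, not real models) so the sketch is runnable.
def toy_vlm(frames: List[Frame], question: str) -> str:
    return f"next step for: {question}"


def toy_vdm(title: str, context_frames: List[Frame]) -> List[Frame]:
    # Repeat the last context frame as a placeholder "generated" clip.
    return [context_frames[-1]] * 3
```

Passing the models in as callables keeps the sketch independent of any particular VLM or VDM library; how Joint-GRPO trains the two components is outside what this sketch covers.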
Tencent Yuanbao Launches Video Generation Capability
Guan Cha Zhe Wang· 2025-11-21 08:58
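According to the official introduction, HunyuanVideo 1.5 supports text-to-video and image-to-video generation from Chinese and English input. Its image-to-video output stays highly consistent with the source image. The model also understands and follows instructions closely, accurately realizing a range of directions including camera movement, fluid motion, realistic people, and emotional facial expressions. It supports realistic, animated, building-block, and other styles, and can render Chinese and English text inside the video. On image quality, the model natively generates 5-10 second HD videos at 480p and 720p, which a super-resolution model can upscale to 1080p cinema-grade quality.

On November 21, the Tencent Hunyuan large model team officially released and open-sourced HunyuanVideo 1.5, a lightweight video generation model with 8.3B parameters built on the Diffusion Transformer (DiT) architecture, supporting the generation of 5-10 second HD videos.

The latest version of Tencent Yuanbao now offers the model's capabilities. Users can try it in two ways: enter a text description (prompt) for direct "text-to-video" generation, or upload an image together with a prompt to turn a static image into a dynamic video.

[Image: usage illustration. Source: Tencent]

[Table: GSB comparison on the T2V task. Columns: compared model, HunyuanVideo better ...]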
Kuaishou: Q3 Operating Profit Up 69.9% Year-on-Year; Kling AI Revenue Exceeds 300 Million Yuan
Zhong Zheng Wang· 2025-11-20 06:03
Core Insights
- Kuaishou reported total Q3 revenue of 35.554 billion yuan, up 14.2% year-on-year [1].
- Operating profit rose 69.9% year-on-year to 5.299 billion yuan, and adjusted net profit rose 26.3% to 4.986 billion yuan [1].

Revenue Breakdown
- Revenue from other services, including e-commerce and Kling AI, grew 41.3% to 5.9 billion yuan [1].
- Online marketing services revenue increased 14% to 20.1 billion yuan [1].
- Live streaming revenue grew a modest 2.5% to 9.6 billion yuan [1].
- Kling AI generated over 300 million yuan in revenue during Q3, while e-commerce GMV grew 15.2% to 385 billion yuan [1].

User Engagement
- Average daily active users reached 416 million, and monthly active users reached 731 million [1].

AI Integration and Market Position
- Kuaishou's CEO attributed the financial performance to the deep integration of AI capabilities across business scenarios [2].
- The video generation sector is seeing rapid technological iteration and product exploration, with Kling AI positioned in the leading tier globally [2].
- Kling AI launched the 2.5 Turbo model, improving multiple dimensions such as text response and aesthetic quality [2].

Product Strategy and Future Outlook
- Kuaishou aims to focus on AI film creation, strengthening its technology and product capabilities [2].
- The company is optimistic about the commercialization of video generation, particularly in consumer applications [3].
- Kuaishou plans to explore consumer application scenarios while improving the experience for professional creators [3].
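As a quick consistency check on the figures quoted in this summary, the arithmetic can be sketched as below. The numbers are taken from the summary; the variable names are illustrative, and the implied year-ago revenue is a derived estimate, not a reported figure.

```python
# Figures quoted in the summary, in billions of yuan.
segments = {
    "online_marketing": 20.1,
    "live_streaming": 9.6,
    "other_services": 5.9,
}
reported_total = 35.554
yoy_growth = 0.142  # 14.2% year-on-year

# The rounded segment figures should add up to roughly the reported total.
segment_sum = round(sum(segments.values()), 1)

# Implied year-ago revenue (derived, not reported): current / (1 + growth).
implied_prior_total = reported_total / (1 + yoy_growth)

# Share of each segment in the rounded segment sum.
shares = {name: value / segment_sum for name, value in segments.items()}
```

The three segments sum to 35.6 billion yuan, which matches the reported 35.554 billion within rounding, and the 14.2% growth rate implies year-ago revenue of roughly 31.1 billion yuan.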
Kuaishou Earnings Call: Stepping Up AI Investment; Kling Revenue Expected to Reach About 140 Million USD This Year
21 Shi Ji Jing Ji Bao Dao· 2025-11-19 14:37
Core Insights
- Kuaishou's Q3 revenue reached 35.6 billion RMB, up 14.2% year-on-year, with core business revenue growing 19.2% [1].
- Operating profit hit a record high, rising 69.9% year-on-year to 5.3 billion RMB, while adjusted net profit rose 26.3% to 5 billion RMB [1].
- The integration of AI capabilities into Kuaishou's business is a significant factor in its financial performance, with Kling AI generating over 300 million RMB in revenue during Q3 [1].

Industry Dynamics
- The video generation sector is seeing intense competition among large internet companies and startups, which indicates its potential as a high-quality market [2].
- The industry is at an early stage of rapid technological iteration and product exploration, with competition driving advances in video generation technology [2].
- Kling AI remains a leader in the global video generation space, relying on technological and product innovation to maintain its competitive edge [2].

Product Strategy
- Kling AI's core focus is AI film creation, with resources concentrated on improving technology and product capabilities [2].
- The company plans to advance product iteration along two lines, technological leadership and product imagination, guided by multi-modal interaction concepts [2].
- Kling AI aims to improve the experience for professional creators while exploring consumer applications, with plans to further commercialize its technology in the future [3].

Financial Outlook
- Kuaishou plans to increase investment in AI-related capabilities, expecting overall capital expenditure for 2025 to grow by a mid-to-high double-digit percentage compared with the previous year [3].
- Kling AI's projected revenue for 2025 is approximately 140 million USD, significantly higher than the initial target of 60 million USD [3].
- Despite increased investment in AI capabilities and talent, the company remains confident in achieving year-on-year improvement in its adjusted operating profit margin [3].
Kling AI Full-Year Revenue About 140 Million USD; Kuaishou Continues to Increase Compute Investment
Di Yi Cai Jing· 2025-11-19 14:24
Core Insights
- Kuaishou's Q3 2025 financial report shows total revenue up 14.2% year-on-year to 35.6 billion RMB, with adjusted net profit up 26.3% to 5 billion RMB [1].
- Online marketing services revenue grew 14% to 20.1 billion RMB, while live streaming revenue increased 2.5% to 9.6 billion RMB [1].
- Kuaishou's e-commerce GMV increased 15.2% year-on-year to 385 billion RMB, and revenue from Kling AI exceeded 300 million RMB [1].

Business Segments
- Online Marketing Services: revenue increased 14% to 20.1 billion RMB [1].
- Live Streaming: revenue increased 2.5% to 9.6 billion RMB [1].
- Other Services: revenue rose 41.3% to 5.9 billion RMB, driven by growth in e-commerce and Kling AI [1].

AI Development Focus
- Kling AI remained a key topic on Kuaishou's earnings call, with the CEO discussing the competitive landscape in video generation and the potential for rapid technological advancement [2].
- The company aims to concentrate on AI film creation, pooling resources to strengthen technology and product capabilities [2].
- Kuaishou plans to further commercialize Kling technology in combination with social interaction, aiming to accelerate the commercialization of consumer-facing (C-end) applications [2].

Capital Expenditure and AI Integration
- Kuaishou's CFO indicated that, because Kling AI grew faster than expected, the company will increase capital expenditure, expecting a mid-to-high double-digit percentage increase in 2025 compared with the previous year [3].
- Kling AI is projected to generate approximately 140 million USD in revenue for 2025, surpassing the initial target of 60 million USD [3].
- AI applications are being rapidly integrated within Kuaishou: the self-developed AI programming tool CodeFlicker is widely used by engineers and generates nearly 30% of new code [3].
Kuaishou's Cheng Yixiao: Kling AI Will Focus on AI Film and TV Production; Video Generation Track Still in Its Early Stage
Zheng Quan Shi Bao Wang· 2025-11-19 12:57
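On the evening of November 19, Kuaishou held its Q3 2025 earnings call. On the call, Cheng Yixiao, founder and CEO of Kuaishou Technology, responded to questions about the competitive landscape of the video generation track, which the market is watching closely, and about the next iteration direction for Kling AI.

Cheng said that the video generation track has attracted many different types of participants, from large internet companies to startups. This shows that video generation is a high-quality track with great potential, and it also means the industry is still at an early stage of rapid technical iteration and exploration of product forms. More importantly, the whole industry is advancing faster through competition, pushing video generation technology to better meet user needs and reach more application scenarios.

Kuaishou's Q3 2025 report, disclosed on November 19, shows that Kling AI's revenue exceeded 300 million yuan in the quarter, its global user base surpassed 45 million, and it has cumulatively generated more than 200 million videos and 400 million images.

Facing a rapidly evolving competitive landscape, Kling AI has kept up continuous technical and product innovation. At the end of September this year, Kling launched the 2.5 Turbo model, with substantial improvements across text response, dynamic effects, style consistency, aesthetic quality, and other dimensions.

Cheng said that Kling's vision is to "let everyone tell good stories with AI", and that the company will focus on the core goal of AI film creation, pooling resources to refine its technology and product capabilities in depth. On the specific iteration direction, Kling will advance along two lines, technological leadership and product imagination, and, guided by multi-modal interaction concepts (such as MVL), will combine insight into user needs with technological break ...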
Kuaishou's Cheng Yixiao: Video Generation Is a High-Quality Track with Great Potential
Zheng Quan Shi Bao Wang· 2025-11-19 12:00
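People's Finance News, November 19: On Kuaishou's Q3 2025 earnings call on November 19, Kuaishou founder and CEO Cheng Yixiao said that the video generation track has attracted many different types of participants, from large internet companies to startups. This shows that video generation is a high-quality track with great potential, and it also means the industry is still at an early stage of rapid technical iteration and exploration of product forms. More importantly, the whole industry is advancing faster through competition, pushing video generation technology to better meet user needs and reach more application scenarios. ...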
Kuaishou (01024)'s Cheng Yixiao: Kling AI Will Focus on AI Film and TV Production; Video Generation Track Still in Its Early Stage
Zhi Tong Cai Jing· 2025-11-19 11:52
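Zhitong Finance APP has learned that on the Q3 2025 earnings call, Cheng Yixiao, founder and CEO of Kuaishou Technology (01024), gave a detailed response on the competitive landscape of the video generation track, which the market is watching closely, and on the next iteration direction for Kling AI.

Cheng said that the video generation track has attracted many different types of participants, from large internet companies to startups. On the one hand, this shows that video generation is a high-quality track with great potential; on the other, it means the industry is still at an early stage of rapid technical iteration and exploration of product forms. More importantly, the whole industry is advancing faster through competition, pushing video generation technology to better meet user needs and reach more application scenarios.

Facing a rapidly evolving competitive landscape, Kling AI has stayed in the first tier of the global video generation track through continuous technical and product innovation. At the end of September, Kling launched the 2.5 Turbo model, with substantial improvements across text response, dynamic effects, style consistency, aesthetic quality, and other dimensions. Ten days after its release, the model ranked first on both the global text-to-video and image-to-video leaderboards of Artificial Analysis, a well-known AI evaluation organization.

On ecosystem building, Kling is also using a wide range of operational activities to build an end-to-end creator growth mechanism and a thriving creative ecosystem. For example, the Kling AI "Future Partner Program" pools core resources from Kuaishou and Kling AI to match creators precisely with high-value commercial orders across multiple scenarios, and has so far partnered with, among others, ...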
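Why Bother with DiT? ByteDance Goes Autoregressive for the First Time, Generating a 5-Second 720p Video in One Minute on a Single GPU | NeurIPS'25 Oral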
量子位· 2025-11-14 05:38
Core Viewpoint
- The article discusses InfinityStar, a new method developed by ByteDance's commercialization technology team, which significantly improves video generation quality and efficiency compared with the existing Diffusion Transformer (DiT) model [4][32].

Group 1: InfinityStar Highlights
- InfinityStar is the first discrete autoregressive video generator to surpass diffusion models on VBench [9].
- It eliminates delays in video generation by moving from a slow denoising process to a fast autoregressive approach [9].
- The method supports various tasks, including text-to-image, text-to-video, image-to-video, and interactive long video generation [9][12].

Group 2: Technical Innovations
- The core architecture of InfinityStar uses spatiotemporal pyramid modeling, which lets it unify image and video tasks while being an order of magnitude faster than mainstream diffusion models [13][25].
- InfinityStar decomposes a video into two parts: the first frame, which carries static appearance information, and the subsequent clips, which carry dynamic information. This decouples static and dynamic elements [14][15][16].
- Two key technologies improve the model's performance: Knowledge Inheritance, which accelerates the training of a discrete visual tokenizer, and Stochastic Quantizer Depth, which balances information distribution across scales [19][21].

Group 3: Performance Metrics
- InfinityStar performs strongly on the text-to-image (T2I) task on the GenEval and DPG benchmarks, particularly in spatial relationships and object positioning [25][28].
- On the text-to-video (T2V) task, InfinityStar outperforms all previous autoregressive models and achieves better results than DiT-based methods such as CogVideoX and HunyuanVideo [28][29].
- InfinityStar generates significantly faster than DiT-based methods and can produce a 5-second 720p video in under one minute on a single GPU [31].
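The first-frame-plus-clips decomposition described under Group 2 can be illustrated with a small sketch. This is only a toy at the level of frame lists, not InfinityStar's tokenizer or model: the function name `split_first_frame_and_clips` and the fixed `clip_len` parameter are assumptions made here, since the summary does not say how clip boundaries are chosen.

```python
from typing import List, Sequence, Tuple, TypeVar

F = TypeVar("F")  # any frame representation


def split_first_frame_and_clips(
    frames: Sequence[F], clip_len: int
) -> Tuple[F, List[List[F]]]:
    """Split a video into its first frame and fixed-length follow-on clips.

    The first frame stands for static appearance; the remaining frames are
    grouped into clips that carry the motion. A fixed clip length is an
    assumption of this sketch.
    """
    if not frames:
        raise ValueError("video has no frames")
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    first, rest = frames[0], list(frames[1:])
    # Chunk the remaining frames; the last clip may be shorter.
    clips = [rest[i:i + clip_len] for i in range(0, len(rest), clip_len)]
    return first, clips
```

For example, a 10-frame video with `clip_len=4` yields the first frame plus clips of 4, 4, and 1 frames.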
Startup Founded by AI Heavyweight Liu Wei Closes 50 Million USD Funding Round, Will Release New Model in December
AI前线· 2025-11-07 06:41
Core Insights
- Video Rebirth, founded by Liu Wei, has completed a $50 million seed round to develop a video generation model aimed at the professional creative industry [2].
- The company aims to make video creation as intuitive as conversing with a chatbot, providing controllable, high-fidelity, and physics-compliant AI video creation capabilities [2].
- The funding will accelerate development of its proprietary "Bach" model and its "Physics Native Attention (PNA)" architecture, addressing significant challenges in the AI-generated entertainment (AIGE) sector [2].

Funding and Development
- The seed round was backed by Qiming Venture Partners and South Korean gaming company Actoz Soft Co. [2].
- Video Rebirth plans to release the Bach model in December, along with an AI video generation platform to compete with OpenAI Sora [2][3].

Competitive Landscape
- Video Rebirth is entering a competitive field with major players such as Google, ByteDance, and Kuaishou, which have shown strong monetization capabilities [3].
- Kuaishou's Kling AI is projected to exceed $100 million in annual revenue by February next year [3].

Model Performance
- The newly evaluated Avenger 0.5 Pro model shows significant performance improvements over its predecessor, ranking second in the Image to Video category on the Artificial Analysis Video Arena [3].
- The model has not yet been made publicly accessible [3].

Market Positioning
- Liu Wei believes that while the large language model landscape is dominated by major players, smaller teams have a fair opportunity in the video generation space [4].
- The company will initially target professional users in the U.S. with a subscription model priced lower than Google Veo [4].

Team and Expertise
- Liu Wei and his team spent three months training the first version of their model, which combines industry-standard techniques with improvements for realistic object generation [4].
- The team avoided using short video content for training to ensure higher model quality [4].