Video Generation Technology
Orient Securities: Maintains "Buy" Rating on Kuaishou-W, Target Price HK$104.36
Zhi Tong Cai Jing· 2026-02-05 06:16
Orient Securities issued a research report forecasting Kuaishou-W's (01024) 2025-2027 adjusted net profit attributable to the parent at RMB 20.4/22.5/25.9 billion. It maintains its previous 18x 2026 PE valuation of the company, corresponding to a fair value of CNY 404.8 billion, equivalent to HKD 454.2 billion (HKD/CNY exchange rate of 0.891), a target price of HK$104.36 per share, and reiterates its "Buy" rating.

Orient Securities' main views:

The significance of the current data is that, with overall usage now stable at a higher level, growth has shifted from the early-January traffic expansion in low-ARPU regions (such as Southeast Asia and Central Asia) toward a fluctuating revenue climb in high-paying regions; the latter is expected to contribute more to Keling's ARR. In addition, the next-generation Keling 3.0 is in closed beta: its unified architecture makes workflow hand-offs smoother, and its native text output sets it apart from competitors' iterations, which should push the technical frontier further. On the product side, the focus is on improving efficiency at each stage of professional creation; if it reaches general availability before the Lunar New Year, it could sustain January's product momentum and unlock demand.

After the product's January breakout, DAU remains at a high level

Keling 3.0 is in closed beta; iterations suggested by some evaluations include: (1) Keling 3.0 is trained on a unified multimodal foundation, supporting text/image-to-video, reference-based generation, and video editing in a single model, with generated video length flexibly controllable between 3 and 15 seconds and more natively integrated audio output for stronger realism; (2) intelligent shot segmentation, with a multi-shot storyboard workflow and finer camera control; (3) ...
Orient Securities: Maintains "Buy" Rating on Kuaishou-W (01024), Target Price HK$104.36
Zhi Tong Cai Jing Wang· 2026-02-05 06:14
As reported by the Zhitong Finance APP, Orient Securities issued a research report forecasting Kuaishou-W's (01024) 2025-2027 adjusted net profit attributable to the parent at RMB 20.4/22.5/25.9 billion. It maintains its previous 18x 2026 PE valuation of the company, corresponding to a fair value of CNY 404.8 billion, equivalent to HKD 454.2 billion (HKD/CNY exchange rate of 0.891), a target price of HK$104.36 per share, and reiterates its "Buy" rating.

Orient Securities' main views:

The significance of the current data is that, with overall usage now stable at a higher level, growth has shifted from the early-January traffic expansion in low-ARPU regions (such as Southeast Asia and Central Asia) toward a fluctuating revenue climb in high-paying regions; the latter is expected to contribute more to Keling's ARR. In addition, the next-generation Keling 3.0 is in closed beta: its unified architecture makes workflow hand-offs smoother, and its native text output sets it apart from competitors' iterations, which should push the technical frontier further. On the product side, the focus is on improving efficiency at each stage of professional creation; if it reaches general availability before the Lunar New Year, it could sustain January's product momentum and unlock demand.

Risk warnings: macro consumption recovery falling short of expectations, domestic monetization efficiency falling short of expectations, widening overseas losses, and Keling's technology iteration falling short of expectations.

After the product's January breakout, DAU remains at a high level

Sensor Tower data show Keling's overseas mobile revenue reached US$3.09 million in January, up 112% month-on-month, with 6.94 million DAU, up 139% month-on-month. By country, U.S. revenue is still in a fluctuating climb, and is also the ...
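The report's valuation arithmetic can be retraced in a few lines. The figures below are as stated in the note (the small gap versus 18 × 22.5 = 405 suggests the report used unrounded profit estimates); the implied share count at the end is our own back-of-envelope inference, not a figure from the report.

```python
# Retracing the report's valuation arithmetic (figures as stated in the note).
fair_value_cny = 404.8      # billion CNY: 18x the 2026E adjusted profit
hkd_to_cny = 0.891          # HKD/CNY exchange rate used in the report

fair_value_hkd = fair_value_cny / hkd_to_cny           # ≈ 454.3 billion HKD
target_price_hkd = 104.36                              # per share, per the report

# Back-of-envelope inference: how many shares does this valuation imply?
implied_shares_bn = fair_value_hkd / target_price_hkd  # ≈ 4.35 billion shares
print(f"{fair_value_hkd:.1f} {implied_shares_bn:.2f}")
```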
Professor Jun Zhu, Chief Scientist of Jinqiu-Backed Shengshu Technology, Elected ACM Fellow | Jinqiu Spotlight
Jin Qiu Ji· 2026-01-22 06:26
Core Insights - The article highlights the announcement of the 2025 ACM Fellow list, featuring notable scholars, including Professor Jun Zhu from Tsinghua University, recognized for his contributions to machine learning and Bayesian methods [2][11]. Group 1: ACM Fellow Announcement - The 2025 ACM Fellow list includes 19 Chinese scholars, accounting for approximately 27% of the total [6][14]. - The ACM Fellow designation is a prestigious honor, representing the top 1% of ACM members, with over 100,000 members globally [7][11]. - The contributions of the 2025 Fellows span various fields, including medical AI, computer graphics, data management, human-computer interaction, and robotics [12]. Group 2: Contributions of Notable Scholars - Jun Zhu is recognized for his work in probabilistic machine learning theories and methods, particularly in representation learning and sparse topic coding [103]. - Baoquan Chen from Peking University is acknowledged for his contributions to large-scale scene reconstruction and discrete geometry processing [20]. - Pei Cao, currently at YouTube, is honored for her advancements in network caching and search engine efficiency [15][19]. Group 3: Industry Implications - The article discusses the potential impact of video generation technology, with a focus on the U-ViT architecture developed by Shengshu Technology, which is expected to revolutionize content production by 2026 [4]. - The shift in focus from model breakthroughs to deeper integration into production scenarios is anticipated as the industry evolves [4].
The General-Purpose PixVerse R1's Technical Breakthroughs: Carrying the Passcode to a Parallel World
Ji Qi Zhi Xin· 2026-01-15 09:17
Core Viewpoint - The article discusses the launch of PixVerse R1, a groundbreaking model in video generation that enables real-time, high-quality video creation, marking a significant advancement in the industry [1][3][38]. Group 1: Technological Breakthroughs - PixVerse R1 is the first model worldwide to support real-time generation of 1080P resolution videos, transitioning video generation from static output to real-time interaction [6][35]. - The model achieves a significant increase in computational efficiency, allowing for real-time generation within the human perception range, thus representing a generational leap in application-level capabilities [3][6]. - The Instantaneous Response Engine (IRE) is introduced, which drastically reduces inference time by compressing the sampling steps from over 50 to just 1-4, addressing the computational load effectively [9][11]. Group 2: Model Architecture - The Omni model is a native end-to-end multimodal foundation that allows for the simultaneous processing of various data types, enhancing the model's versatility and efficiency [20][25]. - The model employs a unified token flow architecture based on Transformer, enabling the joint processing of text, images, audio, and video, thus improving the model's understanding of multimodal data [21][25]. - The model's native resolution feature ensures high-quality video generation without compromising the integrity of the visual content, addressing issues related to traditional data preprocessing methods [22][23]. Group 3: Continuous Evolution - PixVerse R1 introduces an autoregressive streaming generation mechanism that allows for theoretically infinite video generation, breaking the constraints of fixed-length outputs [29][32]. - The model incorporates a memory-enhanced attention module that captures and retains key features from the video, optimizing computational efficiency while maintaining long-term consistency [30][32].
- This architecture ensures that the generated content remains coherent and logically consistent, regardless of the length of the video, thus establishing a robust foundation for a universal real-time world model [32][38].
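The streaming mechanism described above can be sketched as a chunk-wise loop over a bounded feature memory. This is an illustrative sketch only: PixVerse R1's internals are not public, so `model.generate_chunk` and `model.extract_key_features` are hypothetical stand-ins.

```python
from collections import deque

def stream_video(model, prompt, chunk_frames=16, memory_size=8):
    """Sketch of chunk-wise autoregressive video streaming.

    Each new chunk is conditioned on the previous chunk plus a bounded
    memory of key features from earlier chunks, so per-chunk compute stays
    constant while long-range consistency is preserved. The model methods
    are assumed interfaces, not PixVerse R1's actual API.
    """
    memory = deque(maxlen=memory_size)  # bounded: oldest features are evicted
    prev = None
    while True:  # theoretically infinite generation
        chunk = model.generate_chunk(prompt, context=prev, memory=list(memory))
        memory.append(model.extract_key_features(chunk))
        prev = chunk
        yield chunk  # each chunk can be displayed as soon as it is ready
```

Because the memory has a fixed maximum length, generation cost does not grow with video length, which is what makes "infinite" output feasible in principle.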
5 Million Views: 1X Puts a "World Model" to Real Use in Its Robot NEO
Ji Qi Zhi Xin· 2026-01-14 01:39
Core Viewpoint - The article discusses the advancements in the home humanoid robot NEO, particularly the introduction of its new brain, the 1X World Model, which enables NEO to learn and perform tasks more autonomously by understanding the physical world through video training [3][4][11]. Group 1: Technological Advancements - NEO has evolved from merely executing pre-programmed actions to being able to "imagine" tasks by generating a video of successful task completion in its mind before executing it [4][6]. - The 1X World Model (1XWM) integrates video pre-training to allow NEO to generalize across new objects, movements, and tasks without extensive prior training data [11][21]. - The model is built on a 14 billion parameter generative video model, which has undergone a multi-stage training process to adapt to NEO's physical characteristics [16][18]. Group 2: Training and Evaluation - The training process includes using 900 hours of first-person human video data to align the model with human-like operational behaviors, followed by fine-tuning with 70 hours of robot data [18][19]. - The evaluation of 1XWM's capabilities shows that it can perform tasks it has never encountered before, with generated videos closely matching real-world execution [24][30]. - The importance of high-quality captions and first-person data in improving video generation quality and task success rates is emphasized, indicating that detailed descriptions enhance the model's performance [39][40]. Group 3: Practical Applications - NEO has been tested on various tasks, including those requiring complex interactions and coordination, demonstrating its ability to adapt and learn from video pre-training [28][30]. - The model's performance in both in-distribution and out-of-distribution tasks shows a stable success rate, although some fine manipulation tasks remain challenging [30][32].
- The article suggests that the quality of generated videos can be linked to task success rates, allowing for potential improvements in video generation through iterative testing and selection processes [32][39].
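The reported link between generated-video quality and real task success suggests a simple test-and-select loop: imagine each candidate plan, score the imagined rollout, execute the best. The sketch below is a hypothetical illustration of that idea, not 1X's actual pipeline; `world_model.imagine` and `score_video` are assumed interfaces.

```python
def select_best_plan(world_model, score_video, task, candidate_plans):
    """Pick the candidate plan whose imagined rollout scores highest.

    Hypothetical sketch of iterative test-and-select: the world model
    "imagines" a video of each plan being executed, a scorer rates the
    rollout as a proxy for task success, and the top plan is what the
    robot would actually execute.
    """
    best_plan, best_score = None, float("-inf")
    for plan in candidate_plans:
        video = world_model.imagine(task, plan)  # generated, not real, footage
        score = score_video(video)               # proxy for task success rate
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score
```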
AI Comic Drama Industry Outlook: Multimodal Technology Breakthroughs and a New Content Production Paradigm
2025-12-11 02:16
AI Comic Drama Industry Outlook: Multimodal Technology Breakthroughs and a New Content Production Paradigm, 20251210

Summary

The Juliang platform (巨量平台) maintains scene and character consistency by training dedicated models and requiring users to provide multi-view character assets, which it then processes with its own technology. Similar features exist on the market, but Juliang has explored character-asset production standards in greater depth, achieving high-quality consistency.

To address coherence and consistency in video generation, Juliang reviews the character assets customers provide to ensure they meet its standards, resolves specific problems through dedicated service and real-time interaction, and trains customers to use the tools correctly so they can solve similar problems independently.

Juliang sets explicit standards for data assets, such as requiring character close-ups combining a headshot with a three-view set, and provides detailed guidance to help customers optimize their assets. Through deep exchange and co-creation with leading domestic model vendors, it continues to push industry standardization and raise overall production efficiency and quality.

In current video generation technology, consistency of characters, scenes, and objects matters most for faithful rendering: high-fidelity reproduction requires objects to be placed in the correct position without altering their inherent properties. Juliang is helping model vendors define unified standards, while actions and camera movement can already be handled well by combining model capabilities with engineering tools.

Q&A

What is the technical foundation of the Juliang platform's image and video generation? Is it based on secondary development of Stable Diffusion? ...
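As a toy illustration of the kind of asset review described above (a headshot plus a three-view set), a submission checker might look like the following. The view names and data shape are hypothetical, not the platform's actual schema.

```python
# Hypothetical required views for a character asset bundle: a close-up
# headshot plus a three-view set. Names are illustrative only.
REQUIRED_VIEWS = {"headshot", "front", "side", "back"}

def validate_character_asset(views):
    """Check that a submitted asset bundle contains every required view.

    Returns (ok, missing_views) so a reviewer can tell the customer
    exactly which views still need to be supplied.
    """
    missing = REQUIRED_VIEWS - set(views)
    return len(missing) == 0, sorted(missing)
```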
Kuaishou's Keling AI Expects Full-Year Revenue of US$140 Million; Founder Says Video Generation Technology Is Far from Mature
Zhong Guo Jing Ying Bao· 2025-11-20 13:46
Financial Performance - Kuaishou Technology reported total revenue of 35.6 billion yuan for Q3 2025, representing a year-on-year growth of 14.2% [2] - Adjusted net profit reached 5 billion yuan, with a year-on-year increase of 26.3%, indicating stable operational growth [2] - Revenue from online marketing services was 20.1 billion yuan, up 14% year-on-year; live streaming revenue was 9.6 billion yuan, growing 2.5% year-on-year; other services, including e-commerce and Keling AI, generated 5.9 billion yuan, marking a significant growth of 41.3% [2] Keling AI Performance - Keling AI's revenue for Q3 exceeded 300 million yuan, contributing to the overall revenue growth [2] - The CFO disclosed that Keling AI's full-year revenue is projected to reach 1.4 billion yuan, exceeding the initial target of 600 million yuan by over 100% [2] - Keling AI's revenue growth has slowed in Q3 compared to the previous quarters, with Q1 and Q2 revenues of over 150 million yuan and 250 million yuan, respectively [3] Industry Competition - The video generation sector is experiencing intensified competition, particularly with the entry of Baidu and the launch of its free version of the Steam Engine model [3] - OpenAI's release of the Sora 2 model has also heightened market attention, prompting increased R&D efforts among various companies in the video generation space [3][4] - Kuaishou's CEO noted that the expansion of participants in the video generation field reflects its significant development potential and market value, although the technology is still in a developmental stage [4] Strategic Focus - Kuaishou's current strategy for Keling AI is to focus on the "AI film creation scene," while remaining adaptable to various application scenarios [6] - The company aims to enhance user experience and willingness to pay among professional creators, while exploring consumer applications as the market matures [6] - Kuaishou has increased its investment in computing power to meet the growing 
demand for video generation models, ensuring competitive technological capabilities [6]
Bona Film Group: Actively Monitoring Domestic and International Video Generation Products and Related Technologies
Zheng Quan Ri Bao Wang· 2025-10-16 09:45
Core Viewpoint - Bona Film Group (001330) is actively monitoring the development of video generation products and related technologies both domestically and internationally, and is exploring applications in these areas based on its business layout [1] Group 1 - The company will disclose relevant progress in accordance with regulations through designated media on the Shenzhen Stock Exchange [1] - Investors are encouraged to pay attention to the company's subsequent announcements and regular reports [1]
Seres Obtains a Patent Related to Video Generation
Jin Rong Jie· 2025-08-01 05:38
Core Insights - Chengdu Silis Technology Co., Ltd. has obtained a patent for a "video generation method, device, electronic equipment, and storage medium" with authorization announcement number CN119743660B, filed in March 2025 [1] Company Overview - Chengdu Silis Technology Co., Ltd. was established in 2021 and is located in Chengdu, primarily engaged in software and information technology services [1] - The company has a registered capital of 5 million RMB [1] - According to Tianyancha data analysis, the company has invested in one external enterprise and holds 324 patent records, in addition to one administrative license [1]
CVPR 2025 Unified Evaluation Framework for Video Generation: SJTU and Stanford Jointly Propose Letting MLLMs Score Like Humans
Liang Zi Wei· 2025-06-12 08:17
Core Viewpoint - Video generation technology is rapidly transforming visual content creation across various sectors, including film production, advertising design, virtual reality, and social media, making high-quality video generation models increasingly important [1]. Group 1: Video Evaluation Framework - The Video-Bench framework evaluates AI-generated videos by simulating human cognitive processes, establishing an intelligent assessment system that connects text instructions with visual content [2]. - Video-Bench enables multimodal large models (MLLM) to evaluate videos similarly to human assessments, effectively identifying defects in object consistency (0.735 correlation) and action rationality, while also addressing traditional challenges in aesthetic quality evaluation [3]. Group 2: Innovations in Video-Bench - Video-Bench addresses two main issues in existing video evaluation methods: the inability to capture complex dimensions like video fluency and aesthetic performance, and the challenges in cross-modal comparison during video-condition alignment assessments [5]. - The framework introduces two core innovations: a dual-dimensional evaluation framework covering video-condition alignment and video quality [7], and the implementation of chain-of-query and few-shot scoring techniques [8]. Group 3: Evaluation Dimensions - The dual-dimensional evaluation framework allows Video-Bench to assess video generation quality by breaking it down into "video-condition alignment" and "video quality," focusing on the accuracy of generated content against text prompts and the visual quality of the video itself [10]. - Key dimensions for video-condition consistency include object category consistency, action consistency, color consistency, scene consistency, and video-text consistency, while video quality evaluation emphasizes imaging quality, aesthetic quality, temporal consistency, and motion quality [10]. 
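The chain-of-query and few-shot scoring ideas can be sketched as a two-stage prompting loop: first ask the MLLM to describe what it sees, then ask it to judge that observation against the prompt, with calibration examples anchoring the scale. `ask_mllm` and the exact prompt wording are assumptions, not Video-Bench's actual implementation.

```python
def video_bench_score(ask_mllm, video, prompt, few_shot_examples):
    """Sketch of Video-Bench-style scoring (assumed interfaces).

    Chain-of-query: for each dimension the MLLM is first asked a targeted
    observation question, then asked to rate the dimension given both the
    original prompt and its own observation. Few-shot calibration examples
    are included so scores are comparable across videos. `ask_mllm` is a
    hypothetical wrapper around a multimodal LLM call.
    """
    dimensions = [
        "object category consistency", "action consistency",
        "color consistency", "scene consistency", "video-text consistency",
    ]
    scores = {}
    for dim in dimensions:
        # Stage 1: describe first, without judging.
        observed = ask_mllm(video, f"Describe the {dim.split()[0]} content of this video.")
        # Stage 2: judge the observation against the prompt, few-shot calibrated.
        scores[dim] = ask_mllm(
            video,
            f"Given the prompt '{prompt}' and your observation '{observed}', "
            f"rate {dim} from 1-5. Calibration examples: {few_shot_examples}",
        )
    return scores
```

Separating observation from judgment is what lets the evaluator compare across modalities: the judgment step works on two pieces of text (prompt and observation) rather than on raw pixels.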
Group 4: Performance Comparison - Video-Bench significantly outperforms traditional evaluation methods, achieving an average Spearman correlation of 0.733 in video-condition alignment and 0.620 in video quality [18]. - In the critical metric of object category consistency, Video-Bench shows a 56.3% improvement over the GRiT-based method, reaching a correlation of 0.735 [19]. Group 5: Robustness and Reliability - Video-Bench's evaluation results were validated by a team of 10 experts who annotated 35,196 video samples, achieving a Krippendorff's α of 0.52, comparable to human self-assessment levels [21]. - The framework demonstrated high stability and reliability, with a TARA@3 score of 67% and a Krippendorff's α of 0.867, confirming the effectiveness of its component designs [23]. Group 6: Current Model Assessment - Video-Bench evaluated seven mainstream video generation models, revealing that commercial models generally outperform open-source models, with Gen3 scoring an average of 4.38 compared to VideoCrafter2's 3.87 [25]. - The assessment highlighted weaknesses in dynamic dimensions such as action rationality (average score of 2.53/3) and motion blur (3.11/5) across current models [26].
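The Spearman correlations reported above measure how well automatic scores track human rank orderings rather than raw values. A minimal pure-Python version of the statistic (average ranks for ties, then Pearson correlation on the ranks; assumes non-constant inputs) looks like this:

```python
def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks.

    This is the metric used to compare automatic scores against human
    annotations (e.g. the 0.733 alignment figure above). Ties receive
    the average of the ranks they span; inputs must not be constant.
    """
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0.0] * len(vals)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
                j += 1  # extend over a run of tied values
            avg = (i + j) / 2 + 1  # average rank for the tie group, 1-based
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks matter, any monotone rescaling of either scorer leaves the correlation unchanged, which makes the metric robust to evaluators that use different score ranges.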