Video Generation Technology
Professor Zhu Jun, Chief Scientist of Jinqiu Portfolio Company Shengshu Technology, Elected ACM Fellow | Jinqiu Spotlight
锦秋集· 2026-01-22 06:26
「Jinqiu Spotlight」 tracks every highlight and development from Jinqiu Fund and its portfolio companies, relaying front-line industry signals to founders. The "hall of fame" of computing has welcomed another shining moment: today ACM, the world's most influential computing society, officially announced its 2025 Fellows, and Zhu Jun, professor in the Department of Computer Science at Tsinghua University and chief scientist of Shengshu Technology, was elected. A long-time researcher in machine learning, Professor Zhu has made distinguished contributions to foundational theory in Bayesian methods and representation learning, with his "sparse topical coding" breaking through the boundaries of traditional representation learning. He is also leading the Shengshu Technology team through the "deep waters" of multimodal generation. Jinqiu Fund is the exclusive investor in Shengshu Technology's angel+ round. Building on the U-ViT architecture, a global first in 2022, Shengshu Technology released Vidu, the first large video generation model in China. With video generation technology fully permeating the content production pipeline in 2025, the industry's focus in 2026 is shifting from pure "model breakthroughs" to deeper "scenario rooting." At the inaugural Jinqiu Conference @2025 Experience With AI, Professor Zhu said that on entering the video generation field he was especially attuned to the value of "emergence," and shared that the capabilities behind multimodal models are ushering in an era of "productivity leaps." The following is Xinzhiyuan's conversation with Zhu Jun, Chen Baoquan, Jia Jiaya, Xiong Hui, and 19 ...
The Technical Breakthroughs of the General-Purpose PixVerse R1: Holding the Key to a Parallel World
机器之心· 2026-01-15 09:17
Core Viewpoint
- The article discusses the launch of PixVerse R1, a breakthrough video generation model that enables real-time, high-quality video creation, marking a significant advance for the industry [1][3][38]

Group 1: Technological Breakthroughs
- PixVerse R1 is the first model worldwide to support real-time generation of 1080P video, moving video generation from static output to real-time interaction [6][35]
- The model achieves a large jump in computational efficiency, generating in real time within the range of human perception, a generational leap in application-level capability [3][6]
- The Instantaneous Response Engine (IRE) drastically reduces inference time by compressing the number of sampling steps from over 50 to just 1-4, effectively taming the computational load [9][11]

Group 2: Model Architecture
- The Omni model is a native end-to-end multimodal foundation that processes multiple data types simultaneously, improving the model's versatility and efficiency [20][25]
- It employs a unified token-flow architecture based on the Transformer, jointly processing text, images, audio, and video to deepen the model's understanding of multimodal data [21][25]
- Native-resolution support ensures high-quality video generation without compromising visual integrity, avoiding the problems introduced by traditional data preprocessing methods [22][23]

Group 3: Continuous Evolution
- PixVerse R1 introduces an autoregressive streaming generation mechanism that allows theoretically infinite video generation, breaking the constraint of fixed-length outputs (sketched below) [29][32]
- A memory-enhanced attention module captures and retains key features from the video, keeping computation efficient while maintaining long-term consistency [30][32]
- This architecture keeps generated content coherent and logically consistent regardless of video length, laying a robust foundation for a universal real-time world model [32][38]
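To make the streaming mechanism concrete, here is a minimal sketch of chunk-by-chunk autoregressive generation with a fixed-size memory bank, in the spirit of the memory-enhanced attention described above. All names, sizes, and the update rule are invented for illustration; none of this is PixVerse R1's actual design.

```python
import torch
import torch.nn.functional as F

# Minimal sketch: autoregressive streaming with a bounded memory bank.
# Dimensions and update rule are invented; this is not PixVerse R1's design.
DIM, MEM_SLOTS, STEPS = 64, 16, 8

def attend(query, memory):
    # Scaled dot-product attention of the current chunk over the memory bank.
    scores = query @ memory.T / DIM ** 0.5        # (1, MEM_SLOTS)
    return F.softmax(scores, dim=-1) @ memory     # (1, DIM)

def update_memory(memory, chunk_feat):
    # Fixed-size memory: drop the oldest slot, append the newest chunk,
    # so per-chunk cost stays constant no matter how long the video runs.
    return torch.cat([memory[1:], chunk_feat], dim=0)

memory = torch.zeros(MEM_SLOTS, DIM)              # empty memory bank
chunk = torch.randn(1, DIM)                       # latent for the first chunk

for _ in range(STEPS):                            # unbounded in principle
    context = attend(chunk, memory)               # recall long-range state
    chunk = torch.tanh(chunk + context)           # stand-in for the real decoder
    memory = update_memory(memory, chunk)         # remember this chunk
    # here `chunk` would be decoded into the next span of video frames
```

The fixed-size bank is what makes "theoretically infinite" generation tractable: the attention cost per chunk is constant, while the bank carries forward enough state to keep long videos consistent.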
5 Million Views: 1X Puts the "World Model" to Real Use in Its Robot NEO
机器之心· 2026-01-14 01:39
Core Viewpoint
- The article discusses advancements in the home humanoid robot NEO, particularly its new brain, the 1X World Model, which lets NEO learn and perform tasks more autonomously by understanding the physical world through video training [3][4][11]

Group 1: Technological Advancements
- NEO has evolved from merely executing pre-programmed actions to "imagining" tasks: it generates a video of successful task completion in its mind before executing the task [4][6]
- The 1X World Model (1XWM) integrates video pre-training, allowing NEO to generalize to new objects, movements, and tasks without extensive task-specific training data [11][21]
- The model is built on a 14-billion-parameter generative video model that has undergone a multi-stage training process to adapt to NEO's physical characteristics [16][18]

Group 2: Training and Evaluation
- Training uses 900 hours of first-person human video data to align the model with human-like manipulation behaviors, followed by fine-tuning on 70 hours of robot data [18][19]
- Evaluations show that 1XWM can perform tasks it has never encountered before, with generated videos closely matching real-world execution [24][30]
- High-quality captions and first-person data are emphasized as key to improving video generation quality and task success rates; more detailed descriptions yield better model performance [39][40]

Group 3: Practical Applications
- NEO has been tested on tasks requiring complex interactions and coordination, demonstrating its ability to adapt and learn from video pre-training [28][30]
- Performance on both in-distribution and out-of-distribution tasks shows a stable success rate, although some fine manipulation tasks remain challenging [30][32]
- The article suggests that generated-video quality correlates with task success rates, so performance can be improved through iterative testing and selection (see the sketch below) [32][39]
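The "generate, score, select" idea in the last bullet can be sketched in a few lines. This is a hedged illustration only: `predict_video` and `score_video` are hypothetical placeholders, not the 1XWM API.

```python
# Hypothetical sketch: rank candidate action plans by imagining each outcome
# with a world model and scoring the predicted video. Placeholder API only.
def select_best_plan(world_model, score_video, state, candidate_plans):
    scored = []
    for plan in candidate_plans:
        video = world_model.predict_video(state, plan)   # imagined rollout
        scored.append((score_video(video), plan))        # e.g. a success classifier
    return max(scored, key=lambda pair: pair[0])[1]      # execute the best plan

# Toy demo with stand-in components.
class DummyWorldModel:
    def predict_video(self, state, plan):
        return [state + step * plan for step in range(3)]  # fake "frames"

best = select_best_plan(
    DummyWorldModel(),
    score_video=lambda video: -abs(video[-1] - 10),  # prefer ending near 10
    state=0,
    candidate_plans=[1, 3, 5],
)
print(best)  # -> 5 (its final "frame" lands exactly on the target)
```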
AI Comic-Drama Industry Outlook: Multimodal Technology Breakthroughs and a New Paradigm for Content Production
2025-12-11 02:16
AI Comic-Drama Industry Outlook: Multimodal Technology Breakthroughs and a New Paradigm for Content Production (2025-12-10)

Summary
- The Juliang platform trains dedicated models and requires users to supply multi-view character assets, combining these with its own processing to keep scenes and characters consistent; similar features exist on the market, but Juliang has gone deeper on character-asset production standards, achieving high-quality consistency.
- To address coherence and consistency in video generation, the platform reviews customer-supplied character assets against its standards, resolves concrete problems through hands-on service and real-time interaction, and trains customers to use the tools correctly so they can solve similar problems on their own.
- The platform has explicit standards for data assets, for example requiring character close-ups that combine a headshot with a three-view turnaround (see the sketch at the end of this entry), and offers detailed guidance to help customers refine their data assets; through deep exchange and co-creation with leading domestic model vendors, it continues to push industry standardization and improve overall production efficiency and quality.
- In current video generation, consistency of characters, scenes, and objects matters most for faithful rendering: high-fidelity reproduction requires objects to be placed correctly without altering their intrinsic properties. Juliang is helping model vendors define unified standards, while actions and camera moves can already be handled well by combining model capability with engineering tools.

Q&A
What is the technical foundation of the Juliang platform's image and video generation? Is it a secondary development based on Stable Diffusion?
我 ...
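As a small illustration of how an asset standard like the one above (close-up headshot plus a three-view turnaround) might be checked programmatically, here is a minimal sketch; the required view names and the review gate are invented, not Juliang's actual spec.

```python
from dataclasses import dataclass, field

# Invented review gate for multi-view character assets; not Juliang's spec.
REQUIRED_VIEWS = {"headshot", "front", "side", "back"}

@dataclass
class CharacterAsset:
    name: str
    views: dict = field(default_factory=dict)  # view name -> image path

    def missing_views(self):
        return REQUIRED_VIEWS - self.views.keys()

asset = CharacterAsset("hero", {"headshot": "hero_head.png", "front": "hero_front.png"})
print(asset.missing_views())  # e.g. {'side', 'back'}: asset fails the review
```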
Kuaishou's Keling AI Expected to Bring In US$140 Million for the Year; Founder Says Video Generation Is Far From Mature
Financial Performance
- Kuaishou Technology reported total revenue of 35.6 billion yuan for Q3 2025, up 14.2% year-on-year [2]
- Adjusted net profit reached 5 billion yuan, up 26.3% year-on-year, indicating stable operational growth [2]
- Online marketing services revenue was 20.1 billion yuan, up 14% year-on-year; live streaming revenue was 9.6 billion yuan, up 2.5%; other services, including e-commerce and Keling AI, generated 5.9 billion yuan, a notable 41.3% increase [2]

Keling AI Performance
- Keling AI's Q3 revenue exceeded 300 million yuan, contributing to overall revenue growth [2]
- The CFO disclosed that Keling AI's full-year revenue is projected to reach US$140 million, exceeding the initial target of US$60 million by more than 100% [2]
- Keling AI's revenue growth slowed in Q3 relative to earlier quarters, with Q1 and Q2 revenues of over 150 million yuan and 250 million yuan, respectively (see the quick check below) [3]

Industry Competition
- Competition in video generation is intensifying, notably with Baidu's entry and the launch of a free version of its Steam Engine model [3]
- OpenAI's release of the Sora 2 model has also heightened market attention, prompting increased R&D across companies in the space [3][4]
- Kuaishou's CEO noted that the growing number of participants reflects the field's development potential and market value, though the technology remains at an early stage [4]

Strategic Focus
- Kuaishou's current strategy for Keling AI is to focus on the "AI film creation scene" while staying adaptable to other application scenarios [6]
- The company aims to improve the experience and willingness to pay of professional creators, and will explore consumer applications as the market matures [6]
- Kuaishou has increased investment in computing power to meet the growing demand for video generation models and keep its technology competitive [6]
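A quick back-of-the-envelope check on the slowdown, using only the figures quoted above (quarterly Keling AI revenue in millions of yuan, with Q3 taken at its stated 300-million floor):

```python
# Quarter-over-quarter growth from the article's own numbers (millions of yuan).
quarters = {"Q1": 150, "Q2": 250, "Q3": 300}   # Q3 is "over 300"; floor used here

prev_label, prev_rev = None, None
for label, rev in quarters.items():
    if prev_rev is not None:
        print(f"{label} vs {prev_label}: {rev / prev_rev - 1:+.0%}")
    prev_label, prev_rev = label, rev
# Q2 vs Q1: +67%
# Q3 vs Q2: +20%  -> growth clearly decelerating, as the article notes
```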
Bona Film Group: Actively Tracking Video Generation Products and Related Technologies at Home and Abroad
Zheng Quan Ri Bao Wang· 2025-10-16 09:45
Core Viewpoint - Bona Film Group (001330) is actively monitoring the development of video generation products and related technologies both domestically and internationally, and is exploring applications in these areas based on its business layout [1] Group 1 - The company will disclose relevant progress in accordance with regulations through designated media on the Shenzhen Stock Exchange [1] - Investors are encouraged to pay attention to the company's subsequent announcements and regular reports [1]
Seres Obtains a Video Generation Patent
Jin Rong Jie· 2025-08-01 05:38
Core Insights
- Chengdu Seres Technology Co., Ltd. has obtained a patent for a "video generation method, device, electronic equipment, and storage medium," authorization announcement number CN119743660B; the application was filed in March 2025 [1]

Company Overview
- Chengdu Seres Technology Co., Ltd. was established in 2021, is located in Chengdu, and is primarily engaged in software and information technology services [1]
- The company has a registered capital of 5 million RMB [1]
- According to Tianyancha data, the company has invested in one external enterprise and holds 324 patent records, along with one administrative license [1]
CVPR 2025 Unified Evaluation Framework for Video Generation: SJTU and Stanford Jointly Propose Letting MLLMs Score Like Humans
量子位· 2025-06-12 08:17
Core Viewpoint
- Video generation technology is rapidly transforming visual content creation across film production, advertising design, virtual reality, and social media, making high-quality video generation models that align with human expectations increasingly important [1]

Group 1: Video Evaluation Framework
- The Video-Bench framework evaluates AI-generated videos by simulating human cognitive processes, establishing an intelligent assessment system that connects text instructions with visual content [2]
- Video-Bench enables multimodal large language models (MLLMs) to evaluate videos much as humans do, effectively identifying defects in object consistency (0.735 correlation) and action rationality, while also addressing the traditionally difficult problem of aesthetic quality evaluation [3]

Group 2: Innovations in Video-Bench
- Video-Bench addresses two main weaknesses of existing evaluation methods: failure to capture complex dimensions such as video fluency and aesthetics, and difficulty with cross-modal comparison in video-condition alignment assessment [5]
- The framework introduces two core innovations: a dual-dimensional evaluation framework covering video-condition alignment and video quality [7], and two key techniques, chain-of-query and few-shot scoring: chain-of-query resolves cross-modal alignment through iterative questioning, while few-shot scoring quantifies subjective aesthetic judgments by comparing multiple videos [8][13]

Group 3: Evaluation Dimensions
- The dual-dimensional framework decomposes generation quality into two orthogonal axes, "video-condition alignment" and "video quality," assessing both fidelity to the text prompt and the visual quality of the video itself [10]
- Video-condition alignment covers object category consistency, action consistency, color consistency, scene consistency, and video-text consistency; video quality covers imaging quality, aesthetic quality, temporal consistency, and motion quality [10][11]

Group 4: Performance Comparison
- Video-Bench significantly outperforms traditional methods, achieving an average Spearman correlation of 0.733 on video-condition alignment and 0.620 on video quality (see the sketch below) [18]
- On the critical object category consistency metric, Video-Bench improves 56.3% over the GRiT-based method, reaching a correlation of 0.735 [19]

Group 5: Robustness and Reliability
- Evaluation results were validated by a panel of 10 experts who annotated 35,196 video samples, yielding an inter-rater consistency (Krippendorff's α) of 0.52, comparable to human self-assessment [21]
- The framework showed high stability and reliability, with a TARA@3 score of 67% and a Krippendorff's α of 0.867, confirming the effectiveness of its component designs [23]

Group 6: Current Model Assessment
- Video-Bench evaluated seven mainstream video generation models, finding that commercial models generally outperform open-source ones, with Gen3 averaging 4.38 versus VideoCrafter2's 3.87 [25]
- The assessment highlighted weaknesses in dynamic dimensions such as action rationality (average 2.53/3) and motion blur (3.11/5) across current models [26]
- Among foundation models used as evaluators, GPT-4o typically excels in video quality and consistency scores, particularly imaging quality (0.807) and video-text consistency (0.750) [27]
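For readers unfamiliar with the headline metric, here is a minimal sketch of how such an agreement score is computed; the scores below are made up, and only `scipy.stats.spearmanr` is a real API.

```python
from scipy.stats import spearmanr

# Toy data: automatic scores vs. expert ratings for the same five videos.
model_scores = [4.2, 3.1, 4.8, 2.5, 3.9]   # e.g. MLLM few-shot scores
human_scores = [3.3, 4.0, 4.6, 2.8, 3.5]   # expert annotations

# Spearman correlates the *ranks*, so it tolerates different score scales.
rho, p_value = spearmanr(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f}")  # -> 0.600 on this toy data
```

Video-Bench's reported 0.733 and 0.620 are averages of exactly this kind of correlation against human judgments across its evaluation dimensions.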
Doubao Releases Video Generation Model Seedance 1.0 Pro
news flash· 2025-06-11 03:38
Group 1
- Doubao has launched a video generation model, Seedance 1.0 Pro, priced at 0.015 yuan per thousand tokens [1]
- At that rate, producing a 5-second 1080p video costs approximately 3.67 yuan per clip (checked below) [1]
- Doubao has also fully launched its real-time voice model [1]
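A quick check that the two quoted numbers are consistent with each other, assuming the per-token price applies uniformly:

```python
# Implied token count per clip from the quoted pricing.
price_per_1k_tokens = 0.015   # yuan per 1,000 tokens
clip_cost = 3.67              # yuan for one 5-second 1080p video

tokens = clip_cost / price_per_1k_tokens * 1000
print(f"~{tokens:,.0f} tokens per clip")  # ~244,667 tokens
```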