Video Generation
Aishi Technology CEO Wang Changhu: Video is the content format closest to users, and good models lead to good products
Hua Er Jie Jian Wen· 2025-06-06 13:20
Core Viewpoint
- The 7th Beijing Zhiyuan Conference is being held on June 6-7, 2025. Its forum on large-model industries brings together notable experts and CEOs, including Wang Changhu, CEO of Aishi Technology, who discussed the development of PixVerse and the key decisions behind its growth [1][3].

Group 1: Company Development
- Aishi Technology's PixVerse has gained global recognition, ranking among the top three video generation products alongside Kling and Hailuo, with over 16 million monthly active users as of early 2025 [4][10].
- The company was founded in April 2023, motivated by the new era in AI that followed the launch of ChatGPT in late 2022 [5][6].
- The company chose to focus on video generation despite initial investor skepticism, believing its commercialization potential could match that of large language models [7][9].

Group 2: Key Strategic Decisions
- The first critical decision was to pursue video generation at all. Most investors did not favor it at the time, believing it would not materialize within five years [6][7].
- The second decision was whether to follow the trend set by Sora, whose emergence turned video generation into a competitive field and drew more interest and investment into the sector [11][12].
- The third decision was to target consumer (ToC) markets first and expand to business (ToB) applications later, with the aim of letting ordinary users create content easily [17][18].

Group 3: Product Success and Features
- The launch of PixVerse V3 was a turning point, bringing rapid growth in users and engagement, attributed to user-friendly features that lowered the barrier to creation for ordinary users [13][18].
- Fast generation added to the product's success: V4 achieved near real-time video generation and introduced sound [20][21].
- By May 2025, PixVerse had over 60 million users and ranked highly in app store charts, indicating strong market penetration and user engagement [22][23].
CVPR 2025 Tutorial: From Video Generation to World Models | Jointly presented by the MMLab@NTU team, Kuaishou Kling, and others
QbitAI (量子位) · 2025-06-05 08:32
Core Insights
- Video generation technology has evolved from simple animations to high-quality dynamic content capable of storytelling and long-term reasoning [1].
- Advances in models such as Kling (可灵), Sora, Genie, Cosmos, and Movie Gen are expanding the boundaries of video generation, prompting researchers to ask deeper questions about its potential as a bridge to world models and its role in embodied intelligence [2][6].

Group 1: Video Generation and Its Implications
- Video generation is increasingly seen as a powerful visual prior that can improve how AI perceives the world, understands interactions, and reasons about physics, pointing toward more general, embodied world models [3].
- The CVPR 2025 tutorial will feature leading researchers from academia and industry discussing how generative capabilities can become a foundation for perception, prediction, and decision-making [4].

Group 2: Tutorial Details
- The tutorial is scheduled for June 11, 2025, at the Music City Center in Nashville, TN, and focuses on the transition from video generation to understanding and modeling the real world [9].
- The agenda includes invited talks from experts in the field on topics such as scaling world models, physics-grounded models, and advances in video generation [5].

Group 3: Future Directions
- The development of video generation models suggests they could come to understand interactions between objects and capture the physical and semantic causality behind human behavior, a shift from mere generation to interactive world modeling [6].
- The tutorial aims to provide insights, tools, and future research directions for those interested in video generation, multimodal understanding, embodied AI, and physical reasoning [7].
See you this Sunday! Final call to register for the CVPR 2025 Beijing paper sharing session
Jiqizhixin (机器之心) · 2025-06-03 08:57
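A few days ago, at I/O 2025, Google officially released Veo 3, its latest-generation AI video generation model, which produces high-quality video and, for the first time, synchronized audio and video. Impressed by Veo 3's results, some have praised it as "an epoch-making product on par with OpenAI's Sora," one that marks AI video's entry into a true "sound era."

This shows that although the large models already available in the AI community are impressive enough, architectural innovation and investment in compute clusters keep producing something new. Video generation, for example, has evolved from silent output to video with sound, a clear improvement; the multimodal field, likewise, is gradually moving toward unifying understanding and generation.

To give practitioners a full view of the latest innovations and trends emerging from the AI community, Jiqizhixin plans to hold the "CVPR 2025 Paper Sharing Session" in Beijing on June 8, inviting top experts and paper authors to exchange ideas with on-site attendees on hot topics such as multimodality and video generation.

As one of the most important international conferences in computer vision, CVPR is highly prestigious and attracts a large number of research institutions and universities every year. This year, CVPR 2025 received 13,008 paper submissions and accepted 2,878 of them, for an overall acceptance rate of 22.1%.

As an event built for domestic AI talent, this paper sharing session ...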
Full agenda announced | After Google's stunning Veo 3 launch, this CVPR sharing session is worth a listen for everyone in AI
Jiqizhixin (机器之心) · 2025-05-27 06:38
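A few days ago, at I/O 2025, Google officially released Veo 3, its latest-generation AI video generation model, which produces high-quality video and, for the first time, synchronized audio and video. Impressed by Veo 3's results, some have praised it as "an epoch-making product on par with OpenAI's Sora," one that marks AI video's entry into a true "sound era."

This shows that although the large models already available in the AI community are impressive enough, architectural innovation and investment in compute clusters keep producing something new. Video generation, for example, has evolved from silent output to video with sound, a clear improvement; the multimodal field, likewise, is gradually moving toward unifying understanding and generation.

To give practitioners a full view of the latest innovations and trends emerging from the AI community, Jiqizhixin plans to hold the "CVPR 2025 Paper Sharing Session" in Beijing on June 8, inviting top experts and paper authors to exchange ideas with on-site attendees on hot topics such as multimodality and video generation.

As one of the most important international conferences in computer vision, CVPR is highly prestigious and attracts a large number of research institutions and universities every year. This year, CVPR 2025 received 13,008 paper submissions and accepted 2,878 of them, for an overall acceptance rate of 22.1%.

As an event built for domestic AI talent, this paper sharing session ...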
Veo 3's lifelike stand-up comedy clips go viral: has video generation's GPT moment arrived?
Di Yi Cai Jing· 2025-05-26 03:02
Core Insights
- The recent release of Google's video model Veo 3 has generated significant discussion, particularly for its ability to create realistic characters and scenes, but users say the technology is not as groundbreaking as some claims suggest [3][4][12].
- Veo 3 introduces native audio generation, creating sound effects and dialogue together with the video, a shift from earlier silent video generation models [4][7].
- Despite the improvements, industry experts point out that Veo 3 still has many flaws and is not yet suitable for large-scale commercial production [12][15][17].

Group 1: Technology and Features
- Veo 3's key innovation is generating audio alongside video, which improves overall production quality and efficiency [4][7].
- The model supports a streamlined workflow in which text prompts produce complete animated videos, including music and synchronized voice [7][15].
- Users report that video quality has improved but falls short of the high expectations set by earlier versions, and that consistency and accuracy problems remain [12][14].

Group 2: Market Reception and Cost
- Veo 3 is relatively expensive to use: it requires a subscription to Google's AI Ultra plan at $249.99 per month, more than competing services [16].
- Users note that the points system for video generation can add further costs, making commercial projects less feasible without purchasing extra credits [16][17].
- Despite the high costs and existing flaws, some industry professionals see potential in Veo 3 and associated tools such as FLOW for future AI-driven video production workflows [17].
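Tencent open-sources a powerful video generation model: it faithfully reproduces the subject of a reference image and can also edit existing videos
QbitAI (量子位) · 2025-05-09 07:03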
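Keleixi (克雷西), reporting from Aofeisi (凹非寺)
QbitAI | WeChat official account QbitAI

For the character portion, the prompt is as follows: A woman takes a selfie in a busy city. A woman holds a smartphone in one hand and makes a peace sign with the other. The background is a bustling street scene with various signs and pedestrians.

Tencent has just open-sourced HunyuanCustom, a "customized" video generation model.

"Customized" here is all about subject consistency: a single image is enough to fix the video's protagonist. Its consistency score reaches SOTA among open-source models and is comparable to closed-source ones.

As a result, there is no need to agonize over describing the subject's features when drafting a prompt.

HunyuanCustom supports four functions in total: single-subject reference, multi-subject reference, local editing, and character voice-over.

Of these, single-subject reference is already live and open-sourced, and the rest will be open-sourced within the month.

Hunyuan's engineers also revealed in a livestream that the team is working with the open-source community to add support for ComfyUI, a tool commonly used by AI creators.

While waiting for all the features to go fully live, take a look at the demo results first!

Subject consistency ...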
Kunlun Wanwei: Q1 revenue up sharply by 46%; breakthrough progress in AI computing chips
Core Viewpoint
- Kunlun Wanwei (300418.SZ) reported year-on-year revenue growth of 46% in Q1 2025, driven by advances in AI computing chips and applications [1].

Group 1: Financial Performance
- Operating revenue reached 1.76 billion yuan in Q1 2025, up 46% from the previous year [1].
- R&D expenses reached 430 million yuan, up 23% year-on-year [1].
- Annual recurring revenue (ARR) for AI music reached approximately 12 million USD, with monthly revenue of about 1 million USD [1].
- ARR for the short drama platform Dramawave was approximately 120 million USD, with monthly revenue of around 10 million USD [1].
- Overseas business revenue amounted to 1.67 billion yuan, up 56% year-on-year, and accounted for 94% of total revenue [1].

Group 2: Technological Advancements
- The company launched several disruptive technologies in multi-modal reasoning, video generation, and audio generation, achieving state-of-the-art (SOTA) status with various models [2].
- The Skywork R1V multi-modal reasoning model reached open-source SOTA, while the SkyReels-V1 model and SkyReels-A1 algorithm led the global video generation field [2].
- In AI music, the Mureka V6 and Mureka O1 models demonstrated a competitive edge, with Mureka O1 surpassing competitors in performance [2].

Group 3: AI Chip Development
- The company made significant progress in the R&D of AI computing chips, moving toward the goal of "Chinese chips, Kunlun manufacturing" [3].
- Kunlun Wanwei acquired a controlling stake in Beijing Aijietek Technology Co., Ltd., completing a full industry chain layout from computing infrastructure to AI applications [3].
- The AI chip R&D team has expanded to nearly 200 employees, covering fields such as chip design and algorithm development [3].

Group 4: Future Prospects
- The company plans to launch the Skywork.ai platform in mid-May 2025, featuring a system of five expert-level AI agents for optimizing various professional tasks [3].
- The Opera business segment, which includes overseas information distribution and metaverse operations, saw revenue rise 41%, driven by Opera Ads [4].
- The company aims to keep advancing AI computing chip development and to innovate its AI application matrix to provide leading AI product experiences globally [4].
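
As a rough sanity check on the Kunlun Wanwei financial figures above, the sketch below recomputes the overseas revenue share and the two ARR numbers from the rounded values quoted in the summary. It assumes that ARR here is simply twelve times the latest monthly revenue, which the summary implies but does not state; the variable and function names are illustrative only. Because the inputs are rounded, the recomputed share (about 94.9%) only approximately matches the reported 94%.

```python
# Rounded figures as quoted in the summary above.
total_revenue_yuan = 1.76e9       # Q1 2025 operating revenue
overseas_revenue_yuan = 1.67e9    # Q1 2025 overseas revenue
monthly_revenue_usd = {"AI music": 1e6, "Dramawave": 10e6}


def annualize(monthly: float) -> float:
    """Approximate ARR as 12x monthly revenue (an assumption, not a stated method)."""
    return monthly * 12


# Share of total revenue that came from overseas business.
overseas_share = overseas_revenue_yuan / total_revenue_yuan

# ARR implied by the quoted monthly revenue for each product line.
arr_usd = {name: annualize(m) for name, m in monthly_revenue_usd.items()}

print(f"Overseas share of revenue: {overseas_share:.1%}")
for name, arr in arr_usd.items():
    print(f"{name} implied ARR: {arr / 1e6:.0f} million USD")
```

The implied ARR values line up with the reported 12 million and 120 million USD, so the monthly and annual figures in the summary are consistent with each other under this assumption.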