Video Generation
Midjourney officially launches its V1 video model
news flash· 2025-06-19 15:12
Midjourney has released V1, a video generation model positioned as cost-effective and easy to pick up, and as the first step toward the company's vision of "simulating the world in real time". Users can now create short videos by animating Midjourney-generated images or their own images; the product is pitched as fun, easy to use, good-looking, and affordably priced. Entry price: $10 per month. ...
Hands-on with Doubao 1.6, the hottest use cases all in one! Seedance tops the video generation leaderboard, and the Doubao app rolls out to all users
量子位· 2025-06-12 07:11
Core Viewpoint
- ByteDance's latest Doubao model 1.6 series has redefined the competitive landscape in the AI industry, achieving top-tier performance across various modalities and significantly enhancing its capabilities in reasoning, mathematics, and multimodal understanding [1][12][20].

Group 1: Model Performance and Achievements
- Doubao model 1.6 has achieved scores above 700 in both science and liberal arts in the Haidian District's mock exam, with a notable increase of 154 points in science compared to the previous version [2][3].
- The Seedance 1.0 Pro model has topped global rankings in both text-to-video and image-to-video categories, showcasing its superior performance [4][5].

Group 2: Pricing and Cost Structure
- The pricing model for Doubao 1.6 has been redefined, offering a unified pricing structure regardless of the task type, with costs based on input length [13][18].
- The cost of generating videos with Seedance 1.0 Pro is notably low, at 0.015 yuan per thousand tokens, allowing roughly 2,700 videos to be generated for 10,000 yuan [11][12].

Group 3: Model Features and Capabilities
- The Doubao model 1.6 series consists of three models: a comprehensive model, a deep thinking model, and a flash version, each designed for specific tasks and capabilities [23][24].
- The Seedance 1.0 Pro model features seamless multi-camera storytelling, stable motion, and realistic aesthetics, enhancing the video generation experience [38][49].

Group 4: Market Impact and Future Trends
- Daily token usage for Doubao models has surged to over 16.4 trillion, a 137-fold increase since launch [73].
- ByteDance's Volcano Engine holds a 46.4% market share in public cloud model invocation, indicating its strong position in the industry [74].
- The transition from generative AI to agentic AI is highlighted as a key focus for future development, emphasizing deep thinking, multimodal understanding, and autonomous tool invocation [79][80].
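The pricing figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the article's stated numbers (0.015 yuan per thousand tokens, roughly 2,700 videos per 10,000 yuan); the implied tokens-per-video figure is a derived estimate, not an official spec.

```python
# Back-of-the-envelope check of the quoted Seedance 1.0 Pro pricing.
# All inputs come from the article; tokens_per_video is inferred, not official.
PRICE_PER_1K_TOKENS = 0.015   # yuan per 1,000 tokens, as quoted
BUDGET = 10_000               # yuan
VIDEOS_PER_BUDGET = 2_700     # videos, as quoted

cost_per_video = BUDGET / VIDEOS_PER_BUDGET                      # yuan per clip
tokens_per_video = cost_per_video / PRICE_PER_1K_TOKENS * 1_000  # implied tokens

print(f"cost per video: {cost_per_video:.2f} yuan")          # ~3.70 yuan
print(f"implied tokens per video: {tokens_per_video:,.0f}")  # ~246,914 tokens
```

The ~3.70 yuan per clip implied here is consistent with the "3.6 yuan a clip" figure quoted elsewhere in this digest for a 1080P Seedance 1.0 Pro video.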
A 1080P video in 40 seconds, at 3.6 yuan a clip. Is ByteDance flipping the table again? Guizang's hands-on with Seedance 1.0 Pro
歸藏的AI工具箱· 2025-06-11 08:42
Hello friends, I'm Guizang (歸藏). At this morning's Volcano Engine Force conference, ByteDance released the Seedance 1.0 Pro video generation model, which is the same model as Video 3.0 Pro inside 即梦 (Jimeng). I got to test it in advance, and this time ByteDance's video model has truly come into its own. Its prompt understanding for both image-to-video and text-to-video, frame detail, and consistency of physical behavior are all impeccable, and it outputs native 1080P. On Artificial Analysis, Seedance 1.0 ranks first in both text-to-video and image-to-video, well ahead of Veo 3.

Arena leaderboard excerpt:

| Creator | Model | Arena ELO | 95% CI | # Appearances |
| --- | --- | --- | --- | --- |
| ByteDance Seed | Seedance 1.0 | 1299 | -13/+13 | 4,947 |
| Google | Veo 3 Preview | 1252 | -10/+10 | 8,033 |

...
Focus on multimodality: with no "ChatGPT moment" yet, did large models "slow down" in 2025?
Bei Jing Shang Bao· 2025-06-08 13:27
Core Insights
- The emergence of multi-modal models such as Emu3 signifies a shift in content generation, with the potential to understand and generate text, images, and videos through a single model [1][3].
- The rapid development of AI has led to a competitive landscape where new and existing products coexist, but the core capabilities of video generation still lag behind expectations [1][5].
- The commercial application of large models faces challenges, particularly in integrating visual generation with existing models, which limits scalability and effectiveness [7][8].

Multi-Modal Model Development
- Emu3, released by Zhiyuan Research Institute, is a native multi-modal model that incorporates various data types from the beginning of its training process, unlike traditional models that focus on language first [3][4].
- The current learning path for multi-modal models often leads to a decline in performance as they transition from strong language capabilities to integrating other modalities [3][4].
- The development of multi-modal models is still in its early stages, with significant technical challenges remaining, particularly in filtering effective information from diverse data types [3][4].

Video Generation Challenges
- Video generation technology is currently at a transitional phase, comparable to the evolution from GPT-2 to GPT-3, indicating substantial room for improvement [5][6].
- Key issues in video generation include narrative coherence, stability, and controllability, which are essential for producing high-quality content [6].
- The industry is awaiting a breakthrough moment akin to the "ChatGPT moment" to advance video generation capabilities [6].

Commercialization and Market Growth
- The multi-modal AI market is projected to reach $2.4 billion in 2024, with a compound annual growth rate (CAGR) exceeding 28%, and is expected to grow to $128 billion by 2025, reflecting a CAGR of 62.3% from 2023 to 2025 [8].
- Integrating traditional computer vision models with large models is seen as a potential pathway for commercial applications, contingent on achieving a favorable cost-benefit ratio [7][8].
- Companies are evolving their service models from providing platforms (PaaS) to offering tools (SaaS) and ultimately delivering direct results to users by 2025 [8].
Aishi Technology CEO Wang Changhu: video is the content format closest to users, and good models make good products
Hua Er Jie Jian Wen· 2025-06-06 13:20
Core Viewpoint
- The 7th Beijing Zhiyuan Conference will be held on June 6-7, 2025, featuring a forum on large model industries with notable experts and CEOs, including a presentation by Wang Changhu, CEO of Aishi Technology, on the development of PixVerse and the key decisions that shaped its growth [1][3].

Group 1: Company Development
- Aishi Technology's PixVerse has achieved significant global recognition, ranking among the top three image generation products alongside Keling and Hailuo, with over 16 million monthly active users as of early 2025 [4][10].
- The company was founded in April 2023, motivated by the emergence of a new era in AI, particularly after the launch of ChatGPT in late 2022 [5][6].
- The decision to focus on video generation, despite initial skepticism from investors, was based on the belief that it could match the commercialization potential of large language models [7][9].

Group 2: Key Strategic Decisions
- The first critical decision was to pursue video generation, which most investors at the time did not favor, believing it would not materialize within five years [6][7].
- The second decision concerned whether to follow the trend set by Sora's emergence, which transformed video generation into a competitive field and drew increased interest and investment into the sector [11][12].
- The third strategic decision was to target consumer (ToC) markets first before expanding to business (ToB) applications, aiming to empower ordinary users to create content easily [17][18].

Group 3: Product Success and Features
- The launch of PixVerse's V3 version marked a significant turning point, with rapid user growth and engagement attributed to user-friendly features that lowered the creation barrier for ordinary users [13][18].
- The product's success was further enhanced by its fast generation speed, with V4 achieving near real-time generation and introducing sound to the videos [20][21].
- By May 2025, PixVerse had over 60 million users and ranked highly in app store charts, indicating strong market penetration and user engagement [22][23].
CVPR 2025 Tutorial: From video generation to world models | jointly presented by the MMLab@NTU team, Kuaishou Kling, and others
量子位· 2025-06-05 08:32
Core Insights
- Video generation technology has evolved from simple animations to high-quality dynamic content capable of storytelling and long-term reasoning [1].
- Advances in models like Kling (可灵), Sora, Genie, Cosmos, and Movie Gen are expanding the boundaries of video generation, prompting researchers to explore deeper questions about its potential as a bridge to world models and its role in embodied intelligence [2][6].

Group 1: Video Generation and Its Implications
- Video generation is being recognized as a powerful visual prior that can enhance AI's perception of the world, understanding of interactions, and reasoning about physics, leading toward more general and embodied intelligent world models [3].
- The tutorial at CVPR 2025 will feature leading researchers from academia and industry discussing how generative capabilities can be transformed into a foundation for perception, prediction, and decision-making [4].

Group 2: Tutorial Details
- The CVPR 2025 tutorial is scheduled for June 11, 2025, at the Music City Center in Nashville, TN, focusing on the transition from video generation to understanding and modeling the real world [9].
- The agenda includes invited talks from experts in the field, covering topics such as scaling world models, physics-grounded models, and advancements in video generation [5].

Group 3: Future Directions
- The development of video generation models suggests potential for understanding interactions between objects and capturing the physical and semantic causality behind human behavior, indicating a shift from mere generation to interactive world modeling [6].
- The tutorial aims to provide insights, tools, and future research directions for those interested in video generation, multimodal understanding, embodied AI, and physical reasoning [7].
See you this Sunday! Last call to register for the CVPR 2025 paper-sharing session in Beijing
机器之心· 2025-06-03 08:57
A few days ago at I/O 2025, Google officially released Veo 3, its latest-generation AI video generation model, which for the first time pairs high-quality video generation with synchronized audio. Impressed by Veo 3's results, some commentators have called it "an epoch-making product on par with OpenAI's Sora", marking AI video's entry into a true "era of sound".

This shows that even though the AI community's existing large models are already impressive, architectural innovation and investment in compute clusters keep producing new breakthroughs. In video generation, for instance, the move from silent to sound-enabled output is a clear step forward; in multimodality, the field is gradually converging on unified understanding and generation.

To give practitioners a full picture of the latest innovations and trends emerging in the AI community, 机器之心 (Machine Heart) plans to hold a "CVPR 2025 paper-sharing session" in Beijing on June 8, inviting top experts and paper authors to exchange ideas with attendees around hot topics such as multimodality and video generation.

As one of the most important international conferences in computer vision, CVPR carries great weight and attracts large numbers of research institutions and universities every year. This year, CVPR 2025 received 13,008 paper submissions and accepted 2,878 papers, for an overall acceptance rate of 22.1%.

As an event built for AI talent in China, this paper-sharing session ...
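The quoted acceptance rate is consistent with the submission and acceptance counts; a quick check:

```python
# Verify the CVPR 2025 figures cited above:
# 13,008 submissions, 2,878 accepted, quoted acceptance rate of 22.1%.
submissions = 13_008
accepted = 2_878
rate = accepted / submissions * 100
print(f"acceptance rate: {rate:.1f}%")  # 22.1%
```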
Full agenda announced | After Google's stunning Veo 3 launch, this CVPR sharing session is worth a "listen" for every AI practitioner
机器之心· 2025-05-27 06:38
Veo 3's lifelike talk-show clips go viral. Has video generation's GPT moment arrived?
Di Yi Cai Jing· 2025-05-26 03:02
Core Insights
- The recent release of Google's video model Veo 3 has generated significant discussion, particularly for its ability to create realistic characters and scenes, but users say the technology is not as groundbreaking as some claims suggest [3][4][12].
- Veo 3 introduces a native audio generation feature, allowing sound effects and dialogue to be created simultaneously with the video, marking a shift from previous silent video generation models [4][7].
- Despite the improvements, industry experts note that Veo 3 still has many flaws and is not yet suitable for large-scale commercial production [12][15][17].

Group 1: Technology and Features
- Veo 3's key innovation is generating audio alongside video, which improves overall production quality and efficiency [4][7].
- The model enables a streamlined workflow in which text prompts generate complete animated videos, including synchronized music and voice [7][15].
- Users report that while video quality has improved, it does not meet the high expectations set by earlier previews, and issues with consistency and accuracy remain [12][14].

Group 2: Market Reception and Cost
- Veo 3 is relatively expensive, requiring a subscription to Google's AI Ultra plan at $249.99 per month, more than competing services [16].
- The points-based generation system can incur additional costs, making commercial projects impractical without purchasing extra credits [16][17].
- Despite the high costs and existing flaws, some industry professionals see potential in Veo 3 and associated tools like Flow for future AI-driven video production workflows [17].
Tencent open-sources a video generation heavyweight! It faithfully reproduces the subject from a reference image, and can even edit existing videos
量子位· 2025-05-09 07:03
克雷西, reporting from 凹非寺 | 量子位 QbitAI

Tencent has just open-sourced HunyuanCustom, a "customizable" video generation model. "Customizable" here means subject consistency: a single image is enough to fix the video's protagonist, and the model's consistency score reaches SOTA among open-source models while rivaling closed-source ones. That means you no longer have to agonize over describing the subject's features when writing prompts.

HunyuanCustom supports four capabilities: single-subject reference, multi-subject reference, local editing, and character voice-over. Single-subject reference is already live and open-sourced; the rest will be open-sourced within the month. Hunyuan engineers also revealed in a livestream that the team is working with the open-source community to adapt the model for ComfyUI, a tool widely used by AI creators.

While waiting for the full feature set to land, take a look at the demo. For the character demo, the prompt was as follows:

A woman takes a selfie in a busy city. A woman holds a smartphone in one hand and makes a peace sign with the other. The background is a bustling street scene with various signs and pedestrians.

Subject consistency ...