Video Generation
Midjourney officially launches its V1 video model
news flash· 2025-06-19 15:12
Midjourney has released V1, a video generation model positioned as cost-effective and easy to pick up, and as the first step toward the company's vision of "simulating the world in real time". Users can now create short videos by animating Midjourney-generated images or their own images; the product is pitched as fun, easy to use, good-looking, and affordably priced. Entry price: $10 per month. ...
Hands-on with Doubao 1.6, the hottest use cases all in one! Seedance tops the video generation leaderboard, and the Doubao app rolls out to all users
量子位· 2025-06-12 07:11
Core Viewpoint
- ByteDance's latest Doubao model 1.6 series has redefined the competitive landscape in the AI industry, achieving top-tier performance across various modalities and significantly enhancing its capabilities in reasoning, mathematics, and multimodal understanding [1][12][20].

Group 1: Model Performance and Achievements
- Doubao model 1.6 has achieved scores above 700 in both science and liberal arts in the Haidian District's mock exam, with a notable increase of 154 points in science compared to the previous version [2][3].
- The Seedance 1.0 Pro model has topped global rankings in both text-to-video and image-to-video categories, showcasing its superior performance [4][5].

Group 2: Pricing and Cost Structure
- The pricing model for Doubao 1.6 has been redefined, offering a unified pricing structure regardless of the task type, with costs based on input length [13][18].
- The cost of generating videos with Seedance 1.0 Pro is notably low, at 0.015 yuan per thousand tokens, allowing roughly 2,700 videos to be generated for 10,000 yuan [11][12].

Group 3: Model Features and Capabilities
- The Doubao model 1.6 series consists of three models: a comprehensive model, a deep thinking model, and a flash version, each designed for specific tasks and capabilities [23][24].
- The Seedance 1.0 Pro model features seamless multi-camera storytelling, stable motion, and realistic aesthetics, enhancing the video generation experience [38][49].

Group 4: Market Impact and Future Trends
- Daily token usage for Doubao models has surged to over 16.4 trillion, a 137-fold increase since launch [73].
- ByteDance's Volcano Engine holds a 46.4% market share in public cloud model invocation, indicating its strong position in the industry [74].
- The transition from generative AI to agentic AI is highlighted as a key focus for future development, emphasizing deep thinking, multimodal understanding, and autonomous tool invocation [79][80].
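The pricing figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the article's stated numbers (0.015 yuan per thousand tokens, roughly 2,700 videos per 10,000 yuan); the implied tokens-per-video figure is a derived estimate, not an official spec.

```python
# Back-of-the-envelope check of the quoted Seedance 1.0 Pro pricing.
# All inputs come from the article; tokens_per_video is inferred, not official.
PRICE_PER_1K_TOKENS = 0.015   # yuan per 1,000 tokens, as quoted
BUDGET = 10_000               # yuan
VIDEOS_PER_BUDGET = 2_700     # videos, as quoted

cost_per_video = BUDGET / VIDEOS_PER_BUDGET                      # yuan per clip
tokens_per_video = cost_per_video / PRICE_PER_1K_TOKENS * 1_000  # implied tokens

print(f"cost per video: {cost_per_video:.2f} yuan")          # ~3.70 yuan
print(f"implied tokens per video: {tokens_per_video:,.0f}")  # ~246,914 tokens
```

The ~3.70 yuan per clip implied here is consistent with the "3.6 yuan a clip" figure quoted elsewhere in this digest for a 1080P Seedance 1.0 Pro video.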
A 1080P video in 40 seconds, at 3.6 yuan a clip. Is ByteDance flipping the table again? Guizang's hands-on with Seedance 1.0 Pro
歸藏的AI工具箱· 2025-06-11 08:42
Hello friends, I'm Guizang (歸藏). At this morning's Volcano Engine Force conference, ByteDance released the Seedance 1.0 Pro video generation model, which is the same model as Video 3.0 Pro inside 即梦 (Jimeng). I got to test it in advance, and this time ByteDance's video model has truly come into its own. Its prompt understanding for both image-to-video and text-to-video, frame detail, and consistency of physical behavior are all impeccable, and it outputs native 1080P. On Artificial Analysis, Seedance 1.0 ranks first in both text-to-video and image-to-video, well ahead of Veo 3.

Arena leaderboard excerpt:

| Creator | Model | Arena ELO | 95% CI | # Appearances |
| --- | --- | --- | --- | --- |
| ByteDance Seed | Seedance 1.0 | 1299 | -13/+13 | 4,947 |
| Google | Veo 3 Preview | 1252 | -10/+10 | 8,033 |

...
Focus on multimodality: with no "ChatGPT moment" yet, did large models "slow down" in 2025?
Bei Jing Shang Bao· 2025-06-08 13:27
Core Insights
- The emergence of multi-modal models such as Emu3 signifies a shift in content generation, with the potential to understand and generate text, images, and videos through a single model [1][3].
- The rapid development of AI has led to a competitive landscape where new and existing products coexist, but the core capabilities of video generation still lag behind expectations [1][5].
- The commercial application of large models faces challenges, particularly in integrating visual generation with existing models, which limits scalability and effectiveness [7][8].

Multi-Modal Model Development
- Emu3, released by Zhiyuan Research Institute, is a native multi-modal model that incorporates various data types from the beginning of its training process, unlike traditional models that focus on language first [3][4].
- The current learning path for multi-modal models often leads to a decline in performance as they transition from strong language capabilities to integrating other modalities [3][4].
- The development of multi-modal models is still in its early stages, with significant technical challenges remaining, particularly in filtering effective information from diverse data types [3][4].

Video Generation Challenges
- Video generation technology is currently at a transitional phase, comparable to the evolution from GPT-2 to GPT-3, indicating substantial room for improvement [5][6].
- Key issues in video generation include narrative coherence, stability, and controllability, which are essential for producing high-quality content [6].
- The industry is awaiting a breakthrough moment akin to the "ChatGPT moment" to advance video generation capabilities [6].

Commercialization and Market Growth
- The multi-modal AI market is projected to reach $2.4 billion in 2024, with a compound annual growth rate (CAGR) exceeding 28%, and is expected to grow to $128 billion by 2025, reflecting a CAGR of 62.3% from 2023 to 2025 [8].
- Integrating traditional computer vision models with large models is seen as a potential pathway for commercial applications, contingent on achieving a favorable cost-benefit ratio [7][8].
- Companies are evolving their service models from providing platforms (PaaS) to offering tools (SaaS) and ultimately delivering direct results to users by 2025 [8].
Aishi Technology CEO Wang Changhu: video is the content format closest to users, and good models make good products
Hua Er Jie Jian Wen· 2025-06-06 13:20
Core Viewpoint
- The 7th Beijing Zhiyuan Conference will be held on June 6-7, 2025, featuring a forum on large model industries with notable experts and CEOs, including a presentation by Wang Changhu, CEO of Aishi Technology, on the development of PixVerse and the key decisions that shaped its growth [1][3].

Group 1: Company Development
- Aishi Technology's PixVerse has achieved significant global recognition, ranking among the top three image generation products alongside Keling and Hailuo, with over 16 million monthly active users as of early 2025 [4][10].
- The company was founded in April 2023, motivated by the emergence of a new era in AI, particularly after the launch of ChatGPT in late 2022 [5][6].
- The decision to focus on video generation, despite initial skepticism from investors, was based on the belief that it could match the commercialization potential of large language models [7][9].

Group 2: Key Strategic Decisions
- The first critical decision was to pursue video generation, which most investors at the time did not favor, believing it would not materialize within five years [6][7].
- The second decision concerned whether to follow the trend set by Sora's emergence, which transformed video generation into a competitive field and drew increased interest and investment into the sector [11][12].
- The third strategic decision was to target consumer (ToC) markets first before expanding to business (ToB) applications, aiming to empower ordinary users to create content easily [17][18].

Group 3: Product Success and Features
- The launch of PixVerse's V3 version marked a significant turning point, with rapid user growth and engagement attributed to user-friendly features that lowered the creation barrier for ordinary users [13][18].
- The product's success was further enhanced by its fast generation speed, with V4 achieving near real-time generation and introducing sound to the videos [20][21].
- By May 2025, PixVerse had over 60 million users and ranked highly in app store charts, indicating strong market penetration and user engagement [22][23].
CVPR 2025 Tutorial: From video generation to world models | jointly presented by the MMLab@NTU team, Kuaishou Kling, and others
量子位· 2025-06-05 08:32
Core Insights
- Video generation technology has evolved from simple animations to high-quality dynamic content capable of storytelling and long-term reasoning [1].
- Advances in models like Kling (可灵), Sora, Genie, Cosmos, and Movie Gen are expanding the boundaries of video generation, prompting researchers to explore deeper questions about its potential as a bridge to world models and its role in embodied intelligence [2][6].

Group 1: Video Generation and Its Implications
- Video generation is being recognized as a powerful visual prior that can enhance AI's perception of the world, understanding of interactions, and reasoning about physics, leading toward more general and embodied intelligent world models [3].
- The tutorial at CVPR 2025 will feature leading researchers from academia and industry discussing how generative capabilities can be transformed into a foundation for perception, prediction, and decision-making [4].

Group 2: Tutorial Details
- The CVPR 2025 tutorial is scheduled for June 11, 2025, at the Music City Center in Nashville, TN, focusing on the transition from video generation to understanding and modeling the real world [9].
- The agenda includes invited talks from experts in the field, covering topics such as scaling world models, physics-grounded models, and advancements in video generation [5].

Group 3: Future Directions
- The development of video generation models suggests potential for understanding interactions between objects and capturing the physical and semantic causality behind human behavior, indicating a shift from mere generation to interactive world modeling [6].
- The tutorial aims to provide insights, tools, and future research directions for those interested in video generation, multimodal understanding, embodied AI, and physical reasoning [7].
See you this Sunday! Last call to register for the CVPR 2025 paper-sharing session in Beijing
机器之心· 2025-06-03 08:57
A few days ago at I/O 2025, Google officially released Veo 3, its latest-generation AI video generation model, which for the first time pairs high-quality video generation with synchronized audio. Impressed by Veo 3's results, some commentators have called it "an epoch-making product on par with OpenAI's Sora", marking AI video's entry into a true "era of sound".

This shows that even though the AI community's existing large models are already impressive, architectural innovation and investment in compute clusters keep producing new breakthroughs. In video generation, for instance, the move from silent to sound-enabled output is a clear step forward; in multimodality, the field is gradually converging on unified understanding and generation.

To give practitioners a full picture of the latest innovations and trends emerging in the AI community, 机器之心 (Machine Heart) plans to hold a "CVPR 2025 paper-sharing session" in Beijing on June 8, inviting top experts and paper authors to exchange ideas with attendees around hot topics such as multimodality and video generation.

As one of the most important international conferences in computer vision, CVPR carries great weight and attracts large numbers of research institutions and universities every year. This year, CVPR 2025 received 13,008 paper submissions and accepted 2,878 papers, for an overall acceptance rate of 22.1%.

As an event built for AI talent in China, this paper-sharing session ...
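The quoted acceptance rate is consistent with the submission and acceptance counts; a quick check:

```python
# Verify the CVPR 2025 figures cited above:
# 13,008 submissions, 2,878 accepted, quoted acceptance rate of 22.1%.
submissions = 13_008
accepted = 2_878
rate = accepted / submissions * 100
print(f"acceptance rate: {rate:.1f}%")  # 22.1%
```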
Full agenda announced | After Google's stunning Veo 3 launch, this CVPR sharing session is worth a "listen" for every AI practitioner
机器之心· 2025-05-27 06:38
Veo 3's lifelike talk-show clips go viral. Has video generation's GPT moment arrived?
Di Yi Cai Jing· 2025-05-26 03:02
Core Insights
- The recent release of Google's video model Veo 3 has generated significant discussion, particularly for its ability to create realistic characters and scenes, but users say the technology is not as groundbreaking as some claims suggest [3][4][12].
- Veo 3 introduces a native audio generation feature, allowing sound effects and dialogue to be created simultaneously with the video, marking a shift from previous silent video generation models [4][7].
- Despite the improvements, industry experts note that Veo 3 still has many flaws and is not yet suitable for large-scale commercial production [12][15][17].

Group 1: Technology and Features
- Veo 3's key innovation is generating audio alongside video, which improves overall production quality and efficiency [4][7].
- The model enables a streamlined workflow in which text prompts generate complete animated videos, including synchronized music and voice [7][15].
- Users report that while video quality has improved, it does not meet the high expectations set by earlier previews, and issues with consistency and accuracy remain [12][14].

Group 2: Market Reception and Cost
- Veo 3 is relatively expensive, requiring a subscription to Google's AI Ultra plan at $249.99 per month, more than competing services [16].
- The points-based generation system can incur additional costs, making commercial projects impractical without purchasing extra credits [16][17].
- Despite the high costs and existing flaws, some industry professionals see potential in Veo 3 and associated tools like Flow for future AI-driven video production workflows [17].
Tencent open-sources a video generation heavyweight! It faithfully reproduces the subject from a reference image, and can even edit existing videos
量子位· 2025-05-09 07:03
克雷西, reporting from 凹非寺 | 量子位 QbitAI

Tencent has just open-sourced HunyuanCustom, a "customizable" video generation model. "Customizable" here means subject consistency: a single image is enough to fix the video's protagonist, and the model's consistency score reaches SOTA among open-source models while rivaling closed-source ones. That means you no longer have to agonize over describing the subject's features when writing prompts.

HunyuanCustom supports four capabilities: single-subject reference, multi-subject reference, local editing, and character voice-over. Single-subject reference is already live and open-sourced; the rest will be open-sourced within the month. Hunyuan engineers also revealed in a livestream that the team is working with the open-source community to adapt the model for ComfyUI, a tool widely used by AI creators.

While waiting for the full feature set to land, take a look at the demo. For the character demo, the prompt was as follows:

A woman takes a selfie in a busy city. A woman holds a smartphone in one hand and makes a peace sign with the other. The background is a bustling street scene with various signs and pedestrians.

Subject consistency ...