Model 2.0 Video Generation System
Parallel Diffusion Architecture Breaks the Length Barrier with 5-Minute AI Video Generation: Is It "Taking On" OpenAI and Google?
机器之心 (Synced) · 2025-11-20 09:35
Core Insights
- CraftStory has launched its Model 2.0 video generation system, which can produce expressive, human-centered videos up to five minutes long, addressing the long-standing "video duration" challenge in the AI video generation industry [1][3][5]

Company Overview
- CraftStory was founded by Victor Erukhimov, a key contributor to the widely used computer vision library OpenCV, who previously co-founded Itseez, a company acquired by Intel in 2016 [3][9]
- The company aims to deliver substantial commercial value to businesses that struggle to scale video production for training, marketing, and customer education [3][5]

Technology and Innovation
- The breakthrough in video duration is attributed to CraftStory's parallel diffusion architecture, which differs fundamentally from traditional models that need larger networks and more resources to produce longer videos [5][6]
- CraftStory's system processes all segments of a five-minute video simultaneously, avoiding the accumulation of flaws that can occur when segments are generated one after another [6][7]
- The training data includes high-quality footage shot by professional studios, which keeps frames sharp even in fast-moving scenes, in contrast to the motion blur common in standard footage [6][7]

Product Features
- Model 2.0 is a "video-to-video" model: users upload their own videos or choose preset ones, and the system preserves character identity and emotional nuance over longer sequences [7][8]
- The system can generate a 30-second low-resolution video in approximately 15 minutes and includes advanced lip-syncing and gesture-alignment algorithms [7][8]

Market Position and Future Directions
- CraftStory recently closed a $2 million funding round; although modest compared with the sums raised by larger competitors, it reflects the company's view that success does not depend solely on massive funding [9]
- The company targets the B2B market, focusing on helping software companies create effective training and product videos rather than building consumer creative tools [9]
- Planned developments include a "text-to-video" model that will let users generate long-form content directly from scripts, as well as support for moving-camera scenes [9]
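The article does not describe how CraftStory's parallel diffusion architecture is implemented, so the following is only a toy sketch, under stated assumptions, of the claim in "Technology and Innovation" that generating all segments together avoids the flaw accumulation of sequential generation. Each segment is reduced to one hypothetical number (for example, a character-identity feature that should stay at `TARGET`), and every generation pass is assumed to add a small fixed error `MODEL_BIAS`. All names, the step size, and the neighbour averaging are illustrative inventions, not CraftStory's API or algorithm.

```python
import random
from typing import List

TARGET = 1.0       # hypothetical "identity" feature every segment should keep
MODEL_BIAS = 0.02  # hypothetical small error introduced by one generation pass


def generate_sequential(num_segments: int) -> List[float]:
    """Chained generation (hypothetical): each segment is conditioned on the
    previous segment's output, so the per-pass error compounds over time."""
    segments: List[float] = []
    context = TARGET  # the first segment sees the clean reference
    for _ in range(num_segments):
        out = context + MODEL_BIAS  # reproduce the context, plus a small error
        segments.append(out)
        context = out  # the next segment inherits the accumulated error
    return segments


def generate_parallel(num_segments: int, steps: int = 50, seed: int = 0) -> List[float]:
    """Joint generation (hypothetical): all segments start from noise and are
    refined together, each conditioned on the shared reference rather than on
    a neighbour's finished output."""
    rng = random.Random(seed)
    segments = [rng.gauss(0.0, 1.0) for _ in range(num_segments)]  # independent noise
    estimate = TARGET + MODEL_BIAS  # each denoised estimate carries the bias only once
    for _ in range(steps):
        # 1) denoising step: move every segment toward the (slightly biased) estimate
        segments = [s + 0.2 * (estimate - s) for s in segments]
        # 2) boundary sharing: average each segment with its immediate neighbours
        segments = [
            sum(segments[max(0, i - 1): i + 2]) / len(segments[max(0, i - 1): i + 2])
            for i in range(num_segments)
        ]
    return segments


def max_drift(segments: List[float]) -> float:
    """Largest deviation of any segment from the reference feature."""
    return max(abs(s - TARGET) for s in segments)


if __name__ == "__main__":
    n = 10  # hypothetical: ten segments, e.g. 30 s each for a five-minute video
    print(f"sequential drift: {max_drift(generate_sequential(n)):.3f}")
    print(f"parallel drift:   {max_drift(generate_parallel(n)):.3f}")
```

In the chained version each segment inherits its predecessor's error, so drift grows linearly with the number of segments (ten segments end up ten biases away from the reference). In the joint version every segment is conditioned on the same reference and refined in the same loop, so the error stays at about one pass's bias regardless of video length. Real video diffusion models operate on latent tensors rather than scalars; the sketch isolates only the error-accumulation argument.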