Video Generation

Video Generation as a Primitive
Y Combinator· 2025-07-31 02:38
Video generation models are getting really good. Google's Veo 3 already produces 8-second photorealistic, sound-on clips for just a few dollars per video, and they're often indistinguishable from reality. Soon, you'll be able to create near-perfect footage of anything on the fly for a marginal cost approaching zero, and video will become a new basic building block for software. When this happens, a lot of new ideas become possible. It's definitely going to change media and entertainment. Imagine being able to crea ...
I Used Google's Veo 3 AI Video Generator to Produce Parts of This Video
CNET· 2025-07-03 12:00
Veo 3 Features & Functionality
- Veo 3 is Google's third-generation generative AI model for video, incorporating dialogue, audio, and sound effects [1]
- Access to Veo 3 is provided through a Google AI Ultra subscription, which includes a suite of Google services [1]
- Video generation within the Gemini interface takes approximately 3 to 5 minutes, longer than OpenAI's Sora [1]
- Generated videos are 16:9 at 1280x720 resolution, 8 seconds long, with a file size under 2 megabytes [1]
- The video player interface includes standard controls such as play/pause, mute (without volume control), and a download option, along with a Veo watermark [1]
Prompting & Customization
- To enable voiceovers, users must prompt Gemini to use a synchronized lip-sync voiceover and provide a script [1]
- Cinematic control is available, allowing users to specify shot types such as medium close-ups [1]
- Veo 3 may introduce unexpected elements such as accents, different ethnicities, or additional objects not specified in the prompt [1]
Google Flow & Integration
- Google Flow, available in Google Labs, offers more advanced tools for video creatives, including story narrative building and visual filing systems [1]
- Flow allows users to select different Veo models and the number of outputs per prompt [1]
- Veo 3 is being integrated into other Google products such as Google Cloud's Vertex AI for developers, and third-party platforms are utilizing the API [1]
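As a quick sanity check on those output specs, an 8-second clip capped at 2 megabytes works out to an average encoded bitrate of roughly 2 Mbit/s; treating "2 megabytes" as 2 x 1024 x 1024 bytes is an assumption here, since the article does not say which definition it uses:

```python
# Back-of-the-envelope bitrate ceiling implied by the reported specs.
size_bits = 2 * 1024 * 1024 * 8   # "under 2 megabytes", expressed in bits
duration_s = 8                    # reported clip length in seconds
avg_bitrate_mbps = size_bits / duration_s / 1_000_000  # ~2.1 Mbit/s ceiling
```

That is a modest budget for 720p video plus audio, which is consistent with the clips being short, watermarked previews rather than production masters.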
X @Demis Hassabis
Demis Hassabis· 2025-07-03 03:26
Product Launch
- Veo 3 in the Gemini app is now globally available for Pro members [1]
- Pro members receive 3 video generations per day, with daily credit replenishment [1]
Geographic Expansion
- Veo 3 access is expanding to India, Indonesia, and all of Europe [1]
New #1 AI Video Model is HERE (Beats VEO 3)
Matthew Berman· 2025-06-28 15:01
Model Comparison
- Seedance 1.0 surpasses Veo 3 by 50 Elo points on the Hugging Face leaderboards [1]
- Seedance excels at generating realistic street-interview videos, deemed nearly flawless and indistinguishable from reality by most viewers [7]
- Seedance demonstrates impressive detail in specific scenarios, such as accurately rendering the moon's reflection and features in a Pixar-style animation [25][26]
- Veo 3 exhibits superior physics in certain scenarios, contributing to more realistic movements [69]
- Veo 3 excels at generating video-game scenes, showcasing elements like health bars and puzzle-solving [37]
- Veo 3 produces more realistic and terrifying horror scenes with better lighting and quicker movements [41]
- Veo 3 generates more detailed and physically accurate fantastical scenes, such as the Leviathan over a crystalline forest [62]
Model Limitations
- Seedance struggles to simulate realistic physics, often resulting in slow or unnatural movements [40][62]
- Seedance shows inconsistencies in character and theme, often displaying a distinctly Chinese aesthetic due to its training data [4]
- Both models struggle with complex scenarios like train crashes, Rubik's Cube solving, and specific instructions in prompts [14][20][32]
- Both models have trouble consistently maintaining scene details and avoiding morphing or unrealistic transitions [29][45]
Cost and Usage
- Using Seedance 1.0 Pro costs approximately $100 for 16,000 credits, with each 10-second, 1080p generation costing 800 credits [10]
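Those credit figures pin down the per-clip economics, and the 50-point Elo gap translates into an expected head-to-head win rate via the standard Elo expectation formula:

```python
# Per-clip cost: $100 buys 16,000 credits; one 10-second 1080p
# generation costs 800 credits.
clips = 16_000 // 800                 # 20 clips per $100
cost_per_clip = 100 / clips           # $5.00 per clip
cost_per_second = cost_per_clip / 10  # $0.50 per second of video

# A 50-point Elo advantage, under the standard expectation formula
# E = 1 / (1 + 10^(-diff/400)), means the higher-rated model wins a
# pairwise comparison roughly 57% of the time.
win_prob = 1 / (1 + 10 ** (-50 / 400))
```

So "beats Veo 3 by 50 Elo points" is a real but modest edge: about a 57/43 split in blind pairwise preferences, not a landslide.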
Over 1,000x Less Data, a First-Class Video Model Trained for $500: Pusa Arrives from City University of Hong Kong and Huawei
机器之心· 2025-06-19 02:28
Core Viewpoint
- The article discusses revolutionary advances in video generation from the Frame-aware Video Diffusion Model (FVDM) and its practical application in the Pusa project, which significantly reduces training costs and enhances video generation capabilities [2][3][37]
Group 1: FVDM and the Pusa Project
- FVDM introduces a vectorized timestep variable (VTV) that gives each frame an independent temporal evolution path, addressing the limitations of traditional scalar timesteps in video generation [2][18]
- The Pusa project, developed in collaboration with Huawei's Hong Kong Research Institute, serves as a direct application and validation of FVDM, exploring a low-cost method for fine-tuning large-scale pre-trained video models [3][37]
- Pusa outperforms the official Wan I2V model while reducing training costs by over 200x (from at least $100,000 to $500) and data requirements by over 2,500x [5][37]
Group 2: Technical Innovations
- Pusa applies non-destructive fine-tuning to pre-trained models such as Wan-T2V 14B, enabling effective video generation without compromising the original model's capabilities [5][29]
- A probabilistic timestep sampling training strategy (PTSS) in FVDM speeds up convergence and improves performance over the original model [30][31]
- Pusa's VTV mechanism enables diverse video generation tasks by giving different frames distinct noise-perturbation controls, facilitating more nuanced video generation [35][36]
Group 3: Community Engagement and Future Prospects
- The complete codebase, training datasets, and training code for Pusa have been open-sourced to encourage community contributions and collaboration, aiming to enhance performance and explore new possibilities in video generation [17][37]
- The article emphasizes Pusa's potential to lead the video generation field into a new era of low cost and high flexibility [36][37]
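The vectorized-timestep idea can be illustrated with a toy forward-diffusion step. Everything below (the linear alpha-bar schedule, the 3-frame/4-pixel "video", the chosen timesteps) is made up for illustration and is not Pusa's actual implementation:

```python
import math
import random

random.seed(0)

def noise_frame(frame, t, T=1000):
    """Apply one forward-diffusion noising step to a frame at timestep t,
    using a toy linear alpha-bar schedule (illustrative only)."""
    alpha_bar = 1.0 - t / T
    return [math.sqrt(alpha_bar) * x +
            math.sqrt(1.0 - alpha_bar) * random.gauss(0.0, 1.0)
            for x in frame]

video = [[0.5] * 4 for _ in range(3)]   # 3 frames, 4 "pixels" each

# Traditional scalar timestep: every frame shares the same t = 500.
scalar_noised = [noise_frame(f, 500) for f in video]

# FVDM-style vectorized timestep: each frame evolves on its own path,
# e.g. keeping the first frame clean (t = 0) for image-to-video tasks
# while later frames carry progressively more noise.
vtv = [0, 250, 900]
vector_noised = [noise_frame(f, t) for f, t in zip(video, vtv)]
```

The per-frame control is the point: conditioning tasks like image-to-video fall out naturally, because a conditioning frame can simply be assigned timestep 0 and left untouched while the rest of the clip is denoised.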
ByteDance's AI Push Reaches New Heights: Doubao Pilots "Context-Based Pricing", Trae Covers 80% of Internal Engineers, Strategy Targets Three Main Lines
AI前线· 2025-06-11 08:39
Core Insights
- ByteDance shared its thinking on this year's main lines of AI technology development, focusing on three key areas [1]
- On June 11, ByteDance's Volcano Engine launched a series of updates, including the Doubao 1.6 model and the Seedance 1.0 Pro video generation model [1]
Doubao Model 1.6
- Doubao 1.6 includes several variants that support multimodal input and a 256K context length [3]
- The model performed strongly on exams, scoring 144 on a national math exam and 706 (science) and 712 (humanities) on a simulated test [3]
- Doubao 1.6 can perform tasks such as booking hotels and organizing shopping receipts into Excel [3]
Pricing and Cost Structure
- Doubao 1.6 has a unified pricing structure based on context length, with costs significantly lower than previous models [8]
- Pricing by context length [9]:
  - 1-32K: input at 0.8 RMB/million tokens, output at 8 RMB/million tokens
  - 32-128K: input at 1.2 RMB/million tokens, output at 16 RMB/million tokens
  - 128-256K: input at 2.4 RMB/million tokens, output at 24 RMB/million tokens
Video Generation Technology
- The Seedance 1.0 Pro model features seamless multi-shot storytelling and enhanced motion realism, enabling the generation of complex video content [18]
- Generating a 5-second 1080P video costs approximately 3.67 RMB, which is competitive in the market [18][20]
AI Development Tools
- Trae, an internal coding assistant, has gained significant traction, with over 80% of ByteDance engineers using it [14]
- Trae enhances coding efficiency through features like code completion and predictive editing, enabling rapid development [16]
- Trae is built on the Doubao 1.6 model, which has been specifically trained for engineering tasks [16]
Future Trends in AI
- The industry is expected to see gradual improvements in handling complex multi-step tasks, with a projected accuracy of 80%-90% for simple tasks by Q4 of this year [5]
- ByteDance anticipates that video generation technology will become more practical for production by 2025, with models like Veo 2 emerging [5]
- The company is focusing on integrating AI into sectors such as e-commerce and gaming to enhance user experiences [22]
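The tiered prices above can be folded into a small cost estimator. Treating the tier boundaries as total-context token counts, and assuming the whole request is billed at the rate of the tier it falls into, are both assumptions here, since the article does not spell out how the tier is selected:

```python
def doubao_16_cost(context_tokens, input_tokens, output_tokens):
    """Estimate a Doubao 1.6 request cost in RMB from the published
    context-length tiers (prices are per million tokens)."""
    tiers = [                 # (max context tokens, input price, output price)
        (32_000, 0.8, 8.0),
        (128_000, 1.2, 16.0),
        (256_000, 2.4, 24.0),
    ]
    for max_ctx, p_in, p_out in tiers:
        if context_tokens <= max_ctx:
            return (input_tokens * p_in + output_tokens * p_out) / 1_000_000
    raise ValueError("context exceeds the 256K-token limit")

# e.g. a 20K-token prompt producing 1K tokens of output stays in the
# cheapest tier: 20_000 * 0.8/1e6 + 1_000 * 8/1e6 = 0.024 RMB
cost = doubao_16_cost(21_000, 20_000, 1_000)
```

The step structure is the interesting design choice: a request that crosses a tier boundary pays the higher rate on all of its tokens, so keeping context under 32K is disproportionately cheap.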
Veo 3 demo | Crystalline flowers bloom
Google DeepMind· 2025-05-20 23:00
Model Capabilities
- Veo 3 is a new state-of-the-art video generation model designed for filmmakers and storytellers [1]
- The model empowers users to add sound effects, ambient noise, and dialogue, generating all audio natively [1]
- Veo 3 delivers best-in-class quality, excelling in physics, realism, and prompt adherence [1]
Key Features
- Veo 3 allows for the creation of videos from text prompts, such as "A snow-covered plain of iridescent moon-dust under twilight skies" [1]
- The model can generate visuals of complex scenes, including "Thirty-foot crystalline flowers bloom, refracting light into slow-moving rainbows" [1]
- Veo 3 can depict figures interacting with the environment, such as "A fur-cloaked figure walks between these colossal blossoms, leaving the only footprints in untouched dust" [1]