AI Video Generation
A Hollywood VFX Supervisor Spent a Little Over RMB 300 Making a Sci-Fi Short Film with AI
Di Yi Cai Jing· 2025-08-21 12:57
Yao Qi, a visual effects supervisor who worked on Hollywood blockbusters such as 2012 and The Matrix Revolutions, today unveiled The Way Home (《归途》), a sci-fi short film he made with AI. In the short's apocalyptic world, scenes such as a giant alien creature chasing a human-driven car and a giant spider crawling up a skyscraper look strikingly lifelike. Yao's own assessment: "(The result) is about the same as a live-action shoot."

He told Di Yi Cai Jing and other reporters that the film comprises more than 40 shots, each generated three times, for a total of 120 video clips: 18 ten-second clips with integrated audio and 102 five-second clips. The whole film took about a week to complete.

According to Yao, a purely live-action or CG production of the same film could cost several million yuan. In Hollywood, a single complex shot can run into the hundreds of thousands or even over a million. Live-action shoots are further constrained by the difficulty and danger of staging scenes, along with actor and crew costs, whereas AI opens up entirely new possibilities for realizing creative ideas.

So how much does it cost to generate with AI a short film that would cost millions to shoot? Liu Lin, general manager of commercial R&D in Baidu's commercial group and Yao's partner on the short, told reporters that the film was made with Baidu's "Steam Engine" (蒸汽机) integrated audio-video model, at a total cost of roughly RMB 330.6.

But AI generation still has plenty of room to improve. Fifty days after the launch of Baidu's video generation model, its biggest users are internal, including the search business and mobile-ecosystem creators, followed by professional creators and enterprise customers. The video generation race is already crowded. Kuaishou ...
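The figures above can be checked with some back-of-the-envelope arithmetic. A minimal sketch, assuming the reported total is simply spread across the clips and seconds quoted (this even split is our assumption, not Baidu's actual price list):

```python
# Reconstruct average per-clip and per-second cost from the article's figures.
TOTAL_COST_RMB = 330.6               # total spend reported for the film
AUDIO_CLIPS, AUDIO_LEN_S = 18, 10    # 18 ten-second clips with sound
SILENT_CLIPS, SILENT_LEN_S = 102, 5  # 102 five-second clips

total_clips = AUDIO_CLIPS + SILENT_CLIPS
total_seconds = AUDIO_CLIPS * AUDIO_LEN_S + SILENT_CLIPS * SILENT_LEN_S

print(f"clips: {total_clips}")                 # matches the 120 in the article
print(f"generated seconds: {total_seconds}")
print(f"avg cost per clip: {TOTAL_COST_RMB / total_clips:.2f} RMB")
print(f"avg cost per second: {TOTAL_COST_RMB / total_seconds:.2f} RMB")
```

By these numbers the average works out to under RMB 3 per clip, which is the contrast the article draws against shots that can cost hundreds of thousands each in Hollywood.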
Express | Moonvalley Releases Marey, the First AI Video Model Trained on Public Data: How It Achieves 360-Degree Camera Control and Physics Simulation
Z Potentials· 2025-07-09 05:56
Core Viewpoint
- Moonvalley, an AI video generation startup, emphasizes that traditional text prompts are insufficient for film production, introducing a "3D perception" model that offers filmmakers greater control compared to standard text-to-video models [1]

Group 1: Product Offering
- Moonvalley launched its model Marey in March as a subscription service, allowing users to generate video clips up to 5 seconds long, with pricing tiers of $14.99 for 100 points, $34.99 for 250 points, and $149.99 for 1,000 points [1]
- Marey is one of the few models trained entirely on publicly licensed data, appealing to filmmakers concerned about potential copyright issues with AI-generated content [1]

Group 2: Democratization of Filmmaking
- Independent filmmaker Ángel Manuel Soto highlights Marey's ability to democratize access to top-tier AI narrative tools, reducing production costs by 20% to 40% and providing opportunities for those traditionally excluded from filmmaking [2]
- Soto's experience illustrates how AI enables filmmakers to pursue their stories without needing external funding or approval [2]

Group 3: Technological Capabilities
- Marey possesses an understanding of the physical world, allowing for interactive storytelling and features like simulating motion while adhering to physical laws [3]
- The model can transform scenes, such as converting a video of a bison running into a Cadillac speeding through the same environment, with realistic changes in grass and dust [4]

Group 4: Advanced Features
- Marey supports free camera movement, enabling users to adjust camera trajectories and create effects like panning and zooming with simple mouse actions [5]
- Future updates are planned to include new control features such as lighting adjustments, depth object tracking, and a character library [5]
- Marey's public release positions it in competition with other AI video generators like Runway Gen-3, Luma Dream Machine, Pika, and Haiper [5]
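The quoted point tiers can be compared on a cost-per-point basis. A quick sketch using only the prices given above (any interpretation of bulk discounting is ours, not Moonvalley's):

```python
# Cost per point across Marey's quoted subscription tiers.
tiers = {100: 14.99, 250: 34.99, 1000: 149.99}

for points, price in sorted(tiers.items()):
    print(f"{points:>4} points: ${price:>6.2f} -> ${price / points:.4f}/point")
```

Interestingly, by the quoted figures the 250-point tier is the cheapest per point, not the 1,000-point tier.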
Morgan Stanley: Kuaishou Technology - AI Video Generation Heats Up; Seedance 1.0 Pro's Strong Debut as the Next Driver
Morgan Stanley · 2025-06-23 02:09
Investment Rating
- The investment rating for Kuaishou Technology is Equal-weight [6]

Core Insights
- The competition in the AI video generation sector has intensified with the launch of ByteDance's Seedance 1.0 Pro, which has achieved the top ranking in both text-to-video and image-to-video categories, outperforming competitors like Google's Veo 3.0 and Kuaishou's Kling 2.0 [2][3]
- The pricing of Seedance 1.0 Pro is competitive at Rmb3.67 for a 5-second video, which is 60-70% lower than similar market offerings, and it generates videos relatively quickly at approximately 40 seconds for a 5-second output [2][3]
- The report suggests that while the recent releases from ByteDance and Minimax could significantly increase competition, it is premature to determine the long-term market leader in AI video generation [3]
- Kuaishou's Kling model has shown strong financial performance year-to-date, which has positively influenced its share price, but there is a caution against overvaluing Kling before the competitive landscape stabilizes [3]

Summary by Sections

Industry Overview
- The AI video generation market is experiencing heightened competition with new entrants and advancements in technology [1][3]

Company Performance
- Kuaishou Technology's Kling model is expected to exceed revenue guidance, reflecting strong market demand [4]
- Financial projections for Kuaishou indicate a revenue increase from Rmb127 billion in 2024 to Rmb165 billion by 2027, with EBITDA growing from Rmb20 billion to Rmb37 billion in the same period [6]

Valuation Metrics
- The price target for Kuaishou Technology is set at HK$60.00, with a slight upside of 1% from the current price of HK$59.40 [6]
- Key financial metrics include a projected P/E ratio of 11.2 for 2025 and an EV/EBITDA ratio of 7.1 for the same year [6]
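The "60-70% lower" pricing claim implies a peer price range that can be backed out from Seedance's quoted Rmb3.67 per 5-second video. A small sketch of that implied range (the inference is ours, derived only from the report's two figures):

```python
# Implied peer pricing if Seedance 1.0 Pro is 60-70% cheaper than peers.
SEEDANCE_RMB = 3.67  # quoted price per 5-second video

for discount in (0.60, 0.70):
    implied_peer = SEEDANCE_RMB / (1 - discount)
    print(f"{discount:.0%} cheaper -> implied peer price ~Rmb{implied_peer:.2f} per 5s clip")
```

That puts comparable offerings at roughly Rmb9-12 per 5-second clip by the report's own numbers.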
ICML 2025 | Doubling Video Generation Speed Losslessly: The Secret Is Capturing the Spatiotemporal Sparsity of Attention
Jiqizhixin (机器之心) · 2025-05-07 07:37
Core Viewpoint
- The article discusses the rapid advancement of AI video generation technology, particularly focusing on the introduction of Sparse VideoGen, which significantly accelerates video generation without compromising quality [1][4][23]

Group 1: Performance Bottlenecks in Video Generation
- Current state-of-the-art video generation models like Wan 2.1 and HunyuanVideo face significant performance bottlenecks, requiring over 30 minutes to generate a 5-second 720p video on a single H100 GPU, with the 3D Full Attention module consuming over 80% of the inference time [1][6][23]
- The computational complexity of attention mechanisms in Video Diffusion Transformers (DiTs) increases quadratically with resolution and frame count, limiting real-world deployment capabilities [6][23]

Group 2: Introduction of Sparse VideoGen
- Sparse VideoGen is a novel acceleration method that does not require retraining existing models, leveraging spatial and temporal sparsity in attention mechanisms to halve inference time while maintaining high pixel fidelity (PSNR = 29) [4][23]
- The method has been integrated with various state-of-the-art open-source models and supports both text-to-video (T2V) and image-to-video (I2V) tasks [4][23]

Group 3: Key Design Features of Sparse VideoGen
- Sparse VideoGen identifies two unique sparsity patterns in attention maps: spatial sparsity, focusing on tokens within the same and adjacent frames, and temporal sparsity, capturing relationships across different frames [10][11][12]
- The method employs a dynamic adaptive sparse strategy through online profiling, allowing for optimal combinations of spatial and temporal heads based on varying denoising steps and prompts [16][17]

Group 4: Operator-Level Optimization
- Sparse VideoGen introduces a hardware-friendly layout transformation to optimize memory access patterns, enhancing the performance of temporal heads by ensuring tokens are stored contiguously in memory [20][21]
- Additional optimizations for Query-Key Normalization (QK-Norm) and Rotary Position Embedding (RoPE) have resulted in significant throughput improvements, with average acceleration ratios of 7.4x and 14.5x, respectively [21]

Group 5: Experimental Results
- Sparse VideoGen has demonstrated impressive performance, reducing inference time for HunyuanVideo from approximately 30 minutes to under 15 minutes, and for Wan 2.1 from 30 minutes to 20 minutes, while maintaining a PSNR above 29dB [23]
- The research indicates that understanding the internal structure of video generation models may lead to more sustainable performance breakthroughs compared to merely increasing model size [24]
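The two sparsity patterns described above can be illustrated with a toy attention mask. This is a minimal sketch of the idea only, not Sparse VideoGen's actual kernels or head-selection logic; the frame and token counts are made up for illustration:

```python
# Toy video: F frames, each with T tokens, flattened to N = F*T positions.
F, T = 6, 4
N = F * T

def spatial_allowed(i, j):
    # Spatial head: token i attends to token j only if j lies in the same
    # frame as i or an adjacent frame.
    return abs(i // T - j // T) <= 1

def temporal_allowed(i, j):
    # Temporal head: token i attends to token j only if j occupies the same
    # within-frame position, i.e. the same spatial location across frames.
    return i % T == j % T

for name, allowed in (("spatial", spatial_allowed), ("temporal", temporal_allowed)):
    kept = sum(allowed(i, j) for i in range(N) for j in range(N))
    print(f"{name} head keeps {kept}/{N * N} attention entries ({kept / (N * N):.0%})")
```

Even on this tiny example, each head type discards well over half of the full attention matrix, which is where the inference-time savings come from; the real system profiles online which pattern suits each head at each denoising step.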
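The fidelity metric quoted throughout, PSNR, has a standard definition. A small self-contained sketch (the example pixel values are invented for illustration):

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# A frame differing from the reference by at most one 8-bit intensity level
# scores far above the article's 29 dB threshold.
ref = [10, 50, 90, 130, 170, 210]
test = [11, 49, 90, 131, 170, 209]
print(f"PSNR: {psnr(ref, test):.1f} dB")
```

PSNR above 29 dB, as reported for Sparse VideoGen, means the sparse-attention output is nearly pixel-identical to the dense baseline.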