AI Video Generation
Kuaishou's Kling Gets a Taste of the "Banana" Too: A Round of Outrageous Prompt Tests, and It's Explosively Fun
量子位· 2025-12-02 09:32
Mengyao, reporting from Aofei Temple | QbitAI (WeChat official account: QbitAI)

On ChatGPT's third anniversary, OpenAI released nothing, while the other major AI players rolled out big moves one after another. In video generation, Kuaishou announced that Kling would ship "a full week of back-to-back updates," and its Day 1 release is the Kling AI video "O1 model," billed as "the world's first unified multimodal video model."

It folds video editing, shot extension, and multi-subject reference, tasks that used to require shuffling between several models, into one unified model, handling deep semantic understanding in a single pass.

Start with a bowl of noodles. This time I had Kling O1 come to the table for a bite: a big mouthful of noodles while staring straight into the camera. Both the character's face and the surrounding scene stayed stable, and our handsome lead ate with real relish.

The most direct takeaway from hands-on testing: O1 really does hold consistency across shot cuts with multiple subjects, local edits look natural and are more than enough for everyday touch-ups, and it can generate 10s videos, which is friendly to long-form creators. (Provided you pay up.)

More test results below; I'll go first, and if you have wilder ideas, the comment section is open~~~

Hands-on with the Kling AI video "O1 model": emm... how to put it? It feels like the Nano Banana playbook, remade as AI video! Take this one: I casually tossed a photo of "a Terracotta Warrior + a compact of face powder" at O1, and it rolled out a clip of "a Terracotta Warrior caught touching up his makeup by his boss" ...
The Video Model War Reignites! Runway Overtakes Google for the Top Spot, and Kling Arrives Too
第一财经· 2025-12-02 09:09
Core Viewpoint
- The competition in AI video generation is intensifying, with Runway's new model Gen-4.5 surpassing Google's Veo3 in benchmark tests, while domestic competitor Kuaishou's new model Kling O1 has also been launched, marking a significant moment for the industry [3][19].

Group 1: Model Performance
- Runway's Gen-4.5 achieved a score of 1247 in the Artificial Analysis benchmark, making it the top model in text-to-video generation, followed closely by Google's Veo3 with a score of 1226 and Kuaishou's Kling 2.5 at 1225 [7][9].
- Gen-4.5 demonstrates advancements in understanding and executing complex sequential instructions, allowing users to specify detailed shot scheduling, scene composition, event timing, and subtle atmospheric changes [9][15].

Group 2: Technical Innovations
- The model has made breakthroughs in pre-training data efficiency and post-training techniques, achieving unprecedented physical and visual accuracy in generated videos [9][15].
- Runway claims that objects in the generated videos move with realistic weight and dynamics, and that liquids flow according to appropriate physical laws, enhancing the realism of the generated content [15][18].

Group 3: Market Position and Future Outlook
- Runway, founded in 2018, has reached a valuation of $3.55 billion; its first video model, Gen-1, launched in February 2023, followed by Gen-2 in July of that year, which integrated text-to-video and image-to-video functionality [18].
- The competitive landscape has grown more challenging for Runway since 2024, with Google's Veo series solidifying its leading position and other competitors like Kuaishou and MiniMax gaining traction [19].
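The three scores sit within 22 points of each other. If the Artificial Analysis scores behave like Elo-style arena ratings (an assumption about the methodology; the article does not describe how the leaderboard is computed), the gap translates into only a slight head-to-head edge, as this quick sketch shows:

```python
# Hypothetical sketch: treat the leaderboard scores as Elo-style arena ratings
# (an assumption, not confirmed by the article) and convert a rating gap
# into an expected head-to-head win rate.
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B under the Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Gen-4.5 (1247) vs Veo3 (1226): a 21-point gap is barely better than a coin flip.
print(round(elo_win_probability(1247, 1226), 3))  # ~0.53
# Veo3 (1226) vs Kling 2.5 (1225): a 1-point gap is statistically a tie.
print(round(elo_win_probability(1226, 1225), 3))  # ~0.501
```

Under this reading, "first place" reflects roughly a 53/47 per-matchup edge rather than a decisive lead, which is why a single strong new entrant can reshuffle the top of the table.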
The Video Model War Reignites! Runway Overtakes Google for the Top Spot, and Kling Arrives Too
Di Yi Cai Jing Zi Xun· 2025-12-02 07:16
Core Insights
- The competition in AI video generation has intensified with the recent launch of Runway's Gen-4.5 model, which has surpassed Google's Veo3 in benchmark tests [1][3]
- Simultaneously, domestic competitor Kling AI (Kuaishou) announced the release of its new model, Kling O1, claimed to be the first unified multimodal video model [1][3]

Benchmark Performance
- Runway's Gen-4.5 achieved a score of 1247, ranking first on the Artificial Analysis leaderboard, followed closely by Google's Veo3 at 1226 and Kuaishou's Kling 2.5 at 1225 [3][4]
- The leaderboard indicates a tight race, with only a one-point gap between Veo3 and Kling 2.5 [3][4]

Model Features and Advancements
- Gen-4.5 has made significant advancements in pre-training data efficiency and post-training techniques, excelling at understanding and executing complex sequential instructions [5][7]
- The model shows improved adherence to precise prompts, more realistic physical motion, better style control, and stronger visual consistency [5][7]

Physical Realism and Limitations
- Runway claims that Gen-4.5 achieves unprecedented physical and visual accuracy, with objects moving realistically and fluid dynamics rendered appropriately [7][11]
- However, the model still struggles with causal reasoning and object permanence, with occasional discrepancies in the expected behavior of generated objects [11]

Company Background and Market Position
- Runway, founded in 2018, has reached a valuation of $3.55 billion, showcasing rapid growth in the AI video generation sector [11]
- Runway's CEO highlighted surpassing a trillion-dollar company's model with a team of just 100 people, emphasizing focus and hard work [11]

Future Outlook
- The AI video generation market is expected to become increasingly competitive, particularly with the anticipated release of Google's next-generation model, Veo4, in 2025 [12]
- The sustainability of Gen-4.5's leading position is uncertain, especially with Kling O1 entering the market as a strong competitor [12]
Qianwen App Launches Wan 2.5 and Qwen-Image: Supporting Lip-Sync and Conversational Photo Editing
Feng Huang Wang· 2025-12-02 06:42
Core Insights
- The Qianwen App has launched two new models: Tongyi Wanxiang Wan 2.5 for video generation and Qwen-Image for image generation and editing, both available for unlimited free use [1]

Group 1: Video Generation Model
- Tongyi Wanxiang Wan 2.5 supports multi-language audio-visual synchronization, including Chinese, English, and dialects [1]
- The model allows users to generate AI videos with text commands, featuring multi-person dialogues up to 10 seconds long [1]
- Users can experience various popular features such as AI interviews and "全民舞王" (National Dance King) [1]

Group 2: Image Generation and Editing Model
- Qwen-Image enables precise editing and modification of text within images [1]
- The model supports dual-image "collage" and "fusion," as well as editing based on reference images [1]
- Qwen-Image has demonstrated advanced performance in multiple benchmark tests for general image generation and editing, showcasing its strong capabilities [1]
Pai Wo AI (PixVerse) V5.5 AI Video Model Launches, with One-Click Audio-Visual Sync Generation
Huan Qiu Wang· 2025-12-02 05:51
Core Insights
- Aishi Technology has officially launched PixVerse V5.5, known as Pai Wo AI V5.5 in China, marking a significant evolution in AI video generation from "lens generation" to automatic "storytelling" with complete narrative capabilities [1]

Group 1: Product Features
- The new version supports simultaneous generation of audio and multi-shot video, enhancing the ability to synchronize multiple characters and scenes [3]
- Users can input a brief prompt, and the AI will automatically generate a complete story segment, including shot progression, scene transitions, character dialogue, ambient sounds, and background music [3]
- The AI's ability to understand narrative intent from prompts allows for natural camera movements and editing techniques, providing users with a director-like creative experience [3]
- The model is the first in China to achieve "storyboard + sound" generation in a single process, allowing high-quality video output without additional adjustments or audio uploads [3]

Group 2: Market Impact
- Feedback from the creator community indicates that the multi-shot capability of V5.5 is set to transform short video creation, eliminating the need to collaborate with photographers and editors to achieve effective opening sequences [4]
Tencent Yuanbao Adds AI Video Generation Capability
Guan Cha Zhe Wang· 2025-11-21 08:58
Core Insights
- Tencent's HunyuanVideo 1.5, a lightweight video generation model based on the Diffusion Transformer architecture, has been officially released and open-sourced, featuring 8.3 billion parameters and the capability to generate 5-10 seconds of high-definition video [1][2]

Group 1: Model Capabilities
- HunyuanVideo 1.5 supports both Chinese and English input for text-to-video and image-to-video generation, showcasing high consistency between images and videos [2][3]
- The model demonstrates strong instruction understanding and adherence, enabling it to execute diverse scenarios such as camera movements, smooth motion, realistic characters, and emotional expressions [2]
- It supports various styles including realistic, animation, and block-based, and can generate text in both Chinese and English within the videos [2]

Group 2: Video Quality
- The model can natively generate 5-10 seconds of high-definition video at 480p and 720p, with the option to enhance quality to 1080p cinematic level through a super-resolution model [2]

Group 3: Performance Comparison
- In the T2V (Text-to-Video) task, HunyuanVideo outperformed several comparison models, with a winning margin of up to 17.12% against models like Wan2.2 [4]
- In the I2V (Image-to-Video) task, HunyuanVideo also showed competitive results, achieving a winning margin of 12.65% against Wan2.2 [4]
Yuanbao Launches AI Video Capability
Bei Ke Cai Jing· 2025-11-21 08:40
Core Insights
- Yuanbao officially launched the "one-sentence video generation" capability, utilizing Tencent's latest open-source HunyuanVideo 1.5 model [4][6]
- The new feature allows users to generate videos based on text prompts, marking a significant advancement in multimedia capabilities [5][7]

Group 1: Technology and Features
- The HunyuanVideo 1.5 model supports both Chinese and English text-to-video and image-to-video generation, ensuring high consistency in color tones and details [6]
- The model operates efficiently with a lightweight size of 8.3 billion parameters, capable of running smoothly on consumer-grade graphics cards with 14GB of memory [6]

Group 2: Market Positioning
- The launch of this feature signifies Yuanbao's comprehensive coverage of multimedia formats, including text, images, audio, and video, enhancing its competitive edge in the market [7]
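The 8.3B-parameter / 14GB pairing is worth a sanity check: at 16-bit precision the weight tensors alone come to about 15.5 GiB, so fitting in 14GB of VRAM implies some combination of lower-precision weights and offloading. A rough back-of-envelope estimate (the precision options below are illustrative assumptions, not details from the article):

```python
# Back-of-envelope estimate of VRAM needed for model weights alone.
# Activations, attention buffers, and the VAE add more on top; quantization
# and CPU offloading reduce the peak. Precision choices are assumptions.
def weight_footprint_gib(num_params: float, bytes_per_param: float) -> float:
    """GiB occupied by the raw weight tensors at a given precision."""
    return num_params * bytes_per_param / 1024**3

PARAMS = 8.3e9  # parameter count reported for HunyuanVideo 1.5
for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("fp8/int8", 1)]:
    print(f"{label}: {weight_footprint_gib(PARAMS, nbytes):.1f} GiB")
```

Weights quantized to 8 bits (~7.7 GiB) would leave comfortable headroom under 14GB, which is consistent with the "consumer-grade graphics card" claim.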
Parallel Diffusion Architecture Breaks the Limit with 5-Minute AI Video Generation: A Challenge to OpenAI and Google?
机器之心· 2025-11-20 09:35
Core Insights
- CraftStory has launched the Model 2.0 video generation system, capable of producing expressive, human-centered videos up to five minutes long, addressing the long-standing "video duration" challenge in the AI video generation industry [1][3][5]

Company Overview
- CraftStory was founded by Victor Erukhimov, a key contributor to the widely used computer vision library OpenCV, who previously co-founded Itseez, acquired by Intel in 2016 [3][9]
- The company aims to provide significant commercial value to businesses struggling to scale video production for training, marketing, and customer education [3][5]

Technology and Innovation
- The breakthrough in video duration is attributed to CraftStory's parallel diffusion architecture, which fundamentally differs from traditional models that require larger networks and more resources for longer videos [5][6]
- CraftStory's system processes all segments of a five-minute video simultaneously, avoiding the accumulation of flaws that can occur when segments are generated sequentially [6][7]
- The training data includes high-quality footage captured by professional studios, ensuring clarity even in fast-moving scenes, in contrast to the motion blur often found in standard videos [6][7]

Product Features
- Model 2.0 is a "video-to-video" conversion model that allows users to upload their own videos or use preset ones, maintaining character identity and emotional nuances over longer sequences [7][8]
- The system can generate a 30-second low-resolution video in approximately 15 minutes, featuring advanced lip-syncing and gesture alignment algorithms [7][8]

Market Position and Future Directions
- CraftStory recently completed a $2 million funding round, which, while modest compared to larger competitors, reflects the company's belief that success does not depend solely on massive funding [9]
- The company targets the B2B market, focusing on how software companies can create effective training and product videos, rather than consumer creative tools [9]
- Future developments include a "text-to-video" model that will enable users to generate long-form content directly from scripts, as well as support for mobile camera scenes [9]
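The sequential-vs-parallel distinction can be illustrated with a toy random-walk model (a conceptual sketch only, not CraftStory's actual architecture): if each segment is conditioned on the previous segment's imperfect output, small per-segment errors compound into drift, while conditioning every segment on a shared global plan keeps each segment's error bounded.

```python
import random

# Conceptual toy, not CraftStory's system: model the identity/continuity error
# of each video segment as Gaussian noise. Chained ("sequential") generation
# conditions segment k on segment k-1, so errors accumulate as a random walk;
# joint ("parallel") generation conditions every segment on one global plan,
# so each segment's error is independent and bounded.
N_SEGMENTS, NOISE, TRIALS = 10, 0.05, 2000

def final_drift_sequential(rng: random.Random) -> float:
    state = 0.0
    for _ in range(N_SEGMENTS):
        state += rng.gauss(0.0, NOISE)  # error relative to the previous segment
    return abs(state)

def final_drift_parallel(rng: random.Random) -> float:
    # the final segment references the global plan directly
    return abs(rng.gauss(0.0, NOISE))

rng = random.Random(42)
seq = sum(final_drift_sequential(rng) for _ in range(TRIALS)) / TRIALS
par = sum(final_drift_parallel(rng) for _ in range(TRIALS)) / TRIALS
print(f"mean drift after {N_SEGMENTS} segments: chained={seq:.3f}, joint={par:.3f}")
```

In this toy, the chained drift grows roughly with the square root of the segment count, which is one intuition for why stitching independently generated clips loses character identity over minutes while generating all segments jointly avoids it.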
Turning a Dragon into Dishes: How Did an Accountant Use AI to Make a Video with 7.4 Million Views?
后浪研究所· 2025-11-17 09:35
Core Viewpoint
- The article discusses the viral success of an AI-generated video titled "Making Six Dishes from an Ancient Mosasaur," highlighting the innovative use of AI in content creation and the strategic approach the creator took to engage viewers and leverage trending topics [5][12][14].

Group 1: Video Content and Creation
- The video achieved 7 million views within three days, built on the unusual concept of cooking an extinct creature, the mosasaur, which captivated audiences [5][11].
- The creator, known as "Huangpu River Salmon," utilized various popular memes and engaging storytelling techniques to maintain viewer interest throughout the 6-minute video [8][12].
- The production involved generating over 1,000 video clips, with a focus on achieving a high level of realism in AI-generated visuals, aiming for 90% authenticity [9][28].

Group 2: Strategic Approach and Audience Engagement
- Prior to the viral video, the creator conducted A/B testing with three themed cooking videos to refine the formula for success, incorporating audience feedback and trending elements [12][18].
- The creator intentionally included "flaws" in the video to spark discussions among viewers, which in turn increased engagement and visibility on the platform [12][20].
- The acceptance of AI-generated content has significantly increased across major platforms, with many creators exploring AI tools to enhance their productions [12][40].

Group 3: Future Prospects and Industry Trends
- The creator aims to transition into a full-time AI designer, reflecting a broader trend where AI is increasingly replacing traditional filming methods in content creation [13][40].
- The article suggests a promising future for AI-generated media, as brands and creators are willing to invest in AI capabilities to streamline production processes [40].
- The creator plans to explore more imaginative concepts in future videos, potentially featuring entirely fictional creatures, to maintain viewer interest and creativity [36][39].
Turning a Dragon into Dishes: How Did an Accountant Use AI to Make a Video with 7.4 Million Views?
36Kr· 2025-11-14 08:41
"一定要有梗才能留住人"。 10月下旬,一条名为《把远古沧龙做成六道菜(上)》的视频在B站爆火,上线三天播放量冲上700万。关键这是一段完全由AI生成的视频,时长6分23 秒,按以往规律,这两个buff叠在一起,是很难被流量眷顾的。 毕竟不少人对AI做的内容是"排斥的",但这条视频下的近5000评论中,多是对AI快速精进的画面质量与作者对AI掌控力的双重震惊。 这条片子确实也跟以往的多数AI视频不一样,它不切石头也不是小猫做饭,而是几个国家的厨师进行一场烹饪比赛,食材则为一条沧龙。是的,沧龙 ——一种6500万年前就已经灭绝的远古生物——把它做成菜,没见过吧。 开头是一群老外拿着锯锯肉和剁比人还高的排骨的宏大画面,镜头拉近、旋转、快速转换,人物出场冲突爆发,情节紧凑而有张力,一下便抓住了观众的 注意力。 ●《把远古沧龙做成六道菜(上)》视频开头 而要让人在这6分23秒中不流失,才是最难且重要的。为此,B站UP主"黄浦江三文鱼"(以下简称"三文鱼")上了很多"手段"—— 比如贯穿视频的各种热梗。首先登场的印度厨师做的"九转大肠";中国厨师是来自上海的"辛西娅",出场时自配背景音乐和解说,比如这句耳熟能详 的"最 ...