Pika

Build a "video blogger" in 6 seconds: Pika makes any picture start talking
机器之心· 2025-08-13 03:27
Remember the stir when Veo 3 launched? Its revolutionary audio-video synchronization left every other video generation model in the dust, handling shooting, voiceover, and a rough cut in one click. But what if you want to use your own charming voice, or you already bring an exquisite voice track of your own? Is there another solution?

Machine Heart report. Editor: +0

How many steps does it take to make a video? It boils down to: shooting + voiceover + editing. And yes, friend, there is another solution!

Pika lets users upload an audio file (speech, music, rap, or any sound clip) and pair it with a static image (a selfie or any picture) to generate a highly synchronized video. The character in the video automatically matches the audio, with precise lip sync, natural changes in facial expression, and fluid body movement. Put more plainly: it makes any static picture come alive to whatever audio you give it, and vividly so.

Toss it a random selfie plus a clip of Ma Baoguo's "young people have no martial virtue," and the handsome face in your photo will lip-sync it instantly, with even the timing of each eyebrow twitch matched exactly, as if delivered in person. In the past, you would have needed a top-tier visual effects artist and ten days or two weeks of tinkering to pull this off. Now, Pika says, it takes an average of just 6 seconds.

On August 11, Pika launched a model called the "Audio-Driven Performance Model" (Audio-Driven Perfo ...
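The workflow described above, one still image plus one audio clip in, one lip-synced video out, can be illustrated with a short client sketch. This is a hypothetical illustration only: the endpoint URL, request fields, and response schema below are assumptions, not Pika's documented API.

```python
# Hypothetical client sketch for an image + audio -> talking-video service.
# The endpoint, field names, and response schema are placeholders for
# illustration; they are NOT Pika's actual API.
import time
import requests

API_BASE = "https://api.example-video-service.com/v1"  # placeholder URL
API_KEY = "YOUR_API_KEY"                                # placeholder credential

def generate_talking_video(image_path: str, audio_path: str) -> str:
    """Upload a still image and an audio clip, poll until the job finishes,
    and return the URL of the generated video."""
    headers = {"Authorization": f"Bearer {API_KEY}"}

    # 1. Submit the image and audio as a single generation job.
    with open(image_path, "rb") as img, open(audio_path, "rb") as aud:
        resp = requests.post(
            f"{API_BASE}/audio-driven-video",
            headers=headers,
            files={"image": img, "audio": aud},
        )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # 2. Poll for completion; a production service might offer webhooks instead.
    while True:
        status = requests.get(f"{API_BASE}/jobs/{job_id}", headers=headers).json()
        if status["state"] == "succeeded":
            return status["video_url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        time.sleep(2)

if __name__ == "__main__":
    print(generate_talking_video("selfie.jpg", "voice_clip.mp3"))
```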
AI changed everything, except cats
Hu Xiu· 2025-06-30 03:25
Core Insights
- The article discusses the rising popularity of AI-generated cat videos, particularly focusing on the "AI cat" phenomenon that combines humor and technology to engage audiences [19][20][29].

Group 1: AI Cat Video Trends
- AI cat videos are gaining traction on platforms like TikTok and YouTube, with channels experiencing significant growth in followers and views after switching to AI-generated content [11][13].
- For instance, a YouTube channel named Batysyr gained 770,000 followers and 100 million views in a month by posting 20 AI cat videos [11].
- Another channel, Cat channel 91, saw its subscriber count increase by 2 million after transitioning to AI cat videos, with views jumping from tens of thousands to millions [11].

Group 2: Monetization Strategies
- Creators are monetizing AI cat content through various methods, including ad placements in videos and charging for video production services [14][15].
- A creator named Ansheng reported earning around 20,000 RMB monthly from multiple AI cat accounts, with TikTok videos generating 1,200 to 2,000 RMB per million views [14].
- The trend has led to the emergence of low-quality, algorithm-driven content, referred to as "AI Slop," which aims to exploit viewer engagement for profit [16].

Group 3: Technological and Cultural Factors
- The success of AI cat videos is attributed to a combination of advanced AI technology and cultural factors, creating a "perfect chemical reaction" [19][20].
- The current AI technology allows for realistic simulations of physical actions, making the videos more engaging and shareable [20][23].
- The low production cost of these videos, often just a few dozen RMB, has lowered the barrier for entry, enabling more creators to participate [23].

Group 4: Psychological Appeal of Cats
- Cats have been chosen as the primary subject for these videos due to their inherent appeal, which triggers human emotions and empathy [26][29].
- The concept of "neoteny" suggests that cats' features resemble those of infants, making them universally appealing [26].
- Using cats helps avoid the "uncanny valley" effect associated with AI-generated human faces, allowing for broader acceptance of AI content [26].

Group 5: Future Implications
- The popularity of AI cat videos signals a shift in how advanced technology can resonate with human emotions, indicating a potential pathway for AI to integrate into everyday life [29][30].
- The phenomenon serves as a social experiment, preparing audiences for a future where AI-generated content becomes commonplace [30][31].
AI-generated videos keep violating the laws of physics? PhyT2V, new work from a University of Pittsburgh team, boosts physical realism 2.3x without retraining the model
机器之心· 2025-05-19 04:03
Core Viewpoint
- The article discusses the advancement of Text-to-Video (T2V) generation technology, emphasizing the transition from focusing on visual quality to ensuring physical consistency and realism through the introduction of the PhyT2V framework, which enhances existing T2V models without requiring retraining or extensive external data [2][3][26].

Summary by Sections

Introduction to PhyT2V
- PhyT2V is a framework developed by a research team at the University of Pittsburgh, aimed at improving the physical consistency of T2V generation by integrating large language models (LLMs) for iterative self-refinement [2][3][8].

Current State of T2V Technology
- Recent T2V models, such as Sora, Pika, and CogVideoX, have shown significant progress in generating complex and realistic scenes, but they struggle with adhering to real-world physical rules and common sense [5][7].

Limitations of Existing Methods
- Current methods for enhancing T2V models often rely on data-driven approaches or fixed physical categories, which limits their generalizability, especially in out-of-distribution scenarios [10][12][18].

PhyT2V Methodology
- PhyT2V employs a three-step iterative process (a minimal sketch follows this summary):
  1. Identifying physical rules and main objects from user prompts [12].
  2. Detecting semantic mismatches between generated videos and prompts using video captioning models [13].
  3. Generating corrected prompts based on identified physical rules and mismatches [14][18].

Advantages of PhyT2V
- PhyT2V offers several advantages over existing methods:
  - It does not require any model structure modifications or additional training data, making it easy to implement [18].
  - It provides a feedback loop for prompt correction based on real generated results, enhancing the optimization process [18].
  - It demonstrates strong cross-domain applicability, particularly in various physical scenarios [18].

Experimental Results
- The framework has been tested on multiple T2V models, showing significant improvements in physical consistency (PC) and semantic adherence (SA) scores, with the CogVideoX-5B model achieving up to 2.2 times improvement in PC and 2.3 times in SA [23][26].

Conclusion
- PhyT2V represents a novel, data-independent approach to T2V generation, ensuring that generated videos comply with real-world physical principles without the need for additional model retraining, marking a significant step towards creating more realistic T2V models [26].
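To make the three-step loop concrete, here is a minimal Python sketch of the iterative, training-free prompt refinement idea. The callables passed in (rule extraction, video generation, captioning, mismatch detection, prompt rewriting) are assumed stand-ins for the LLM, T2V model, and video-captioning calls the article describes; this illustrates only the control flow, not the authors' implementation.

```python
from typing import Callable, List, Tuple

def phyt2v_refine(
    prompt: str,
    extract_rules: Callable[[str], Tuple[List[str], List[str]]],   # LLM: rules, main objects
    generate_video: Callable[[str], object],                        # any off-the-shelf T2V model
    caption_video: Callable[[object], str],                         # video-captioning model
    find_mismatches: Callable[[str, str, List[str]], List[str]],    # LLM: prompt vs. caption vs. rules
    refine_prompt: Callable[[str, List[str], List[str]], str],      # LLM: corrected prompt
    rounds: int = 3,
) -> str:
    """PhyT2V-style prompt self-refinement loop (schematic).

    The caller supplies the actual model calls; the T2V model's weights
    are never modified, only the prompt is rewritten between rounds.
    """
    # Step 1: physical rules and main objects implied by the user prompt.
    rules, _objects = extract_rules(prompt)

    current_prompt = prompt
    for _ in range(rounds):
        video = generate_video(current_prompt)

        # Step 2: caption the generated video and detect semantic /
        # physical mismatches against the prompt and the extracted rules.
        caption = caption_video(video)
        mismatches = find_mismatches(current_prompt, caption, rules)

        if not mismatches:  # output already consistent; stop early
            return current_prompt

        # Step 3: rewrite the prompt so the next round corrects the
        # observed violations (the feedback loop noted above).
        current_prompt = refine_prompt(current_prompt, rules, mismatches)

    return current_prompt
```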
First place on both the VDC and VBench leaderboards! A domestic video model polished with reinforcement learning surpasses Sora and Pika
机器之心· 2025-05-06 04:11
Core Insights
- The article discusses the integration of reinforcement learning into video generation, highlighting the success of models like Cockatiel and IPOC in achieving superior performance in video generation tasks [1][14].

Group 1: Video Detailed Captioning
- The video detailed captioning model serves as a foundational element for video generation, with the Cockatiel method achieving first place on the VDC leaderboard, outperforming several prominent multimodal models [3][5].
- Cockatiel's approach involves a three-stage fine-tuning process that leverages high-quality synthetic data aligned with human preferences, resulting in a model that excels in fine-grained expression and human preference consistency [5][8].

Group 2: IPOC Framework
- The IPOC framework introduces an iterative reinforcement learning preference optimization method, achieving a total score of 86.57% on the VBench leaderboard, surpassing various well-known video generation models [14][15].
- The IPOC method consists of three stages: human preference data annotation, reward model training, and iterative reinforcement learning optimization, which collectively enhance the efficiency and effectiveness of video generation (see the sketch after this summary) [19][20].

Group 3: Model Performance
- Experimental results indicate that the Cockatiel series models generate video descriptions with comprehensive dimensions, precise narratives, and minimal hallucination, showing higher reliability and accuracy than baseline models [7][21].
- The IPOC-2B model demonstrates significant improvements in temporal consistency, structural rationality, and aesthetic quality in generated videos, leading to more natural and coherent movements [21][25].
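The three-stage IPOC pipeline (preference annotation, then reward model training, then iterative RL optimization) can be summarized in a short sketch. Everything below is schematic: the function names, data shapes, and the generic "RL update" step are assumptions standing in for the actual annotation tooling, reward model, and RL algorithm, which the summary above does not detail.

```python
from typing import Callable, List, Tuple

def iterative_preference_optimization(
    generator,                                   # video generation model (any object with .generate)
    prompts: List[str],
    annotate: Callable[[object, object], int],   # human/proxy label: index (0 or 1) of preferred video
    train_reward_model: Callable[[List[Tuple]], Callable[[object], float]],
    rl_update: Callable[[object, Callable[[object], float]], object],
    iterations: int = 3,
):
    """Schematic iterative preference-optimization loop for a video generator.

    Stage 1: collect pairwise human preferences on generated videos.
    Stage 2: fit a reward model on those preferences.
    Stage 3: run an RL step against the reward model, then repeat.
    """
    for _ in range(iterations):
        # Stage 1: sample two candidate videos per prompt and record
        # which one the annotators prefer.
        preference_pairs = []
        for prompt in prompts:
            video_a = generator.generate(prompt)
            video_b = generator.generate(prompt)
            preferred = annotate(video_a, video_b)
            preference_pairs.append((video_a, video_b, preferred))

        # Stage 2: train a reward model that scores videos consistently
        # with the collected preferences.
        reward_fn = train_reward_model(preference_pairs)

        # Stage 3: nudge the generator toward higher-reward outputs,
        # then loop with the improved generator.
        generator = rl_update(generator, reward_fn)

    return generator
```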
Focus on AI multimodality
2025-04-15 14:30
Summary of Conference Call

Industry Overview
- The discussion primarily revolves around the **AI technology** sector, particularly focusing on **AI video models** and **multimodal search capabilities**. The recent advancements in AI applications have catalyzed movements in primary-market financing, with notable reactions observed in the **A-share media** sector and the **Hang Seng Technology Index** since the second week of March [1]

Core Insights and Arguments
- The AI sector is advancing in two main directions:
  1. **Tool Development**: Emphasis on refining AI multimodal applications, with recent reports highlighting the impact of open-source AI video generation models and the launch of Tencent's membership model on March 6 [2]
  2. **Application Exploration**: Focus on innovative applications, including AI companionship and interaction, with products like **EVE** and AI toys being highlighted for their technological responsiveness and user engagement [3][4]
- The **AI interaction** segment is evolving through platforms that allow users to create virtual personas and engage with AI characters, enhancing storytelling and user experience [5]
- In the **advertising sector**, a cautious recovery is observed, with some industries showing signs of improvement. Notably, sectors like **3C digital** are recovering, and e-commerce giants like **Alibaba** and **JD.com** are expected to influence advertising spending positively [6][7]
- The **AI hardware** market is also gaining traction, with brands like **iFlytek** and **Bubugao** emerging as key players, indicating growing demand for AI-related products [8]
- The overall cost structure in the advertising space remains stable, with quarterly operating costs of around **11 million**. This stability is expected to support profit growth alongside revenue increases [9]

Additional Important Content
- The **film industry** is experiencing a rebound, particularly in ticket sales across different city tiers, with major players like **Wanda** and **Cinemas** holding significant market shares [10][11]
- **Long video platform** performance in February showed a decline in MAU for the three major platforms, with **iQIYI** leading in effective play share at **33.9%**. The increase in **Youku's** share by **2.4 percentage points** indicates a positive trend for the platform [12][13]
- Upcoming film releases and the performance of key series and variety shows are anticipated to drive engagement and viewership in the coming months, with several major productions awaiting release [14]
Getting started with 26 AI tools: this one article is all you need
虎嗅APP· 2025-03-03 10:08
Core Viewpoint
- The article discusses the rapid evolution and diversification of AI tools leading up to 2025, highlighting their transformative impact on work and daily life, similar to the internet and smartphones [2][4][82].

Group 1: AI Dialogue Tools
- ChatGPT is noted for its comprehensive functionality and wide application, although it has shown signs of stagnation in innovation [9][10].
- Doubao excels in understanding Chinese context and offers a user-friendly experience, making it a popular choice among domestic users [11][12].
- Gemini integrates Google's powerful search capabilities with AI dialogue, providing real-time information retrieval [13][14].

Group 2: AI Writing Tools
- DeepSeek R1 is recognized as the strongest open-source model in China, particularly effective for creative writing [16][17].
- Claude is acknowledged for its high-quality writing and coding capabilities, making it a valuable tool for professionals [21][23].
- Grok is characterized by its humorous and engaging responses, suitable for social media content creation [25][26].

Group 3: AI Drawing Tools
- Jimeng is tailored for Chinese users, excelling in generating artwork that reflects Eastern aesthetics [30][31].
- Kuaishou's Ketu is a simple and effective AI drawing tool that supports Chinese prompts [32][33].
- Whisk allows users to create art by uploading images, offering a unique and intuitive approach to artistic creation [35].

Group 4: AI Video Tools
- Keling is highlighted as a leading domestic video generation tool, achieving high-quality outputs [44][45].
- Pika, founded by Chinese creators, offers excellent dynamic element integration in videos [47][48].
- Runway is recognized for its pioneering role in AI video generation, although it is noted for its higher pricing [50][51].

Group 5: AI Audio Tools
- Hailuo AI is praised for its natural-sounding voice generation and precise cloning capabilities, making it ideal for content creators [55][57].

Group 6: AI Programming Tools
- Cursor is noted for its professional capabilities but has a steeper learning curve [61][64].
- Windsurf is more user-friendly and suitable for beginners [62][66].
- Trae, developed by ByteDance, offers a seamless user experience with Chinese language support [66].

Group 7: AI Search Tools
- Perplexity.ai is recognized as a pioneer in AI search tools, enhancing information accuracy [68][69].
- Nano AI Search, launched by Zhou Hongyi, has gained popularity for its comprehensive features [71][72].
- Meta Search focuses on academic research, providing tools for knowledge management [73].

Group 8: AI Music Tools
- Suno is highlighted as a leading AI music creation tool, supporting various styles [74][75].
- Haimian Music, developed by ByteDance, is user-friendly and accessible [76][77].
- MusicFX, from Google, is noted for its simplicity and high-quality music generation [78][80].
A conversation with PixVerse's Wang Changhu: AI video generation may lead to a new platform, and Sora is only a few months ahead
晚点LatePost· 2024-04-30 10:25
"抖音就是从 15 秒的视频做起来的。" 文丨王与桐 编辑丨程曼褀 今年 2 月 OpenAI 发布了由视频模型 Sora 生成的视频,时长可达 60 秒并且视频内容丝滑、连贯、 逼真。 一张梗图在 Sora 发布后流传于社交媒体:Sora 是坐在宝座上的巨大神像,下面跪着一众渺小的膜拜 者,包括 Runway、Pika、SVD、PixVerse 等十多个视频生成模型或产品。 Sora 出现后,这张梗图开始流传。 "能被放在第一排,我们很高兴。" 推出 PixVerse 的爱诗科技创始人兼 CEO 王长虎说。 PixVerse 是 "膜拜者" 中唯一一个由中国公司开发的产品,网页端产品在今年 1 月上线,根据第三方 监测平台 SimilarWeb 数据,PixVerse 3 个月内达到了超过 140 万的月访问量,去年 11 月上线的 Pika 现在是超 200 万的月访问量。 做出 PixVerse 的爱诗科技由王长虎在 2023 年 4 月创立。2017 年初 ,王长虎加入字节跳动,担任 AI Lab 视觉技术负责人。作为在微软亚洲研究院学习和工作十余年的计算机视觉专家,王长虎带领 技术团队,研发了抖音、 ...