Runway
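Say goodbye to "audio-visual disconnect" and "character breakdown"! AutoMV: the first open-source, full-song MV generation agent that understands lyrics and stays on the beat
QbitAI (量子位)· 2025-12-29 06:37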
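Existing AI video generation models produce striking short clips, but they tend to fall apart on a complete song: shots do not connect, characters change faces, and the meaning of the lyrics is ignored altogether. Researchers from M-A-P, Beijing University of Posts and Telecommunications, the NJU-LINK lab at Nanjing University, and other institutions recently proposed AutoMV. It is a training-free multi-agent system that works like a professional film crew: given a song's beats, lyrics, and structure, it automatically generates a complete MV several minutes long, with a coherent narrative and synchronized audio and visuals. Contributed by the AutoMV team. QbitAI | WeChat official account QbitAI. Demos shown: "Lazy Song Demo", "Beliver Demo", "APT Demo". Why is a "full-song" MV so hard? For independent musicians, producing a professional MV usually means high cost (about US$10,000) and a long production cycle (dozens of hours). Video generation models such as Sora and Runway keep appearing, but using them directly to generate an MV runs into three major challenges: 1. Duration limits: most models can only generate clips a few seconds long and cannot cover a whole song. 2. Audio-visual disconnect: the generated visuals usually depend only on the prompt and ignore the music's beats, its structure (Intro/Chorus), and the meaning of the lyrics. 3. Poor consistency: in a video several minutes long, the main ...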
2026 Internet and Media Investment Strategy: Domestic AI Develops in Depth, Self-Pleasing Consumption Goes Global
Group 1 - The core opportunity in the internet and media sectors for 2025 is centered around AI revaluation, particularly in cloud computing, and the globalization and youth-oriented trends in self-consumption, such as trendy toys, music, and concerts [3][4] - AI cloud capital expenditure (capex) is expected to expand in its second year, with a focus on return on investment (ROI) from AI investments, making capex/operating cash flow a key metric for investors [3][4] - Major companies to watch in the AI cloud space include Alibaba, Baidu, and Kingsoft Cloud, which are focusing on domestic production and infrastructure [3][12] Group 2 - The AI application landscape is shifting from conceptual discussions to a focus on commercial viability, with significant developments in AI advertising and video monetization expected in 2026 [3][4] - Tencent, Bilibili, Meitu, Kuaishou, and Focus Technology are highlighted as key players in the AI application ecosystem, with a particular emphasis on the monetization of chatbot applications and the evolution of AI video tools into community platforms [3][4] - The gaming sector is seeing structural opportunities driven by Generation Z and international expansion, with a focus on companies like Giant Network, Century Huatong, and Xindong Company [3][4] Group 3 - The self-consumption trend is expected to continue, with gaming, music, and trendy toys being key areas of growth, particularly as the market adjusts post-2025 [3][4] - The video sector is anticipated to reach a turning point, with policy stabilization and diverse monetization strategies being crucial for growth [3][4] - Companies such as Mango Super Media, Shanghai Film, and Reading Group are positioned to benefit from these trends [3][4] Group 4 - The report indicates a recovery in companies like Focus Media, Vision Source, and educational publishing firms, suggesting a positive outlook for these sectors [3][4] - The report emphasizes the importance of continuous performance and valuation adjustments in the context of evolving market conditions [3][4]
Group 5 - The domestic cloud computing market is witnessing increased capital expenditure from major internet companies, with Alibaba and Tencent leading the charge [18][19] - The report highlights the importance of measuring the health of cloud investments through the capex/operating cash flow ratio, with Tencent's ratio being notably lower than its peers [19][29] - AI-driven cloud services are expected to maintain higher profit margins compared to traditional cloud offerings, with a focus on internal workload efficiencies [29][30] Group 6 - The report outlines the competitive landscape of AI applications, noting that Chinese companies are making significant strides in the global market, particularly in productivity tools and content generation [34][35] - The emergence of ChatGPT as a multi-functional platform is reshaping the AI application ecosystem, with significant implications for user engagement and commercial applications [35][39] - Advertising remains a critical area for AI commercialization, with companies like Meta, Tencent, and Bilibili leveraging AI to enhance ad performance and efficiency [43][49]
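The capex/operating cash flow ratio that the report treats as a health metric is a plain division, and a lower value means a smaller share of operating cash is being absorbed by capital spending. The sketch below is illustrative only: the class name, function name, company labels, and every figure are hypothetical and do not come from the report.

```python
from dataclasses import dataclass


@dataclass
class CloudFinancials:
    """Hypothetical annual figures for one company, in one currency unit."""
    name: str
    capex: float                # capital expenditure
    operating_cash_flow: float  # cash generated by operations


def capex_to_ocf(f: CloudFinancials) -> float:
    """Return capex divided by operating cash flow."""
    if f.operating_cash_flow <= 0:
        raise ValueError(f"{f.name}: operating cash flow must be positive")
    return f.capex / f.operating_cash_flow


# Made-up numbers for illustration only; not taken from the report.
companies = [
    CloudFinancials("Company A", capex=80.0, operating_cash_flow=200.0),
    CloudFinancials("Company B", capex=120.0, operating_cash_flow=150.0),
]

# Sort ascending: the lowest ratio (least cash absorbed by capex) comes first.
ranked = sorted(companies, key=capex_to_ocf)
for c in ranked:
    print(f"{c.name}: capex/OCF = {capex_to_ocf(c):.2f}")
```

With these placeholder figures, Company A has the lower ratio and sorts first. Real comparisons would need figures from the same reporting period and the same accounting basis.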
We edited 100 images with 21 AI photo-editing tools: which one is the real "retouching wonder tool"? | Jinqiu Scan
Jinqiu Ji (锦秋集)· 2025-11-10 11:38
Core Viewpoint - The article focuses on evaluating 21 AI image editing tools across six real-life scenarios to determine their effectiveness in understanding and executing user requests for image modifications [4][11][141]. Group 1: Evaluation Methodology - The evaluation consists of six rounds, each using the same prompt for image editing, with all models set to their latest default configurations [11][12]. - Three general evaluation dimensions are used: visual consistency, local quality, and content consistency [12][13][14]. Group 2: Performance Results - Top performers include Tencent Yuanbao, Meitu Xiu Xiu, and Qwen Image Edit, scoring 15 points for effectively meeting user prompts without noticeable discrepancies [23]. - Nano Banana, Sora, Lovart, Manus, and Runway scored 14 points, with minor issues in image retrieval capabilities [28]. - Tools like Jiemeng 4.0, Wake Map, and Pixel Cake scored around 10 points, showing significant errors despite being dedicated image editing software [30]. Group 3: Specific Findings - In the first round, Tencent Yuanbao and Meitu Xiu Xiu excelled in removing unwanted elements while enhancing image clarity [23]. - The second round highlighted Qwen Image Edit and Genspark as top performers in foreground extraction, maintaining original details [41]. - The third round saw Jiemeng 4.0 and Tencent Yuanbao achieving high scores for effectively replacing elements while preserving the original image's integrity [65]. Group 4: Future Directions - The article indicates plans for future evaluations of AI tools in areas such as game development, knowledge bases, and companionship products [7].
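The summary names three evaluation dimensions and a top score of 15 points, but does not say how the points are divided. The sketch below assumes, purely for illustration, that each dimension is scored from 0 to 5 and the three are summed. The 5-point scale, the function name, and the tool names and scores are hypothetical, not the article's rubric or results.

```python
# The three dimensions named in the summary.
DIMENSIONS = ("visual_consistency", "local_quality", "content_consistency")

# Assumption: 3 dimensions x 5 points each gives the 15-point ceiling.
MAX_PER_DIMENSION = 5


def total_score(scores: dict[str, int]) -> int:
    """Validate per-dimension scores and return their sum."""
    for dim in DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing dimension: {dim}")
        if not 0 <= scores[dim] <= MAX_PER_DIMENSION:
            raise ValueError(f"{dim} must be in 0..{MAX_PER_DIMENSION}")
    return sum(scores[dim] for dim in DIMENSIONS)


# Hypothetical tools and scores, for illustration only.
results = {
    "tool_a": {"visual_consistency": 5, "local_quality": 5, "content_consistency": 5},
    "tool_b": {"visual_consistency": 5, "local_quality": 4, "content_consistency": 5},
}

# Rank tools from highest to lowest total.
ranking = sorted(results, key=lambda name: total_score(results[name]), reverse=True)
for name in ranking:
    print(name, total_score(results[name]))
```

Keeping the per-dimension scores rather than only the total makes it possible to see where a tool lost points, for example on local quality alone.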
Wan2.2-Animate is trending again: in 5 minutes it turns a scruffy guy into an aloof goddess.
Digital Life Khazix (数字生命卡兹克)· 2025-10-30 01:33
Core Viewpoint - The article discusses the capabilities and implications of the open-source model Wan2.2 Animate, which allows users to create highly realistic face-swapping videos and animations, highlighting its potential in various creative fields while also addressing the ethical concerns associated with such technology [1][25][26]. Group 1: Technology and Features - Wan2.2 Animate can generate natural face-swapping videos by using a combination of user-uploaded videos and images, achieving impressive results in mimicking expressions and movements [1][4][6]. - The model allows for voice modulation alongside visual changes, enhancing the realism of the generated content [9]. - It supports both action imitation and character replacement, enabling users to create videos with different characters while maintaining the original background [14][15][16]. Group 2: Accessibility and Open Source - Wan2.2 Animate is notable for being open-source, which differentiates it from other similar models that are not publicly available [14][25]. - The model can be easily accessed and utilized by anyone, significantly lowering the barrier to entry for animation and video creation [25][26]. - It can be deployed in various settings, including enterprises and film productions, allowing for cost-effective animation and special effects [25]. Group 3: Creative Applications - The technology can be used for various creative projects, including recreating classic film scenes or generating dance videos with different characters [12][26]. - It opens up new possibilities for independent animators and filmmakers, enabling them to bring their characters to life with minimal investment [25][26]. - The potential for reviving deceased actors in new films through AI-generated likenesses is also discussed, showcasing the transformative impact of this technology on the film industry [26]. 
Group 4: Ethical Considerations - The article raises concerns about the misuse of such technology, particularly in creating misleading or harmful content that could undermine trust in digital media [26]. - It emphasizes the importance of responsible use of technology, likening it to fire that can either warm or destroy [26].
An In-Depth Analysis of Google's Genie 3: "One Sentence to Create a World"
Hu Xiu· 2025-08-18 08:55
Core Insights - Google DeepMind's Genie 3 represents a significant paradigm shift in AI-generated content, transitioning users from passive consumers to active participants in a generative interactive environment [1][2] - The ultimate goal of the Genie project is to pave the way towards Artificial General Intelligence (AGI), with Genie 3 serving as a critical foundation for training AI agents [2][15] Group 1: Technological Breakthroughs - Genie 3 achieves real-time interactivity, generating a fully interactive world at 720p resolution and 24 frames per second, contrasting sharply with its predecessor Genie 2, which required several seconds to generate each frame [5][6] - The interaction horizon of Genie 3 allows for coherent and interactive sessions lasting several minutes, enabling more complex task simulations compared to Genie 2's limited interaction time [6][7] - Emergent visual memory allows objects and environmental changes to persist even when not in view, indicating a significant advancement in the AI's understanding of object permanence [8][10] - Users can dynamically alter the world by inputting new prompts, granting them the ability to inject events or elements into the environment in real-time, enhancing the training capabilities for AI agents [11][12] Group 2: Applications and Implications - Genie 3 is primarily designed as a training ground for the next generation of AI agents, particularly embodied agents like robots and autonomous vehicles, addressing the need for diverse and safe training data [15][16] - The technology has the potential to revolutionize the gaming industry by drastically reducing the time and cost of game development, although it currently faces limitations in user experience and precision compared to established game engines [17][18] - In education, Genie 3 can create immersive learning environments, allowing students to engage with historical or medical scenarios in a risk-free setting, aligning with broader trends in educational technology [19]
Group 3: Competitive Landscape - Genie 3 differs fundamentally from other models like Sora and Runway, as it functions as a world model for interactive simulation rather than a video generation model [21][22] - The comparison highlights that while Sora excels in high-fidelity video generation, Genie 3 focuses on real-time interactive simulations, positioning itself uniquely in the AI landscape [24][25] Group 4: Future Directions - Despite its advancements, Genie 3 still faces challenges in stability, fidelity, and control, indicating that further development is needed to achieve practical applications in gaming and simulation [28][31] - The integration of Genie 3 with VR/AR technologies presents exciting possibilities, but it requires overcoming significant technical hurdles to ensure real-time, immersive experiences [32][33]
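To put the 24 frames-per-second figure in perspective, a real-time generator has to finish each frame within the frame interval. The arithmetic is sketched below; the function names are illustrative, and the 3-minute session length is an assumed stand-in for the summary's "several minutes".

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_in_session(fps: float, minutes: float) -> int:
    """Number of frames generated over a session of the given length."""
    return round(fps * minutes * 60)


fps = 24              # frame rate quoted in the summary
session_minutes = 3   # assumption standing in for "several minutes"

print(f"Per-frame budget at {fps} fps: {frame_budget_ms(fps):.2f} ms")
print(f"Frames in a {session_minutes}-minute session: {frames_in_session(fps, session_minutes)}")
```

At 24 fps the budget is roughly 41.67 ms per frame, which is why a predecessor needing several seconds per frame could not be interactive in the same sense.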
Z Product | Product Hunt Best Products (7.14-20): Chinese-founded products take second and third place!
Z Potentials· 2025-07-22 03:05
Core Insights - The article highlights the emergence of innovative AI-driven tools that enhance productivity across various sectors, focusing on their unique features and market potential [2][4][27]. Group 1: ClickUp and Brain MAX - ClickUp is a comprehensive productivity platform that integrates the multi-modal AI assistant Brain MAX, aimed at improving team collaboration and project management [2][4]. - Brain MAX utilizes top language models for intelligent search and task automation, supporting voice commands and enhancing information processing efficiency [4][5]. - The product has received significant user engagement, with 1,082 upvotes and 277 comments [6]. Group 2: OpenArt AI - OpenArt is an AI-driven visual storytelling platform that helps creators quickly generate coherent visual narratives [7][8]. - It addresses the challenges of traditional content creation by enabling users to transform ideas into engaging stories in minutes [8][9]. - The platform has garnered 905 upvotes and 100 comments, indicating strong user interest [12]. Group 3: TestSprite 2.0 - TestSprite 2.0 is an AI-powered tool for automating end-to-end software testing through natural language interaction [13][14]. - It significantly reduces testing costs by up to 90% and accelerates software delivery [14][15]. - The product has achieved 946 upvotes and 141 comments, reflecting its appeal to developers [19]. Group 4: Dualite - Dualite is an AI application builder that converts Figma designs into React and HTML/CSS code, streamlining the design-to-code process [20][21]. - It targets designers and developers seeking to enhance UI development efficiency while ensuring data privacy [21][22]. - The tool has received 765 upvotes and 96 comments, showcasing its market traction [23]. Group 5: Coefficient.io - Coefficient.io transforms Google Sheets into a real-time data synchronization hub, integrating multiple SaaS systems [24][25]. 
- It addresses data silos and manual update challenges faced by sales and operations teams [27][28]. - The platform has achieved 758 upvotes and 52 comments, indicating a positive reception [29]. Group 6: Finlens - Finlens is an AI accounting collaboration tool designed for startups and accountants, enhancing financial management efficiency [30][31]. - It automates processes to reduce manual data handling and improve transparency [31][32]. - The product has garnered 1,082 upvotes and 277 comments, highlighting its relevance in the market [32]. Group 7: Mozart AI - Mozart AI is a browser-based music creation platform that assists users in generating high-quality music through AI [33][34]. - It caters to both amateur and professional musicians, addressing traditional music production challenges [36][37]. - The platform has received 666 upvotes and 151 comments, reflecting user engagement [38]. Group 8: Untitled UI - Untitled UI React is an open-source React component library that offers a vast collection of components for developers [40][42]. - It aims to streamline UI design and development processes, ensuring consistency between design and code [42][43]. - The library has achieved 653 upvotes and 96 comments, indicating strong interest from the developer community [44]. Group 9: Checklist Genie - Checklist Genie leverages AI to help users create and manage task lists efficiently through voice and image recognition [45][49]. - It simplifies the task management process, catering to individuals and professionals seeking productivity enhancements [49][51]. - The tool has garnered 612 upvotes and 49 comments, showcasing its market potential [52]. Group 10: Runway - Runway is an AI-driven recruitment tool that customizes candidate screening and ranking based on specific job requirements [54][55]. - It addresses inefficiencies in traditional applicant tracking systems, enhancing the hiring process for HR professionals [55][56]. 
- The product has received 511 upvotes and 74 comments, indicating its appeal in the recruitment sector [55].
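Quitting a state-owned enterprise job to start a one-person company: "I will definitely make money with AI!" | AI Transformation Interviews
Tencent Research Institute· 2025-06-20 07:33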
Core Viewpoint - The article discusses the transformative impact of AI on industries and individuals, highlighting the journey of a professional who transitioned from a state-owned enterprise to leveraging AI in the film production sector, emphasizing the importance of creativity and foundational skills alongside AI tools [1][6][70]. Group 1: Guest Introduction - The guest, He Qiujian, is the founder of a film studio specializing in AI-generated content and has collaborated with various state-owned enterprises and media outlets [2]. Group 2: Personal Journey and AI Adoption - He Qiujian left his stable job in a state-owned enterprise after 15 years to pursue opportunities in AI, driven by the need for financial stability and personal interest in the field [6][9][18]. - Initially, he had limited knowledge of AI, primarily understanding GPT, but he dedicated significant time to learning AI tools like Stable Diffusion and ComfyUI [12][18]. Group 3: Early Experiences and Challenges - His first AI project earned him 10 yuan for a five-day effort, marking a significant milestone as he became the first among his peers to monetize AI skills [12][14]. - He faced anxiety during the transition from a stable income to freelancing, but he was motivated by the desire to prove his capabilities to friends and family [18][49]. Group 4: Building a Client Base - He Qiujian's average monthly income now ranges from 40,000 to 50,000 yuan, achieved through a combination of quality work and excellent customer service [24][25]. - He emphasizes the importance of understanding AI tools deeply and effectively communicating with clients to meet their needs [25][72]. Group 5: Tools and Techniques - He utilizes various AI tools for scriptwriting, image generation, and video production, with monthly costs for these tools amounting to several thousand yuan [44]. 
- The guest stresses that while tools are essential, the creative thought process is the core competitive advantage in the industry [45][70]. Group 6: Future Outlook and Advice - He believes that AI short films may become a trend, but the current technology cannot yet compete with traditional productions in terms of storytelling and quality [66]. - He advises continuous learning and maintaining a strong work ethic to avoid being replaced by AI, emphasizing that AI enhances human capabilities rather than replacing them [78][80].
Corporate Training | 未可知 x Hengdu Law Firm: A New AI-Driven Paradigm for Incubating Lawyer IP
Core Viewpoint - The article discusses the revolutionary application of AI technology in IP incubation and operation, highlighting how AI enhances efficiency and commercial value in content creation [1][13]. Group 1: AI Empowerment in IP Incubation - Traditional IP incubation faces challenges such as high content creation costs, long cycles, lack of data-driven market insights, and limited monetization paths [3]. - AI tools like ChatGPT, Midjourney, and Runway can automate the production of text, images, and videos, significantly reducing creation costs while enhancing efficiency [5]. - AI data analysis tools can accurately predict user behavior and market trends, providing a scientific basis for IP positioning and operational strategies [5]. Group 2: Deepost Platform and AI Value - The Deepost platform aims to lower the barriers to IP incubation and enhance operational efficiency through AI technology, enabling data-driven decision-making and sustainable monetization [7]. - AI in the Deepost platform provides three layers of value: as an efficiency tool for content generation and data analysis, as a decision assistant for optimizing operational strategies, and as a creative partner to break traditional thinking limitations [7]. Group 3: Full Process AI Empowerment - AI technology is integrated throughout the entire IP incubation process, from positioning design to content production and operational management [9]. - In the positioning design phase, AI assists in precise IP concept positioning through market research and data analysis; during content production, it builds a comprehensive content matrix from text to video using multimodal AI tools; in operational management, it enables intelligent community management, precise advertising, and real-time data analysis [9]. 
Group 4: Successful Case Studies and Future Directions - The training shared successful case studies demonstrating AI's practical applications in IP incubation, such as efficient fan growth and monetization in short video IPs through AI-generated scripts and intelligent ad placements [11]. - AI has opened new monetization paths, including content subscriptions, smart recommendations, and data insight services, providing more possibilities for IP incubation [11]. - With advancements in multimodal AI, personalized engines, and real-time interaction technologies, IP incubation is rapidly evolving towards greater intelligence and precision [11].
We Tested Google Veo and Runway to Create This AI Film. It Was Wild. | WSJ
AI Video Generation - The film was created using AI video tools, including Google Veo 3, with most of the audio also AI-generated [1] - Google Veo and Runway were identified as the best AI video tools for achieving consistency in character representation across scenes [7] - The production process involved using Midjourney for character design and Runway's References tool for scene creation, followed by Google Veo for motion generation [9][10] - Veo 3 was used for text-to-video prompts in scenes without characters [11] AI Audio Generation - AI audio tools like ElevenLabs were used to generate character voices, with the option to describe or clone voices [12] - Suno, an AI music generator, was used to create the song at the end of the film [13] Production Cost & Human Input - The estimated cost for using Google and Runway's AI tools was around $1,000 [13] - The script was written by humans, emphasizing the importance of human input, creativity, and original ideas in AI-assisted filmmaking [13][14]
Report: DeepSeek usage falls by half; Kuaishou's Kling tops the video category
Guan Cha Zhe Wang· 2025-05-14 04:08
Core Insights - The usage of the DeepSeek-R1 model by the Chinese company DeepSeek has decreased by 50% from its peak in February, yet it remains in third place among inference models [1][3] - Kuaishou's Kling series of video generation models has rapidly gained over 30% market share, with Kling-2.0-Master achieving 20.9% within three weeks of its release [1][5] Inference Model Trends - The "DeepSeek moment" in February caused the share of inference models in all text models to surge from 2% to 10% within two weeks, currently stabilizing at 8% [1][3] - DeepSeek-R1 captured over 50% of the inference model text messages sent to the platform shortly after its launch, breaking OpenAI's previous monopoly [3] - As of March, the entry of Anthropic's Claude-3.7-Sonnet-Reasoning model led to a decline in DeepSeek-R1's market share, which was further impacted by Google's Gemini-2.5-Pro, now holding 31.5% [3][5] OpenAI and Competitors - OpenAI's inference model family has maintained a total market share of no less than 30% due to continuous releases of various models [5] - Grok 3 model has less than 1% market share, possibly due to limited API support for its mini version [5] Video Generation Models - Kuaishou's Kling series has a combined market share exceeding 30%, with Runway leading individual model shares at 23.6% [5] - Kling-2.0-Master supports high-definition video generation at 1080p and has seen rapid adoption, reaching a user base of over 22 million since its launch [7]