Kling 3.0 Omni (可灵3.0 Omni)
Tencent Research Institute AI Express 20260210
Tencent Research Institute · 2026-02-09 16:03
Group 1: Generative AI Developments
- Pony Alpha has gained popularity on OpenRouter for its strong programming capabilities; developers have used it to build playable games such as Pokemon Ruby in just three hours [1]
- The model autonomously replicated "Stardew Valley," demonstrating an understanding of system-level engineering and long-horizon reasoning [1]
- Speculation about the model's origin points to Anthropic's Sonnet 5, DeepSeek-V4, or Zhipu GLM-5, indicating a possible new stage for domestic models in advanced programming [1]

Group 2: AI Video Editing Innovations
- Xiaohongshu is developing an AI video editing application called OpenStoryline, which takes a "non-linear editing + dialogue-driven" approach: users upload images and direct the edit in natural language [2]
- The technology combines the DeepSeek and Qwen 3 open-source models with Xiaohongshu's own dots.lm text model and FireRedASR audio model for ecosystem adaptation [2]
- The newly established independent department Red&Live will focus on short videos and live streaming, targeting 300 million DAU and a transition from a text-based community to a comprehensive platform [2]

Group 3: Film Production Tools
- A director from the Beijing Film Academy tested Kling 3.0 Omni in pre-production, generating dynamic previews that help the photography, art, and lighting departments agree on the visual approach before filming [3]
- The model exhibited film-level tonal control, accurately reproducing the quality of diffused light on cloudy days and the refraction of raindrops [3]
- In tests of multi-character dialogue scenes, the model performed well on character consistency, audio-visual synchronization, and gaze matching, making its output suitable as rehearsal material and as a basis for lighting plans [3]

Group 4: Real-time Interactive Video Models
- Xmax AI launched X1, which it describes as the world's first real-time interactive video generation model, capable of millisecond-level real-time generation and gesture interaction [4]
- Key features include dimensional interaction, world filters, touch animations, and expression capture; users can upload character images and have those characters interact with the real world [4]
- The team increased diffusion sampling speed a hundredfold through an end-to-end streaming re-rendering architecture, and also addressed the scarcity of training data in the industry [4]

Group 5: AI Domain Acquisition
- Kris Marszalek, founder of Crypto.com, purchased the domain AI.com for $70 million (approximately 500 million RMB), setting a new record for domain transactions [5]
- AI.com is positioned as a Personal AI Agent platform, promising that users can create, within 60 seconds, a personal AI agent capable of messaging, operating apps, and trading stocks [5]

Group 6: AI Infrastructure Spending
- In 2026, the combined AI infrastructure spending of Meta, Amazon, Microsoft, and Google is expected to exceed $600 billion (approximately 4.16 trillion RMB), a year-on-year increase of over 70% [9]
- This spending level is comparable to the annual GDP of Sweden or Israel and amounts to about 2.1% of US GDP, a share second only to that of the Louisiana Purchase in 1803 [9]
- Apple is the only company cutting capital expenditures, down 19% year-on-year, opting instead to work with Google's Gemini to access top-tier AI models at a lower cost [9]
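The "2.1% of US GDP" comparison in Group 6 and the currency conversion in Group 5 can each be checked with one line of arithmetic. The following is a rough sketch only: the US GDP value of about $30 trillion is an assumption, not a figure from the digest, and the function names are illustrative.

```python
# Rough sanity checks on figures quoted in the digest.
# ASSUMPTION: US GDP of roughly $30 trillion; this number is not from the digest.
ASSUMED_US_GDP_USD = 30e12


def share_to_dollars(share_of_gdp: float, gdp_usd: float) -> float:
    """Convert a share of GDP (0.021 means 2.1%) into a dollar amount."""
    return share_of_gdp * gdp_usd


def implied_rate(rmb: float, usd: float) -> float:
    """Exchange rate (RMB per USD) implied by a pair of quoted amounts."""
    return rmb / usd


if __name__ == "__main__":
    # Group 6: what does 2.1% of US GDP correspond to in dollars?
    spend = share_to_dollars(0.021, ASSUMED_US_GDP_USD)
    print(f"2.1% of assumed US GDP: about ${spend / 1e9:.0f} billion")

    # Group 5: rate implied by "$70 million (approximately 500 million RMB)".
    rate = implied_rate(500e6, 70e6)
    print(f"Implied rate for the AI.com purchase: {rate:.2f} RMB per USD")
```

Because the RMB amounts in the digest are rounded, the implied rate is only approximate.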
Hands-on with Kling 3.0 - A director's era that belongs to everyone.
数字生命卡兹克· 2026-02-05 02:23
Core Viewpoint
- The article covers the upgrade of the AI video generation tool Kling (可灵) from version 2.0 to 3.0, highlighting its improved video production capabilities, particularly in scene segmentation and language processing.

Group 1: Video Generation Capabilities
- Kling 3.0 lets users create videos with a variety of scene cuts and camera movements from simple prompts [3][7].
- The tool can generate videos from 3 to 15 seconds long, with both intelligent and custom scene segmentation available [8][16].
- Users can produce compelling narratives with minimal input, because the AI fills in details on its own from basic instructions [19][20].

Group 2: Scene Segmentation
- With intelligent scene segmentation, the user enters a prompt and receives a series of automatically generated scenes that follow the narrative [8][19].
- Custom scene segmentation gives users detailed control over each shot, enabling complex video sequences [16][17].
- The tool handles a range of cinematic techniques, including reverse shots, which strengthens the storytelling [19][24].

Group 3: Language Processing
- Kling 3.0 can generate multilingual speech that is integrated directly into the video narrative [31][39].
- The tool can create educational videos that work language learning into a creative scenario, making the learning process more engaging [33][36].
- Language capabilities can be combined with scene segmentation to produce videos in which characters speak different languages in context [41].

Group 4: Omni Model
- The Kling 3.0 Omni model supports video editing and modification, which distinguishes it from the standard version focused on video generation [42][45].
- Users can replace characters in existing video clips while preserving the original action and context [44][49].
- Both Kling 3.0 and 3.0 Omni support extracting audio and visual elements from previous works, making video production more efficient [45][51].

Group 5: Future Implications
- The upgrade to Kling 3.0 is a comprehensive improvement in AI video production and could open video creation to a much broader audience [52].
- The combination of scene segmentation and editing capabilities is expected to significantly boost productivity in AI video creation [52].
- The article suggests that AI video production may be heading toward an era in which everyone can act as a director, with a much simpler creative process [52].
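The custom scene segmentation described above, together with the 3 to 15 second clip length, can be pictured as a shot list whose durations must add up to a valid clip. The following is a minimal sketch under stated assumptions: the `Shot` fields and the `validate_shot_list` helper are hypothetical and illustrate the idea only; they are not the tool's actual API or prompt format.

```python
from dataclasses import dataclass
from typing import List

MIN_CLIP_SECONDS = 3   # lower bound quoted in the summary
MAX_CLIP_SECONDS = 15  # upper bound quoted in the summary


@dataclass
class Shot:
    """One entry in a hypothetical custom shot list (field names are illustrative)."""
    description: str
    seconds: float


def total_duration(shots: List[Shot]) -> float:
    """Sum the planned duration of every shot."""
    return sum(shot.seconds for shot in shots)


def validate_shot_list(shots: List[Shot]) -> List[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if not shots:
        problems.append("shot list is empty")
        return problems
    for index, shot in enumerate(shots, start=1):
        if shot.seconds <= 0:
            problems.append(f"shot {index} has a non-positive duration")
    total = total_duration(shots)
    if not (MIN_CLIP_SECONDS <= total <= MAX_CLIP_SECONDS):
        problems.append(
            f"total duration {total}s is outside "
            f"{MIN_CLIP_SECONDS}-{MAX_CLIP_SECONDS}s"
        )
    return problems


if __name__ == "__main__":
    plan = [
        Shot("wide establishing shot of a rainy street", 4),
        Shot("close-up of the speaker", 5),
        Shot("reverse shot of the listener", 4),
    ]
    print(total_duration(plan), validate_shot_list(plan))
```

Checking the plan before generation is a design choice of this sketch: an over-long or empty shot list is caught up front, before any generation request is made.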