Video Generation

Sora 2 can even predict ChatGPT's output
量子位· 2025-10-02 05:30
By Wen Le, 量子位 | 公众号 QbitAI. Sora 2 is pushing hard: it can apparently predict ChatGPT's output and even render HTML?! Asked to simulate "sending a message to ChatGPT," it not only generated the video but also staged a question-and-answer "interaction." First it made up a prompt: Write a playful haiku about a cat staring out the window. Then, mimicking how ChatGPT answers, it replied in audio: "Whiskers pressed to glass. Birds gossip beyond the pane. Tail flicks. Daydreams fly." The whole reply was delivered in ChatGPT's synthetic female voice, and the haiku's syllable counts landed exactly on target. This demo of a video scene plus LLM reasoning left a crowd of netizens amazed; some even said "Sora 2 blurs the boundary between video generation and interactive AI." And it is not just predicting ChatGPT-style answers: Sora 2 can also render HTML. Here is what that code looks like when rendered in a real browser: It also passed a glass-refraction test, and some users had Sora 2 render ...
Sora 2 arrives overnight: OpenAI launches an app directly, and video's ChatGPT moment is here
机器之心· 2025-09-30 23:49
Report by the 机器之心 editorial team. Who would have guessed: while other labs were racing to ship big models before the holiday, OpenAI quietly released Sora 2. And this time it went straight to product, launching an app, complete with a video recommendation algorithm that OpenAI says is designed to resist addiction. Is this OpenAI building its own TikTok? In the announcement, OpenAI goes so far as to say Sora 2 marks video's GPT-3.5 moment, that is, the ChatGPT moment of its day. Evidently OpenAI is very satisfied with both Sora 2's technical capability and its product experience. We obtained an invite code and will share a hands-on in a follow-up article; overseas users who already have access call this a new era for media, film, and entertainment. According to the announcement, Sora 2 outperforms previous systems in physical accuracy, realism, and controllability, and it adds synchronized dialogue and sound effects. Altman calls it the "ChatGPT for creativity" moment. Let's first look at Sora 2's official demos. Sora is here: the original Sora model, released in February 2024, was in many ways video's GPT-1 moment, the first time video generation felt like it could actually work, with simple behaviors such as object permanence emerging as pretraining compute scaled up. Since then, the Sora team has focused on training models with more advanced world-simulation ...
World models: Tencent Hunyuan climbs to the top of the leaderboard
量子位· 2025-09-03 07:30
Core Viewpoint
- Tencent's HunyuanWorld-Voyager model has been released and is now open-source, showcasing significant advancements in 3D scene generation and immersive experiences, outperforming existing models in the WorldScore benchmark [1][3][45].

Group 1: Model Features and Innovations
- HunyuanWorld-Voyager is the industry's first model supporting native 3D reconstruction for long-distance roaming, allowing for the generation of consistent roaming scenes and direct video export to 3D formats [4][24].
- The model introduces a new "roaming scene" feature, enhancing interactivity compared to traditional 360° panoramic images, enabling users to navigate within the scene using mouse and keyboard [10][11].
- It supports various applications, including video scene reconstruction, 3D object texture generation, and video style customization, demonstrating its spatial intelligence potential [27].

Group 2: Technical Framework
- The model innovatively incorporates scene depth prediction into the video generation process, combining spatial and feature information to support native 3D memory and scene reconstruction [29].
- It features a unified architecture for generating aligned RGB and depth video sequences, ensuring global scene consistency [33].
- A scalable data construction engine has been developed to automate video reconstruction, allowing for large-scale and diverse training data without manual annotation [34].

Group 3: Performance Metrics
- In the WorldScore benchmark, HunyuanWorld-Voyager achieved a score of 77.62, ranking first in overall capability, surpassing existing open-source methods [36].
- The model demonstrated superior video generation quality, with a PSNR of 18.751 and an SSIM of 0.715, indicating its ability to produce highly realistic video sequences [39].
- In subjective quality assessments, HunyuanWorld-Voyager received the highest ratings, confirming its exceptional visual authenticity [44].

Group 4: Deployment and Open Source
- The model requires a resolution of 540p and a peak GPU memory of 60 GB for deployment [47].
- Tencent is accelerating its open-source initiatives, including the release of various models and frameworks, contributing to the broader AI landscape [48].
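The PSNR and SSIM scores quoted above follow standard definitions; below is a minimal sketch of how such frame-level metrics are computed. The global SSIM here skips the usual sliding window, and the sample arrays are synthetic, not benchmark data.

```python
import numpy as np

def psnr(ref: np.ndarray, gen: np.ndarray, max_val: float = 255.0) -> float:
    # Peak signal-to-noise ratio between a reference frame and a generated frame.
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    # Simplified SSIM computed once over the whole frame (no sliding window).
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# Synthetic frames for illustration only.
ref = np.zeros((64, 64))
gen = ref + 16.0  # a uniform error of 16 gray levels gives roughly 24 dB
print(round(psnr(ref, gen), 2), round(ssim_global(gen, gen), 2))
```

Real evaluations typically use windowed SSIM averaged over frames; scikit-image's `structural_similarity` is a common reference implementation.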
A new breakthrough from Alibaba's Tongyi Wanxiang: a static image plus audio easily yields cinematic digital-human video!
Sou Hu Cai Jing· 2025-08-27 20:45
Core Viewpoint
- Alibaba demonstrates its strong capabilities in artificial intelligence by launching the open-source multi-modal video generation model Wan2.2-S2V, which allows users to create high-quality digital human videos from a static image and audio input [1][3].

Group 1: Product Features
- The Wan2.2-S2V model can generate videos with a duration of up to several minutes, significantly enhancing video creation efficiency in industries such as digital human live streaming, film post-production, and AI education [2][5].
- The model supports various video resolutions, accommodating both vertical short videos and horizontal films, and incorporates advanced control mechanisms like AdaIN and CrossAttention for improved audio synchronization [3][5].
- Users can upload an image and audio to generate dynamic videos where the subject can perform actions like speaking and singing, with facial expressions and lip movements closely synchronized to the audio [3][5].

Group 2: Industry Impact
- Alibaba has been at the forefront of video generation technology, having previously released the Wan2.2 series models, which set new industry standards with their MoE architecture [3].
- The introduction of the Wan2.2-S2V model addresses the growing demand for efficient video creation tools in rapidly evolving sectors such as digital human live streaming and film production [5].
- The advancements in video generation technology are expected to lead to further innovations and breakthroughs in the field, driven by continuous improvements in the underlying models [5].
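AdaIN, mentioned above as one of the control mechanisms, is a general feature-normalization technique. Here is a minimal sketch of the generic operation, assuming (channels, height, width) feature maps; it is not a reconstruction of Wan2.2-S2V's actual implementation, where the "style" statistics would come from audio-derived features.

```python
import numpy as np

def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Adaptive instance normalization: normalize each content channel, then
    # re-scale and re-shift it to match the style features' per-channel stats.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

rng = np.random.default_rng(0)
video_feat = rng.normal(size=(8, 16, 16))            # hypothetical visual features
audio_feat = rng.normal(2.0, 3.0, size=(8, 16, 16))  # hypothetical audio-conditioned features
out = adain(video_feat, audio_feat)                  # out now carries the audio statistics
```

After the transform, each output channel's mean and standard deviation match the conditioning features', which is how AdaIN injects one signal's statistics into another's spatial structure.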
Kuaishou (01024) gains over 8% across two trading days after earnings, with 11 institutions collectively raising target prices
智通财经网· 2025-08-25 03:11
Core Viewpoint
- Kuaishou's strong stock performance is attributed to its better-than-expected Q2 earnings report, leading to a significant increase in target prices from multiple financial institutions [1][2]

Group 1: Financial Performance
- Kuaishou's Q2 financial indicators, including profit levels, core business revenue, and e-commerce GMV, exceeded market expectations [1]
- UBS forecasts a 13% growth in Kuaishou's e-commerce GMV for the second half of the year, outpacing the overall industry [2]

Group 2: Market Sentiment and Analyst Ratings
- Eleven institutions, including Goldman Sachs and Morgan Stanley, have raised their target prices for Kuaishou following the earnings report [1]
- The announcement of a special dividend has been interpreted as a sign of strong cash flow and management's optimism about future profitability [2]

Group 3: Business Segments and Valuation
- Analysts are increasingly recognizing the independent valuation logic of Kuaishou's core business, with some adjusting target prices based on 2026 PE multiples [1]
- The market remains optimistic about Kuaishou's commercialization potential in both its core business and e-commerce segments [2]

Group 4: Operational Efficiency
- Despite increased capital expenditures in artificial intelligence, Kuaishou has maintained stable overall profit margins, which has received positive feedback from several institutions [1]
- Analysts believe that Kuaishou can sustain profit margins while increasing AI investments, primarily due to strong operational leverage [1]
Video generation vs. spatial representation: which path should world models take?
机器之心· 2025-08-24 01:30
Core Insights
- The article discusses the ongoing debate in the AI and robotics industry regarding the optimal path for developing world models, focusing on video generation versus latent space representation [6][7][10].

Group 1: Video Generation vs Latent Space Representation
- Google DeepMind's release of Genie 3, which can generate interactive 3D environments from text prompts, has reignited discussions on the effectiveness of pixel-level video prediction versus latent space modeling for world models [6].
- Proponents of video prediction argue that accurately generating high-quality videos indicates a model's understanding of physical and causal laws, while critics suggest that pixel consistency does not equate to causal understanding [10].
- The latent space modeling approach emphasizes abstract representation to avoid unnecessary computational costs associated with pixel-level predictions, focusing instead on learning temporal and causal structures [9].

Group 2: Divergence in Implementation Approaches
- There is a clear divide in the industry regarding the implementation of world models, with some experts advocating for pixel-level predictions and others supporting latent space abstraction [8].
- The video prediction route typically involves reconstructing visual content frame by frame, while the latent space approach compresses environmental inputs into lower-dimensional representations for state evolution prediction [9].
- The debate centers on whether to start from pixel-level details and abstract upwards or to model directly in an abstract space, bypassing pixel intricacies [9].

Group 3: Recent Developments and Trends
- The article highlights various recent models, including Sora, Veo 3, Runway Gen-3 Alpha, V-JEPA 2, and Genie 3, analyzing their core architectures and technical implementations to explore trends in real-world applications [11].
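The contrast between the two routes can be made concrete with a toy latent dynamics step: instead of predicting the next frame pixel by pixel, the model compresses the observation into a low-dimensional state and evolves that state. Everything below (the dimensions, the linear encoder, the tanh dynamics) is invented purely for illustration and is drawn from none of the models discussed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a flattened 64x64 frame vs a 16-dim abstract state.
OBS_DIM, LATENT_DIM = 64 * 64, 16
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) / np.sqrt(OBS_DIM)     # toy "encoder"
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)

def encode(frame: np.ndarray) -> np.ndarray:
    # Latent route: compress pixels into a compact state once, up front.
    return W_enc @ frame

def predict_next_latent(z: np.ndarray, action: float) -> np.ndarray:
    # Evolve the state entirely in latent space; no pixels are reconstructed.
    return np.tanh(W_dyn @ z + action)

frame = rng.normal(size=OBS_DIM)
z = encode(frame)                     # 4096 numbers -> 16 numbers
z_next = predict_next_latent(z, 0.5)  # the pixel route would predict 4096 values here
```

The pixel-prediction route would instead output all 4096 values of the next frame, which is where the extra computational cost criticized by the latent-space camp comes from.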
Migu and other companies granted a video-generation patent
Sou Hu Cai Jing· 2025-08-12 05:08
Group 1
- The State Intellectual Property Office has granted a patent for "video generation methods, devices, equipment, and computer-readable storage media" to Migu Culture Technology Co., Ltd., China Mobile Communications Group Co., Ltd., and Beijing JD Shangke Information Technology Co., Ltd. The patent authorization announcement number is CN115100338B, with an application date of June 2022 [1][2][3]
- Migu Culture Technology Co., Ltd. was established in 2014 and is primarily engaged in software and information technology services. The company has a registered capital of 1,040 million RMB and has invested in 9 companies, participated in 2,550 bidding projects, and holds 982 trademark records and 2,700 patent records [1]
- China Mobile Communications Group Co., Ltd. was founded in 1999 and focuses on telecommunications, broadcasting, television, and satellite transmission services. The company has a registered capital of 30,000 million RMB, invested in 55 companies, participated in 5,000 bidding projects, and holds 2,219 trademark records and 5,000 patent records [1]
- Beijing JD Shangke Information Technology Co., Ltd. was established in 2012 and is also engaged in software and information technology services. The company has a registered capital of 26 million RMB, invested in 9 companies, participated in 111 bidding projects, and holds 474 trademark records and 5,000 patent records [2]
Event registration: AI video models, products, and growth in practice | 42章经
42章经· 2025-08-10 14:04
Core Insights
- The article discusses an upcoming online event focused on AI video technology, featuring industry experts sharing their practical experiences and insights on models, products, and growth strategies in the AI video sector [10].

Group 1: Event Overview
- The online event will take place on August 16, from 10:30 AM to 12:30 PM, and will be hosted on Tencent Meeting [7][8].
- The event is limited to 100 participants, with a preference for attendees who provide thoughtful responses and have relevant backgrounds [10].

Group 2: Guest Speakers and Topics
- Guest speaker Dai Gaole, Lead of Luma AI model products, will discuss the technical paths and future capabilities of video models and world models [2].
- Guest speaker Xie Xuzhang, co-founder of Aishi Technology, will share key decisions that led to Pixverse achieving 60 million users in two years, including the evolution of visual models [3][4].
- Guest speaker Xie Juntao, former growth product lead at OpusClip, will focus on customer acquisition, conversion strategies, user retention, and data-driven decision-making in video creation products [5].
Musk: Grok Imagine video generation will be free for all US users over the next few days
Di Yi Cai Jing· 2025-08-07 08:04
Group 1
- The core point of the article is that Elon Musk announced Grok Imagine video generation will be free for all users in the United States in the coming days [1]

Group 2
- The announcement indicates a strategic move to enhance user engagement and expand the user base for Grok Imagine [1]
- This initiative may position the company favorably in the competitive landscape of video generation technologies [1]
- The decision to offer the service for free could potentially lead to increased adoption rates among users [1]