A US video-generation veteran enters the world model arena
量子位· 2025-12-13 04:34
Core Insights
- Runway has launched its first general world model, GWM-1, built on its latest Gen-4.5 video generation model [1][8]
- GWM-1 ships in three variants: GWM Worlds, GWM Avatars, and GWM Robotics, each designed for a different application [5][12]

Group 1: GWM-1 Overview
- GWM-1 uses an autoregressive architecture, predicting frame by frame conditioned on previously generated content held in memory [9]
- The model supports real-time interactive control, letting users adjust camera angles, modify robot operation commands, or change the audio mid-generation [10]

Group 2: GWM Worlds
- GWM Worlds lets users explore a coherent, responsive environment without manually designing each space [13]
- Given a static reference scene, the model generates an immersive, effectively infinite, explorable space in real time [13]
- Unlike world models that only generate limited frame sequences, it maintains spatial consistency of scene elements across long sequences of movement [13]
- Users can change the environment's physical rules through text prompts, which is useful for training agents on real-world actions [15][16]
- GWM Worlds can also drive immersive VR experiences by generating virtual environments in real time [17]

Group 3: GWM Avatars
- GWM Avatars is an audio-driven interactive video generation model that simulates human dialogue with realistic facial expressions and gestures [18][19]
- It can serve as a personalized tutor or improve customer service by creating digital humans that interact naturally [20]
- The model will launch with an API for integration into other products and services [22]

Group 4: GWM Robotics
- GWM Robotics acts as a learning-based simulator rather than a fixed-rule program, predicting video sequences from robot data [23]
- It generates synthetic training data to augment existing robot datasets without expensive real-world data collection [24]
- Policy models can be tested directly in the simulator without being deployed on physical robots, improving safety and efficiency [26]
- A Python SDK for GWM Robotics has been released, supporting multi-view video generation and long context sequences for integration into modern robot policy models [29]

Group 5: Gen-4.5 Upgrades
- The latest Gen-4.5 update adds native audio generation and editing, enabling realistic dialogue, sound effects, and background audio [30][31]
- Users can edit existing audio to meet specific needs and use multi-shot editing for consistent transformations across video segments [33]
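The autoregressive, frame-by-frame loop with real-time control described above can be sketched in a few lines. This is a toy illustration only: the function and parameter names (`predict_next_frame`, `rollout`, `controls`) are hypothetical stand-ins, not Runway's actual API, and the "frames" here are just strings in place of real video tensors.

```python
from collections import deque

def predict_next_frame(memory, control):
    # Stand-in for the model: a real system would condition a video
    # decoder on the remembered frames and the control signal.
    return f"frame({len(memory)}, control={control})"

def rollout(num_frames, context_len=8, controls=None):
    """Generate frames one at a time, each conditioned on a bounded
    memory of previous frames plus an optional per-step control signal
    (e.g. a camera move or a robot command injected mid-generation)."""
    memory = deque(maxlen=context_len)  # bounded context window
    frames = []
    for t in range(num_frames):
        control = controls.get(t) if controls else None
        frame = predict_next_frame(memory, control)
        memory.append(frame)  # the new frame becomes context for the next step
        frames.append(frame)
    return frames

frames = rollout(4, controls={2: "pan_left"})
```

The key property is that user input at step `t` (here, a camera pan injected at frame 2) changes all subsequent predictions, since each new frame is appended to the memory that conditions the next one.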
With 2026 approaching, have world models actually become more "world-like"?
机器之心· 2025-12-13 02:30
Core Viewpoint
- Runway's recent launch of GWM Worlds and GWM Robotics pushes video generation toward an interactive "world simulation" paradigm, reigniting discussion of the definition and scope of "world models": interfaces for creation and interaction, simulators for training and evaluation, or cognitive frameworks for reasoning and decision-making [1]

Group 1: Evolution of World Models
- Over the past two years, world models have come to be regarded as on par with LLMs in the AGI landscape, moving from a narrow definition rooted in reinforcement learning to a broader one that includes generative modeling [4]
- Initially, world models were understood as an agent's internal model of its environment, predicting future states from current conditions and actions and thereby enabling internal simulation and decision-making [5]
- The engineering perspective defined a world model as a combination of three capabilities: compressing high-dimensional perception into usable representations, predicting future states over time, and using those predictions for planning and decision-making [6]
- By 2024, the understanding of world models had expanded to general modeling of how the world evolves, following a trend from language generation to image generation and ultimately to 3D and world generation [6]
- The boundaries of the concept have grown more ambiguous, with ongoing debates about the nature of representations, how physical laws are incorporated, and how input relationships are organized [6]

Group 2: Industry Layout and Trends
- Major companies are investing in world models, raising the question of whether they are enhancing their "data engines" or building new frameworks for "spatiotemporal cognition" [3]
- In February 2024, OpenAI described the video generation model Sora as a "world simulator," emphasizing its ability to learn the three-dimensional structure and physical laws of the real world [6]
- Around the same time, LeCun introduced V-JEPA, which predicts masked video segments in an abstract representation space, gaining training efficiency by discarding unpredictable information [6]
- The discourse has shifted from whether to build world models to how to model them, with debate over whether to abstract up from the pixel level or operate directly in abstract spaces [7]
- There is a recognition that existing approaches may capture only partial physical laws; a coherent world model would need representations of individual objects plus a priori laws of change across space and time [7]

Group 3: Definition and Ambiguity of World Models
- By 2025, world models are positioned alongside LLMs, with companies like Google DeepMind, Meta, and Nvidia shifting focus from pure LLMs toward world models, targeting "Physical AI + superintelligence" amid stagnating LLM progress [8]
- What distinguishes world models from existing generative AI is the goal of constructing internal representations of environments, spanning physical, temporal, and spatial dimensions, that support planning and decision-making [9]
- The term "world model" has become ambiguous, referring variously to latent states within systems, game-like simulators for training agents, or any content pipeline that can generate navigable 3D scenes [9]
- A November 2025 analysis from Entropy Town categorized world models into three technical routes: interface, simulator, and cognitive framework, highlighting the field's ongoing ambiguity [9]
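The three-capability definition above (compress perception into a representation, predict future states, plan over predictions) can be expressed as a minimal interface. This is a toy sketch with deliberately trivial dynamics; none of these functions correspond to any company's actual model.

```python
def encode(observation):
    # Capability 1: compress high-dimensional perception into a usable
    # state representation. Toy: reduce a pixel list to its mean.
    return sum(observation) / len(observation)

def predict(state, action):
    # Capability 2: predict the next latent state from state + action.
    return state + action  # toy linear dynamics

def plan(state, candidate_actions, goal):
    # Capability 3: use predictions for decision-making. Pick the action
    # whose predicted next state lands closest to the goal, entirely via
    # internal simulation, never touching the real environment.
    return min(candidate_actions, key=lambda a: abs(predict(state, a) - goal))

s = encode([0.0, 2.0, 4.0])                  # compressed state
best = plan(s, [-1.0, 0.5, 3.0], goal=5.0)   # chosen by simulated lookahead
```

The pixel-level vs. abstract-space debate mentioned above is about where `predict` operates: Sora-style simulators predict in (roughly) pixel space, while V-JEPA-style models run both `encode` and `predict` in a learned representation space, discarding unpredictable detail.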
Runway's late-night blowout: five major updates in one go, including its first general world model
机器之心· 2025-12-12 04:31
Core Insights
- Runway has announced five major updates, showcasing its ambitions in AI video and multimedia generation [1][3]
- The updates mark a shift from merely generating videos to simulating the physical world, a critical transition for the industry [4][34]

Group 1: Gen-4.5 Video Generation Model
- Gen-4.5 is Runway's latest flagship video generation model, with impressive image quality and newly added native audio generation and editing [6][9]
- The model achieves high physical accuracy and visual precision, with realistic object motion and fluid dynamics [9][10]
- Gen-4.5 supports multi-shot editing, letting users modify an initial scene and propagate the change through the entire video [14][15]
- Despite these advances, Runway acknowledges that Gen-4.5 still shares the common limitations of video models, which motivate its world model research [15]

Group 2: General World Model (GWM-1)
- GWM-1 is Runway's first general world model, built on Gen-4.5 and using autoregressive, frame-by-frame prediction [18][19]
- The model allows user intervention depending on the application scenario, simulating future events in real time [19]
- GWM-1 includes three variants: GWM Worlds for environment simulation, GWM Avatars for interactive video generation, and GWM Robotics for training robots with synthetic data [21][22]

Group 3: GWM Worlds
- GWM Worlds enables real-time environment simulation, creating immersive, explorable spaces from static scenes [23][24]
- The model maintains spatial consistency during exploration and responds accurately to user-defined physical rules [24][25]

Group 4: GWM Robotics
- GWM Robotics supports counterfactual generation, exploring alternative robot trajectories and their outcomes [26][27]
- It includes a Python SDK for generating videos from robot actions, augmenting training data without expensive real-world data collection [28]

Group 5: GWM Avatars
- GWM Avatars is an audio-driven interactive video generation model that simulates natural human movements and expressions [29][30]
- It has broad application potential, including personalized tutoring, customer support, training simulations, and interactive entertainment [31][32]

Conclusion
- Runway's updates mark a pivotal moment for the industry: the transition from video generation to true world simulation, reflecting a deeper grasp of the physical world's underlying logic [34][35]
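The counterfactual-generation idea behind GWM Robotics can be sketched as: roll the same start state forward under several candidate trajectories inside a learned simulator and compare the predicted outcomes, never touching a physical robot. Everything below is a hedged toy illustration; the function names (`simulate`, `pick_best`) and the scalar "states" are invented for this sketch and are not the released SDK.

```python
def simulate(start_state, trajectory):
    """Toy learned simulator: apply each action delta in sequence and
    return the predicted final state. A real model would instead predict
    multi-view video frames from the robot's actions."""
    state = start_state
    for action in trajectory:
        state = state + action
    return state

def pick_best(start_state, trajectories, target):
    """Counterfactual evaluation: score every candidate trajectory by
    how close its predicted outcome lands to the target, and choose the
    winner without any real-world rollout."""
    return min(trajectories,
               key=lambda traj: abs(simulate(start_state, traj) - target))

candidates = [[1, 1, 1], [2, 2, -1], [0, 5, 0]]
best = pick_best(0, candidates, target=5)
```

This is also why such a simulator doubles as a safety tool: a policy's risky trajectories fail inside `simulate` at no cost, and only the vetted one needs to run on hardware.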