A US video generation veteran enters the world model arena
量子位 (QbitAI) · 2025-12-13 04:34
Core Insights
- Runway has launched its first general world model, GWM-1, built on its latest video generation model, Gen-4.5 [1][8]
- GWM-1 comes in three variants, GWM Worlds, GWM Avatars, and GWM Robotics, each designed for a different application [5][12]

Group 1: GWM-1 Overview
- GWM-1 uses an autoregressive architecture that predicts frame by frame, conditioned on the content held in its memory [9]
- The model supports real-time interactive control: users can adjust camera angles, or modify robot operation commands or audio [10]

Group 2: GWM Worlds
- GWM Worlds lets users explore a coherent, responsive environment without manually designing each space [13]
- Users provide a static scene as a reference, and the model generates an immersive, infinite, explorable space in real time [13]
- It keeps scene elements spatially consistent over long sequences of movement, unlike other world models that generate only limited frame sequences [13]
- Users can change the physical rules of the environment through text prompts, which helps train agents to act in the real world [15][16]
- GWM Worlds can also support immersive VR experiences by generating virtual environments in real time [17]

Group 3: GWM Avatars
- GWM Avatars is an audio-driven interactive video generation model that simulates human dialogue with realistic facial expressions and gestures [18][19]
- It can serve as a personalized tutor or improve customer service by creating digital humans that interact naturally [20]
- The model is set to launch with an API for integration into other products and services [22]

Group 4: GWM Robotics
- GWM Robotics is a learned simulator rather than one programmed with fixed rules; it predicts video sequences from robot data [23]
- It generates synthetic training data to augment existing robot datasets without expensive real-world data collection [24]
- Policy models can be tested directly in the model without deploying them on physical robots, which improves safety and efficiency [26]
- A Python SDK for GWM Robotics has been released; it supports multi-view video generation and long context sequences, so it can be integrated into modern robot policy models [29]

Group 5: Gen-4.5 Upgrades
- The latest Gen-4.5 update adds native audio generation and editing, producing realistic dialogue, sound effects, and background audio [30][31]
- Users can edit existing audio to meet specific needs and use multi-shot editing to apply consistent transformations across video segments [33]
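
The frame-by-frame prediction described under Group 1 can be illustrated with a toy rollout loop. This is a minimal sketch, not Runway's implementation or SDK: `toy_predictor`, `rollout`, and `memory_size` are hypothetical names, and the predictor is a simple arithmetic stand-in for a learned model. It only shows the control flow, in which each new frame is predicted from a bounded memory of earlier frames plus the current control signal, then fed back into that memory.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image or latent frame


def toy_predictor(memory: Sequence[Frame], action: float) -> Frame:
    """Hypothetical stand-in for a learned next-frame model.

    Averages the remembered frames element-wise, then shifts the
    result by the control signal `action`.
    """
    count = len(memory)
    width = len(memory[0])
    mean = [sum(frame[i] for frame in memory) / count for i in range(width)]
    return [value + action for value in mean]


def rollout(
    first_frame: Frame,
    actions: Sequence[float],
    predict: Callable[[Sequence[Frame], float], Frame] = toy_predictor,
    memory_size: int = 4,
) -> List[Frame]:
    """Autoregressive rollout: one predicted frame per action."""
    # Bounded memory: the oldest frame is dropped once the deque is full.
    memory = deque([first_frame], maxlen=memory_size)
    frames = [first_frame]
    for action in actions:
        next_frame = predict(list(memory), action)
        memory.append(next_frame)  # the prediction becomes future context
        frames.append(next_frame)
    return frames


if __name__ == "__main__":
    for frame in rollout([0.0, 0.0], [1.0, 1.0]):
        print(frame)
```

Because the action is read one step at a time, a caller could change it between steps, which is the sense in which a loop like this supports interactive control.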