Fei-Fei Li's New "World Model" Debuts: A Single H100 Generates Persistent 3D Worlds in Real Time
36氪· 2025-10-17 09:47
Core Viewpoint
- The article discusses the release of RTFM (Real-Time Frame Model), a highly efficient generative world model developed by World Labs that renders persistent 3D worlds in real time on a single H100 GPU [2][4][12].

Group 1: RTFM Features
- RTFM operates without explicit 3D representations, generating new 2D images from one or more input images [6][7].
- The model learns to simulate complex physical phenomena such as 3D geometry, reflections, and shadows solely from observing training video data [9].
- RTFM is designed around three core principles: efficiency, scalability, and persistence [12][14].

Group 2: Efficiency and Scalability
- RTFM runs real-time inference at interactive frame rates on just one H100 GPU, making it practical on current hardware [14][38].
- The model's architecture scales with increasing data and computational power, avoiding reliance on explicit 3D representations [14][44].
- RTFM is viewed as a "learning renderer," generating new views from 2D images without a hand-designed rendering pipeline; a hedged sketch of this frame-to-frame loop follows this summary [46][48].

Group 3: Persistence and Memory
- RTFM addresses the challenge of persistence by modeling the pose of each frame in 3D space, giving the model a structured memory of the world [60][64].
- The model employs "context juggling" to maintain geometric persistence in large scenes during long interactions [66][67].
- This approach enables RTFM to generate content in different spatial areas while preserving the context of the already-generated world [66][67].

Group 4: Future Prospects
- RTFM sets a technological roadmap for future world models, emphasizing the potential for real-time deployment on current hardware [69].
- Promising directions for extending RTFM include simulating dynamic worlds and richer user interaction with generated environments [70].
- The team aims to improve performance with larger models operating under higher inference budgets [71].
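The summary above describes RTFM as a "learning renderer": it takes one or more posed 2D frames as input and autoregressively generates the next 2D frame for a requested camera pose, with the generated frames themselves serving as the model's memory, and no explicit 3D scene structure anywhere. The sketch below only illustrates that data flow; World Labs has not released code or an API for RTFM, so every name here (FrameModel, Pose, PosedFrame, generate_frame) is a hypothetical stand-in.

```python
# Hedged sketch of the autoregressive "learning renderer" loop described above.
# All classes and methods are hypothetical illustrations, not RTFM's real API.
from dataclasses import dataclass
import numpy as np

@dataclass
class Pose:
    position: np.ndarray      # (3,) camera position in world space
    orientation: np.ndarray   # (3, 3) rotation matrix

@dataclass
class PosedFrame:
    image: np.ndarray         # (H, W, 3) RGB frame
    pose: Pose                # where this frame was "taken" in the 3D world

class FrameModel:
    """Stand-in for an autoregressive diffusion Transformer over frames."""
    def generate_frame(self, context: list[PosedFrame], target_pose: Pose) -> np.ndarray:
        # In the real system a neural network attends over the context frames
        # (and their poses) to produce a new frame; here we just return noise.
        h, w = context[0].image.shape[:2] if context else (256, 256)
        return np.random.rand(h, w, 3)

def explore(model: FrameModel, start: PosedFrame, camera_path: list[Pose]) -> list[PosedFrame]:
    """Roll the model forward along a user-controlled camera path."""
    memory = [start]                      # generated frames double as world memory
    for pose in camera_path:
        image = model.generate_frame(memory, pose)
        memory.append(PosedFrame(image, pose))
    return memory
```

The point of the loop is that rendering and memory collapse into the same mechanism: each new 2D frame is generated from previously generated 2D frames, which is what lets the approach scale with video data instead of hand-built 3D pipelines.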
Fei-Fei Li's New "World Model" Debuts: A Single H100 Generates Persistent 3D Worlds in Real Time
36氪· 2025-10-17 01:48
Core Insights
- The article discusses the launch of RTFM (Real-Time Frame Model), a highly efficient autoregressive diffusion Transformer capable of real-time rendering of persistent, 3D-consistent worlds on a single H100 GPU [1][5][18].

Group 1: Model Features
- RTFM does not build explicit 3D representations; it generates new 2D images from one or more input 2D images, functioning as an "AI that has learned to render" [3][15].
- The model learns to simulate complex physical phenomena such as 3D geometry, reflections, and shadows solely from observing training videos [5][24].
- RTFM is designed around three core principles: efficiency, scalability, and persistence [5][31].

Group 2: Efficiency and Scalability
- RTFM operates in real time at interactive frame rates using only one H100 GPU [5][22].
- Its architecture scales with increasing data and computational power, learning from large-scale video data without relying on explicit 3D representations [5][23].
- The model is seen as a "learning renderer," converting input frames into neural network activations that implicitly represent the world [23][29].

Group 3: Persistence and Contextual Memory
- RTFM addresses the challenge of persistence by modeling the pose (position and orientation) of each frame in 3D space, so the world remains consistent even when the user looks away [31][35].
- The model employs "context juggling" to maintain geometric persistence in large scenes during long interactions, retrieving nearby frames from spatial memory; a hedged sketch of this retrieval idea appears after this summary [37][38].
- This approach lets RTFM generate new frames while preserving the context of the world, improving the user experience [37][38].

Group 4: Future Prospects
- RTFM sets a technological roadmap for future world models, demonstrating deployment on current hardware while paving the way for larger models with better performance [38][39].
- The team envisions extending RTFM to simulate dynamic worlds and to support richer user interaction with the generated environments [38].
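Both summaries attribute RTFM's persistence to pose-tagged frames plus "context juggling": instead of attending over every frame it has ever produced, the model retrieves the frames that were generated nearest to the viewpoint it is about to render. The sketch below illustrates only that retrieval step under assumed names (SpatialMemory, nearby_context); the actual selection logic used by World Labs is not public.

```python
# Hedged sketch of the "context juggling" idea summarized above: store every
# generated frame with its camera position, and when rendering a new viewpoint,
# pull only the spatially nearest frames back into the model's context window.
# SpatialMemory and its methods are hypothetical illustrations, not RTFM's API.
import numpy as np

class SpatialMemory:
    def __init__(self, context_size: int = 8):
        self.frames: list[np.ndarray] = []     # generated RGB frames
        self.positions: list[np.ndarray] = []  # (3,) camera position per frame
        self.context_size = context_size       # max frames fed back to the model

    def add(self, frame: np.ndarray, position: np.ndarray) -> None:
        """Store a newly generated frame with the pose it was rendered from."""
        self.frames.append(frame)
        self.positions.append(position)

    def nearby_context(self, target_position: np.ndarray) -> list[np.ndarray]:
        """Return the frames whose camera positions are closest to the target.

        This is what lets a scene stay consistent when the user returns to a
        place visited long ago: old frames near that place are retrieved even
        if they would no longer fit in a fixed-length, recency-based context.
        """
        if not self.frames:
            return []
        distances = [np.linalg.norm(p - target_position) for p in self.positions]
        order = np.argsort(distances)[: self.context_size]
        return [self.frames[i] for i in order]
```

In this reading, persistence comes from the memory layout rather than from an explicit 3D reconstruction: the context fed to the frame generator is assembled by spatial proximity, so revisiting a location re-surfaces the frames originally generated there.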