Fei-Fei Li's New "World Model" Debuts: Real-Time Generation of Persistent 3D Worlds on a Single H100

Core Viewpoint - The article discusses the release of RTFM (Real-Time Frame Model), a highly efficient generative world model developed by World Labs, which can render persistent 3D worlds in real time using a single H100 GPU [2][4][12].

Group 1: RTFM Features
- RTFM operates without explicit 3D representations, generating new 2D images from one or more input images [6][7].
- The model learns to simulate complex physical phenomena like 3D geometry, reflections, and shadows solely from observing training video data [9].
- RTFM is designed around three core principles: efficiency, scalability, and persistence [12][14].

Group 2: Efficiency and Scalability
- RTFM can run real-time inference at interactive frame rates with just one H100 GPU, making it a practical solution for current hardware [14][38].
- The model's architecture allows it to scale with increasing data and computational power, avoiding reliance on explicit 3D representations [14][44].
- RTFM is viewed as a "learned renderer," capable of generating new views from 2D images without manual design [46][48].

Group 3: Persistence and Memory
- RTFM addresses the challenge of persistence by modeling the pose of each frame in 3D space, allowing for a structured memory of the world [60][64].
- The model employs "context juggling" to maintain geometric persistence in large scenes during long interactions [66][67].
- This approach enables RTFM to generate content in different spatial areas while preserving the context of the generated world [66][67].

Group 4: Future Prospects
- RTFM sets a technological roadmap for future world models, emphasizing the potential for real-time deployment on current hardware [69].
- There are exciting directions for expanding RTFM, such as simulating dynamic worlds and enhancing user interaction with generated environments [70].
- The team aims to improve performance with larger models that can operate under higher inference budgets [71].
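
The persistence idea in Group 3 (frames carry a pose in 3D space, and "context juggling" draws on different frames for different spatial areas) can be illustrated with a toy sketch. The article does not describe how RTFM actually selects its context, so everything below is an assumption made for illustration: the `PosedFrame` structure, the `pose_distance` metric, its `angle_weight` parameter, and the nearest-k selection rule in `select_context` are all hypothetical names and choices, not World Labs' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PosedFrame:
    """A frame tagged with a camera pose (hypothetical structure, not RTFM's)."""
    frame_id: int
    position: tuple[float, float, float]  # camera location in world space
    yaw_deg: float                        # heading around the vertical axis


def pose_distance(a: PosedFrame, b: PosedFrame, angle_weight: float = 0.01) -> float:
    """Toy pose metric: Euclidean distance plus a weighted heading difference."""
    d_pos = math.dist(a.position, b.position)
    # Wrap the heading difference into [-180, 180) before taking its magnitude.
    d_yaw = abs((a.yaw_deg - b.yaw_deg + 180.0) % 360.0 - 180.0)
    return d_pos + angle_weight * d_yaw


def select_context(memory: list[PosedFrame], query: PosedFrame, k: int = 2) -> list[PosedFrame]:
    """Return the k stored frames whose poses are closest to the query pose.

    Assumption: nearest-by-pose retrieval stands in for "context juggling";
    the real selection rule is not given in the article.
    """
    return sorted(memory, key=lambda f: pose_distance(f, query))[:k]


# Spatial memory: two frames near the origin, two frames in a distant area.
memory = [
    PosedFrame(0, (0.0, 0.0, 0.0), 0.0),
    PosedFrame(1, (1.0, 0.0, 0.0), 10.0),
    PosedFrame(2, (10.0, 0.0, 0.0), 0.0),
    PosedFrame(3, (10.5, 0.0, 0.5), 350.0),
]

# Different query poses pull in different context frames.
near_start = select_context(memory, PosedFrame(-1, (0.5, 0.0, 0.0), 0.0))
near_far_room = select_context(memory, PosedFrame(-1, (10.2, 0.0, 0.0), 0.0))
print([f.frame_id for f in near_start])
print([f.frame_id for f in near_far_room])
```

The point of the sketch is only that indexing frames by pose lets the context depend on where the camera is rather than on how long the session has run, which is one plausible reading of how posed frames give a structured memory of the world.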