Ant Group Open-Sources a World Model to Challenge Google's Genie 3, Generating a Stable 10-Minute Video from a Single Image
Sou Hu Cai Jing·2026-01-31 19:37

Core Viewpoint
- Ant Group's LingBo Technology has released and open-sourced the LingBot-World model, designed as an interactive world model framework that provides high-fidelity, controllable, and logically consistent simulation environments [1].

Group 1: Model Capabilities
- LingBot-World is driven by a scalable data engine that learns physical laws and causal relationships from large-scale gaming environments, enabling real-time interaction with generated worlds [2].
- The model approaches Google's Genie 3 in key metrics such as video quality, dynamic range, long-term consistency, and interactivity [2].
- It can generate stable outputs for nearly 10 minutes without loss, addressing common issues like "long-term drift" in video generation [3].

Group 2: Interaction and Training
- LingBot-World achieves approximately 16 FPS in generation throughput and maintains end-to-end interaction latency under 1 second, allowing real-time control via keyboard or mouse [3].
- Users can trigger environmental changes and world events through text commands while maintaining stable geometric relationships in the scene [4].
- The model employs a hybrid data collection strategy, utilizing cleaned large-scale online videos and game captures to provide diverse scene coverage and aligned training signals for learning "how actions change the environment" [4].

Group 3: Generalization and Application
- LingBot-World demonstrates strong zero-shot generalization capabilities, allowing it to generate interactive video streams from a single real-world image or game screenshot without additional training [4].
- The model supports diverse scene generation, enhancing the generalization ability of embodied intelligence algorithms in real-world scenarios [5].
- Ant Group's release of the LingBot-World model marks a significant step in its AGI strategy, bridging the gap between generative AI and embodied intelligence [5].
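To put the reported throughput and latency figures in perspective, the short sketch below works through the arithmetic they imply. It is only a back-of-the-envelope illustration: the constants come from the summary above (about 16 FPS, nearly 10 minutes, latency under 1 second) and are treated as round upper bounds, and the function names are hypothetical, not part of any LingBot-World API.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the summary.
# All names here are illustrative; none come from the LingBot-World codebase.

FPS = 16.0           # approximate generation throughput reported in the article
DURATION_MIN = 10.0  # "nearly 10 minutes", treated here as a round upper bound
LATENCY_S = 1.0      # reported upper bound on end-to-end interaction latency


def frame_budget_ms(fps: float) -> float:
    """Average time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def total_frames(fps: float, minutes: float) -> int:
    """Number of frames in a continuous clip of the given length."""
    return round(fps * minutes * 60)


def frames_within_latency(fps: float, latency_s: float) -> float:
    """How many frames are generated during one latency window."""
    return fps * latency_s


if __name__ == "__main__":
    print(f"Per-frame budget: {frame_budget_ms(FPS)} ms")
    print(f"Frames in {DURATION_MIN:g} min: {total_frames(FPS, DURATION_MIN)}")
    print(f"Frames per latency window: {frames_within_latency(FPS, LATENCY_S):g}")
```

Under these assumptions, each frame has a budget of 62.5 ms, a 10-minute session spans 9,600 frames over which the model would have to stay consistent, and a user input would take effect within at most about 16 generated frames.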
