Rivaling Genie 3: Ant Group's LingBo Open-Sources World Model LingBot-World
Feng Huang Wang·2026-01-29 03:15

Core Insights
- Ant Group's LingBo Technology has released LingBot-World, a model that aims to provide a high-fidelity, highly dynamic, real-time controllable "digital rehearsal space" for embodied intelligence, autonomous driving, and game development [1]

Group 1: Model Capabilities
- LingBot-World addresses the "long-horizon drift" common in video generation, achieving nearly 10 minutes of continuous, stable, lossless generation through multi-stage training and parallel acceleration [1]
- The model generates roughly 16 frames per second (FPS) while keeping end-to-end interaction latency under 1 second, letting users control characters and camera perspectives in real time [1]

Group 2: Interaction and Flexibility
- Users can trigger environmental changes and world events through text commands, such as adjusting the weather or changing the visual style, while the scene geometry remains relatively consistent [1]
- The model has zero-shot generalization: it can generate an interactive video stream from a single real photo or game screenshot, without additional training or data collection [2]

Group 3: Data Acquisition Strategy
- To address the scarcity of high-quality interactive data for world-model training, LingBot-World uses a hybrid data collection strategy that combines large-scale cleaned web videos and game captures with an Unreal Engine (UE) synthesis pipeline [2]
- This approach yields clean visuals free of UI interference while synchronously recording operational commands and camera poses, providing precise training signals for the model [2]

Group 4: Community Engagement
- The model weights and inference code for LingBot-World have been released to the community, promoting collaboration and further development [3]
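The data strategy described above pairs each clean frame with the operational command and camera pose active at capture time. A minimal sketch of what such a synchronized training record might look like is below; all field and class names (`FrameRecord`, `CameraPose`, `Clip`) are illustrative assumptions, not LingBot-World's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CameraPose:
    """Hypothetical camera pose recorded alongside each frame."""
    position: Tuple[float, float, float]  # (x, y, z) in world units
    rotation: Tuple[float, float, float]  # (pitch, yaw, roll) in degrees

@dataclass
class FrameRecord:
    """One frame paired with its synchronized control signal."""
    timestamp_s: float   # capture time within the clip
    frame_path: str      # path to a UI-free rendered frame
    action: str          # operational command active at this frame
    camera: CameraPose   # camera pose at capture time

@dataclass
class Clip:
    """A capture clip from one of the hybrid data sources."""
    source: str  # e.g. "web_video", "game_capture", or "ue_synthetic"
    fps: float   # capture rate; the article cites ~16 FPS at inference
    frames: List[FrameRecord] = field(default_factory=list)

    def duration_s(self) -> float:
        return len(self.frames) / self.fps if self.fps > 0 else 0.0

# Build a tiny synthetic clip at 16 FPS: 16 frames moving forward,
# then 16 frames turning left, each with an evolving camera pose.
clip = Clip(source="ue_synthetic", fps=16.0)
for i in range(32):
    clip.frames.append(FrameRecord(
        timestamp_s=i / 16.0,
        frame_path=f"frames/{i:05d}.png",
        action="move_forward" if i < 16 else "turn_left",
        camera=CameraPose(position=(0.0, 0.0, float(i)),
                          rotation=(0.0, i * 2.0, 0.0)),
    ))

print(clip.duration_s())  # → 2.0 (seconds of aligned frame/action/pose data)
```

Keeping the action and pose in the same record as the frame is what gives the model an unambiguous supervision signal: for any frame it can see exactly which command and viewpoint produced it.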

Rivaling Genie 3: Ant Group's LingBo Open-Sources World Model LingBot-World - Reportify