Huawei's World Model Arrives: A 272 m² Scene Generated in 30 Minutes on a Single GPU

Core Insights
- Huawei, in collaboration with Shanghai Jiao Tong University and Huazhong University of Science and Technology, has launched a world model called WorldGrow that can generate large indoor scenes of up to 1,800 square meters [1][12].

Group 1: Technology Overview
- WorldGrow can generate a 272-square-meter indoor scene in just 30 minutes on a single A100 GPU, about six times faster than comparable methods [11].
- The model relies on three core techniques: careful data preprocessing, a 3D block completion mechanism, and a coarse-to-fine generation strategy. Together they improve the quality and coherence of the generated scenes [9][10].

Group 2: Performance Metrics
- Experimental results show that WorldGrow achieves state-of-the-art (SOTA) results on geometric reconstruction metrics and a low Fréchet Inception Distance (FID) of 7.52, clearly outperforming mainstream methods such as SynCity and BlockFusion [12].
- Even when the scene is extended to an ultra-large 7×7 grid of blocks, edge quality remains stable, which demonstrates the model's robustness [10].

Group 3: Team and Research Background
- The lead authors are Sikuang Li and Chen Yang of Shanghai Jiao Tong University, who carried out the study as interns at Huawei. The research focuses on computer vision and computer graphics [13].
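To make the idea of block-wise completion with a coarse-to-fine pass more concrete, here is a minimal, hypothetical sketch. It is not WorldGrow's code. The row-major growth order, the 4-neighbour context rule, and every function name (`grow_scene`, `coarse_to_fine`, `toy_coarse`, `toy_fine`) are illustrative assumptions. The toy generators just return strings that record how much already-generated context each block saw. A real system would use learned 3D generative models in their place.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]  # (row, col) index of a block in the scene grid


def growth_order(rows: int, cols: int) -> List[Coord]:
    # Assumed row-major order: every block after (0, 0) has an upper or
    # left neighbour that has already been generated.
    return [(r, c) for r in range(rows) for c in range(cols)]


def known_neighbours(coord: Coord, done: Dict[Coord, str]) -> List[Coord]:
    # 4-connected neighbours that already exist and can serve as context.
    r, c = coord
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [n for n in candidates if n in done]


def grow_scene(rows: int, cols: int,
               generate: Callable[[Coord, List[str]], str]) -> Dict[Coord, str]:
    # Coarse pass: "complete" each new block conditioned on its generated neighbours.
    done: Dict[Coord, str] = {}
    for coord in growth_order(rows, cols):
        context = [done[n] for n in known_neighbours(coord, done)]
        done[coord] = generate(coord, context)
    return done


def coarse_to_fine(rows: int, cols: int,
                   coarse_gen: Callable[[Coord, List[str]], str],
                   fine_gen: Callable[[Coord, str, List[str]], str]) -> Dict[Coord, str]:
    coarse = grow_scene(rows, cols, coarse_gen)
    # Fine pass: refine each block from its coarse layout plus refined neighbours.
    fine: Dict[Coord, str] = {}
    for coord in growth_order(rows, cols):
        context = [fine[n] for n in known_neighbours(coord, fine)]
        fine[coord] = fine_gen(coord, coarse[coord], context)
    return fine


# Hypothetical stand-ins for learned models: they only record the context size.
def toy_coarse(coord: Coord, context: List[str]) -> str:
    return f"coarse{coord}<-{len(context)}"


def toy_fine(coord: Coord, coarse_block: str, context: List[str]) -> str:
    return f"fine[{coarse_block}]<-{len(context)}"


scene = coarse_to_fine(7, 7, toy_coarse, toy_fine)  # a 7x7 grid, as in the article
print(len(scene), scene[(1, 1)])
```

Because of the growth order, only the first block is generated without context. Every later block is conditioned on at least one neighbour. This is the property that block completion relies on to keep seams between blocks consistent.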
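For context on the FID figure of 7.52, the standard definition (introduced by Heusel et al., 2017) is given below. It compares Gaussian fits of Inception-network features extracted from real and generated images. Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the feature means and covariances of the real and generated sets:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```

Lower is better, and a value of 0 means the two feature distributions have identical means and covariances. The summary does not say which rendered views or evaluation protocol produced the 7.52 value, so comparisons with the numbers reported for SynCity and BlockFusion assume a shared protocol.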