4D World Models
CreateAI and the CAS Institute of Automation Release the NeoVerse 4D World Model
China Business Journal (Zhong Guo Jing Ying Bao) · 2026-01-06 06:44
CreateAI (OTC: TSPH) has officially released NeoVerse, a 4D world model developed jointly with the Institute of Automation, Chinese Academy of Sciences. The accompanying research paper is now available on the project page for developers worldwide.

According to CreateAI, the model combines diffusion and 4D Gaussian Splatting (4DGS) as its core techniques and was trained on one million in-the-wild monocular video clips. It can construct a general-purpose 4D world model in 30 seconds, breaking traditional 4D modeling's dependence on expensive multi-view data and unifying "reconstruction + generation" ...

"NeoVerse is an important outcome of our industry-academia-research collaboration with the Institute of Automation, and another instance of using technology to solve real industry pain points," said Wang Feng, Chief Scientist at CreateAI. "Training large models depends on scalable massive data, yet multi-view and 4D data are so expensive to collect that they have held back the development of world models. Our approach, feedforward 4D Gaussian reconstruction combined with diffusion-based generation, can efficiently synthesize large amounts of 4D data from monocular video alone, paving the way for building 4D world models."

The released NeoVerse model also supports distillation LoRAs as an extension mechanism and achieves inference in under 30 seconds on a single GPU, giving it strong potential for industrial-grade applications.

(Source: China Business Journal)
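To make the "4D Gaussian" terminology concrete: 4DGS-style methods extend 3D Gaussian splatting with a time axis, so each primitive's attributes can evolve over a video clip. The sketch below is purely illustrative and not CreateAI's implementation; the `Gaussian4D` class, its fields, and the linear motion model are all assumptions standing in for the learned per-timestep deformation a real feedforward network would predict.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian4D:
    """A single time-conditioned Gaussian primitive (illustrative only)."""
    mean: np.ndarray      # (3,) center position at t = 0
    velocity: np.ndarray  # (3,) linear motion model -- an assumption; real
                          # methods learn a richer deformation field
    scale: np.ndarray     # (3,) per-axis extent
    opacity: float        # blending weight in [0, 1]
    color: np.ndarray     # (3,) RGB in [0, 1]

    def position_at(self, t: float) -> np.ndarray:
        # Evaluate the primitive's center at time t. Linear motion is a
        # stand-in for the learned per-timestep deformation.
        return self.mean + t * self.velocity

g = Gaussian4D(
    mean=np.zeros(3),
    velocity=np.array([1.0, 0.0, 0.0]),
    scale=np.full(3, 0.1),
    opacity=0.9,
    color=np.array([0.5, 0.5, 0.5]),
)
print(g.position_at(0.5))  # center after moving 0.5 units along x
```

A full pipeline would predict millions of such primitives in a single feedforward pass and rasterize them per timestep; the point here is only that "4D" means 3D geometry plus time.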
One Image to Open Four-Dimensional Spacetime: 4DNeX Brings Dynamic Worlds to Life
Jiqizhixin (Machine Heart) · 2025-08-18 03:22
Core Viewpoint
- The article introduces 4DNeX, a groundbreaking framework developed by Nanyang Technological University S-Lab and the Shanghai Artificial Intelligence Laboratory, which can generate 4D dynamic scenes from a single input image, marking a significant advancement in AI and world modeling [2][3].

Group 1: Research Background
- The concept of world models is gaining traction in AI research, with Google DeepMind's Genie 3 capable of generating interactive videos from high-quality game data, but lacking validation in real-world scenarios [5].
- A pivotal requirement for world models is the ability to accurately depict dynamic 3D environments that adhere to physical laws, enabling realistic content generation and supporting "counterfactual" reasoning [5][6].

Group 2: 4DNeX-10M Dataset
- The 4DNeX-10M dataset consists of nearly 10 million frames of 4D-annotated video, covering diverse themes such as indoor and outdoor environments, natural landscapes, and human motion, with a focus on "human-centered" 4D data [10].
- The dataset is constructed with a fully automated data-labeling pipeline, which includes data sourcing from public video libraries and quality-control measures to ensure high fidelity [12][14].

Group 3: 4DNeX Method Architecture
- 4DNeX proposes a 6D unified representation that captures both appearance (RGB) and geometry (XYZ), allowing simultaneous generation of multi-modal content without explicit camera control [16].
- The framework employs a key strategy called "width fusion," which minimizes cross-modal distance by directly concatenating RGB and XYZ data along the width axis, outperforming other fusion methods [18][20].

Group 4: Experimental Results
- Experimental results show that 4DNeX achieves significant gains in both efficiency and quality, with a dynamic range of 100% and temporal consistency of 96.8%, surpassing existing methods such as Free4D [23].
- User studies indicate that 85% of participants preferred 4DNeX's generated results, particularly noting its advantages in motion range and realism [23][25].
- Ablation studies confirmed the critical role of the width-fusion strategy in multi-modal integration, eliminating the noise and alignment issues present in other approaches [28].
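The "width fusion" idea described in Group 3 can be sketched in a few lines: an RGB frame and its per-pixel XYZ point map, both three-channel images of the same size, are concatenated side by side along the width axis so one image backbone sees both modalities in a single canvas. This is a minimal sketch under stated assumptions; the `width_fuse` function name and the tensor shapes are illustrative, not the 4DNeX API.

```python
import numpy as np

def width_fuse(rgb: np.ndarray, xyz: np.ndarray) -> np.ndarray:
    """Concatenate an RGB frame and its XYZ point map side by side.

    Both inputs are (H, W, 3); the fused output is (H, 2*W, 3), letting a
    single image model attend jointly to appearance and geometry.
    (Illustrative sketch -- names and shapes are assumptions.)
    """
    assert rgb.shape == xyz.shape and rgb.shape[-1] == 3
    return np.concatenate([rgb, xyz], axis=1)  # axis=1 is the width axis

rgb = np.random.rand(64, 64, 3)  # appearance: per-pixel color
xyz = np.random.rand(64, 64, 3)  # geometry: per-pixel 3D coordinates
fused = width_fuse(rgb, xyz)
print(fused.shape)  # (64, 128, 3)
```

The appeal of this layout over channel-wise stacking is that the fused result is still an ordinary 3-channel image, so pretrained video-diffusion backbones can process it without architectural changes.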