3D World Generation Models
Fei-Fei Li's 3D World Model Opens for Public Testing, and Netizens Are Already Going Wild
具身智能之心· 2025-11-14 01:02
Core Insights
- The article covers the launch of Marble, a new 3D world generation model developed by Fei-Fei Li's World Labs, which lets users create personalized 3D worlds without a professional team [3][5][15].

Group 1: Model Features
- Marble generates 3D worlds from simple text prompts, single images, or short videos, making it accessible to the general public [5][17].
- Built-in AI editing tools let users make both minor and major changes to their worlds, such as removing objects or changing the visual style [21][25].
- Worlds can be exported in two formats: high-fidelity Gaussian point clouds for rendering in browsers, and triangle meshes for compatibility with industry-standard tools [29][40].

Group 2: User Experience
- The model has received positive feedback for its ease of use, with users quickly sharing their creations online [8][15].
- Marble supports multi-modal input, so 3D environments can be created and edited in several ways [34][35].

Group 3: Future Developments
- The team plans to focus on interactivity in future iterations of Marble, enabling real-time interaction within the generated 3D worlds [36][37].
- The article describes Marble as a significant step towards a "truly spatially intelligent world model" that supports dynamic interaction and evolves over time [40].
Fei-Fei Li's 3D World Model Opens for Public Testing, and Netizens Are Already Going Wild
量子位· 2025-11-13 05:38
Core Insights
- The article covers the launch of Marble, a new 3D world generation model developed by World Labs, the company founded by Fei-Fei Li, which is now open for public testing [1][3][34].
- Marble lets users create personalized 3D worlds from text, photos, or short videos, significantly lowering the barrier to entry for 3D modeling [4][15][35].

Group 1: Features and Functionality
- Marble can generate 3D worlds from simple text prompts or single images, and it accepts multiple images taken from different angles to build one cohesive environment [17][35].
- Users can customize their 3D spaces by uploading multiple images to define the layout, and can edit elements within the generated worlds, such as removing objects or changing styles [19][21].
- The platform includes an AI-native world editing tool that supports both minor and extensive modifications to the generated environments [21][33].

Group 2: Export and Compatibility
- Worlds can be exported in two formats: Gaussian point clouds for high-fidelity rendering, and triangle meshes for compatibility with industry-standard tools [29].
- The generated 3D worlds can also be rendered into videos, which can then be enhanced with additional details and dynamic elements [31].

Group 3: Future Developments
- Future updates aim to add interactivity, so that users can interact with elements inside their 3D worlds and not only create them [36][37].
- The development team stresses that the current features are only a foundation, with real-time interaction in the generated environments planned next [36][37].
Hunyuan 3D World Model 1.0 Lite Released, Runs on Consumer-Grade GPUs
量子位· 2025-08-15 10:05
Core Viewpoint
- Tencent's HunyuanWorld 1.0 model generates immersive 3D worlds from simple text or images, offering high-quality output, a low barrier to use, and compatibility with traditional CG pipelines [5][41].

Technical Framework
- The core technology of HunyuanWorld 1.0 uses panoramic images as a bridge for layered 3D generation, drawing on the diversity of 2D generation techniques to create rich scenes [9].
- Scene generation involves three key steps: creating a seamless 360° panoramic image from the input text or image, decomposing the panorama into independent semantic layers, and converting those layers into 3D structures with depth annotations [11][15][16].

Optimization Techniques
- The model incorporates two practical optimizations: seamless roaming through long-distance scenes using point cloud caching and video diffusion, and dual-mode compression for online/offline storage and inference of 3D models [18].
- Initial versions required over 26GB of VRAM, which put them out of reach of most consumer-grade graphics cards [19].
- HunyuanWorld 1.0-Lite runs on consumer-grade GPUs by applying dynamic FP8 quantization, which reduces VRAM requirements by 35% to below 17GB [20][25].

Performance Enhancements
- Dynamic FP8 quantization adjusts the quantization range to the parameter distribution, maintaining model performance while reducing memory usage [26].
- SageAttention quantization more than doubles inference speed with less than 1% precision loss, and significantly lowers the memory needed to run the model [28][29].
- An integrated Cache algorithm improves inference efficiency by cutting redundant computation across time steps, so the model runs more smoothly [33].

Comparative Analysis
- HunyuanWorld 1.0 outperforms other open-source 3D models in clarity, inference speed, compatibility with 3D engines, and editability [38].
- It generates editable 3D mesh files rather than videos, which makes it more versatile than competitors such as Google's Genie 3 [41].
- Compatibility with existing CG and 3D production pipelines adds to its practical value, while its open-source release and single-card deployment make it easier to adopt than other models [42].
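
The dynamic quantization idea and the memory figures in this summary can be illustrated with a small sketch. It is not Tencent's implementation. The function names (`dynamic_quantize`, `dequantize`, `reduced_vram_gb`) are hypothetical, and because the Python standard library has no FP8 type, a signed 8-bit integer grid stands in for FP8. The point it shows is only that the scale is derived from the values being quantized instead of being fixed in advance. The 26 GB baseline is also an assumption, since the article says only "over 26GB".

```python
import random


def dynamic_quantize(values, levels=127):
    """Map floats onto a signed 8-bit integer grid (a stand-in for FP8).

    The scale is computed from the largest magnitude in `values`, so the
    quantization range follows the distribution of the data itself.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def reduced_vram_gb(baseline_gb, reduction):
    """Memory left after cutting `reduction` (a fraction) from the baseline."""
    return baseline_gb * (1.0 - reduction)


# Hypothetical weights: small values centred on zero, as in many trained layers.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1000)]

codes, scale = dynamic_quantize(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Assumed 26 GB baseline with the reported 35% reduction.
lite_vram = reduced_vram_gb(26.0, 0.35)

print(f"scale={scale:.6f}  max round-trip error={max_error:.6f}")
print(f"estimated VRAM after reduction: {lite_vram:.1f} GB")
```

With these assumptions, the rounding error per value is bounded by half the scale, and a 35% cut from 26 GB gives about 16.9 GB, which agrees with the "below 17GB" figure reported above.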
Tencent Officially Releases and Open-Sources the Industry's First 3D World Generation Model
news flash· 2025-07-27 01:55
Core Insights
- Tencent officially launched and open-sourced what it calls the industry's first 3D world generation model, Hunyuan 3D World Model 1.0, which lets users create navigable 3D worlds in minutes from a single sentence or image [1].

Group 1
- The new model outputs standardized 3D assets, which significantly shortens production cycles [1].
- Tencent plans to open-source more models in the future, including edge-side hybrid-reasoning large language models and multimodal understanding models [1].