3D Gaussian Splatting (3DGS)
Challenging WorldLabs: Visionary, a WebGPU Rendering Platform That Comprehensively Surpasses Marble's Underlying Renderer
机器之心· 2025-12-21 04:21
Core Insights
- The article discusses the development of Visionary, a new rendering platform that uses WebGPU and ONNX to enhance the visualization of, and interaction with, World Models in web environments, overcoming limitations of earlier technologies such as SparkJS [2][10][27].

Group 1: Challenges in Current Technologies
- Existing World Model visualization methods, particularly those relying on WebGL, face significant limitations when rendering dynamic and complex scenes because of CPU sorting bottlenecks [6][7][8].
- Current solutions such as SparkJS are designed primarily for static or pre-computed Gaussian rendering, making them inadequate for real-time inference of dynamic 3D Gaussian Splatting (3DGS) and Neural Avatars [7][8].

Group 2: Visionary's Innovations
- Visionary is positioned as a native web rendering substrate that integrates GPU computation and rendering directly into the browser, replacing the older WebGL framework [10][25].
- It introduces a Gaussian Generator Contract that standardizes the output of various 3DGS and 4DGS methods into the ONNX format, allowing Gaussian attributes to be generated and updated dynamically in real time [11][13].

Group 3: Performance and Quality Improvements
- Experimental data indicate that Visionary significantly outperforms SparkJS in rendering efficiency, particularly in scenes with millions of Gaussian points, by shifting sorting and preprocessing tasks to the GPU [18][21].
- Visionary employs frame-by-frame GPU global sorting to eliminate the visual artifacts seen in other solutions, ensuring accurate rendering of transparency even in complex multi-model scenarios [21][24].

Group 4: Applications and Future Directions
- Visionary serves as a unified platform for researchers, creators, and industry, enabling quick reproduction and comparison of 3DGS variants as well as editing and rendering directly in the browser [24][25].
- The development team views Visionary as a foundational step towards a comprehensive World Model framework, with future explorations planned in areas such as physical interaction enhancement and spatial intelligence [26][28].
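The CPU-sorting bottleneck described above arises because Gaussian splats must be depth-ordered every frame before alpha compositing. Visionary's actual WebGPU kernels are not shown in the summary, so the following is only a minimal NumPy sketch of the per-frame global sort and front-to-back compositing that such renderers move onto the GPU; the function name and data layout are illustrative assumptions.

```python
import numpy as np

def composite_sorted_gaussians(centers, colors, alphas, cam_pos, view_dir):
    """Sort Gaussians by view-space depth, then alpha-composite front to back.

    CPU sketch of what a GPU renderer does per frame with a parallel sort;
    not Visionary's implementation.
    """
    # Depth of each Gaussian center along the camera's viewing direction.
    depths = (centers - cam_pos) @ view_dir
    order = np.argsort(depths)              # nearest first (front-to-back)
    color = np.zeros(3)
    transmittance = 1.0
    for i in order:
        weight = alphas[i] * transmittance  # contribution of this splat
        color += weight * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:            # early termination, as in 3DGS
            break
    return color, transmittance
```

A per-scene CPU sort like this is exactly what becomes the bottleneck at millions of Gaussians, which is why moving the global sort into a GPU compute pass matters.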
Embedding 3DGS into Diffusion: A Fast, High-Resolution 3D Generation Framework (ICCV'25)
自动驾驶之心· 2025-11-01 16:04
Core Viewpoint
- The article introduces DiffusionGS, a novel pixel-level 3D diffusion model for the image-to-3D generation task that maintains 3D view consistency and applies to both object-centric and larger-scale scene-level generation [2][17].

Group 1: Methodology
- DiffusionGS predicts a 3D Gaussian point cloud at each timestep to ensure consistency among generated views, enhancing the quality of both object and scene generation [2][30].
- The model operates in pixel space rather than latent space, allowing better preservation of 3D representations and higher spatial resolution [26][30].
- A scene-object mixed training strategy is proposed to generalize 3D priors across various datasets, improving the model's performance [32][34].

Group 2: Performance Metrics
- DiffusionGS achieves a PSNR of 25.89 and an SSIM of 0.8880, outperforming current state-of-the-art methods by 2.20 dB in PSNR and 23.25 in FID [40].
- The model generates images in 6 seconds at 256x256 resolution and 24 seconds at 512x512 resolution, 7.5 times faster than Hunyuan-v2.5 [16][40].
- The method demonstrates superior clarity and 3D consistency in generated images, with fewer artifacts and less blurriness than existing techniques [44].

Group 3: Technical Contributions
- The Reference-Point Plucker Coordinate (RPPC) enhances spatial perception by incorporating camera pose information into the model [32][37].
- The architecture includes two different MLPs for decoding Gaussian primitives, tailored to object-level and scene-level generation respectively [39].
- A point distribution loss is designed to improve object-level training, ensuring better convergence and performance [39].
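RPPC's exact formulation is given in the paper; as a rough illustration of the underlying idea of a Plücker-style per-pixel ray embedding whose moment is taken about a reference point, here is a NumPy sketch. The function name and the choice of reference point are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def reference_point_plucker(ray_origin, ray_dir, ref_point):
    """Per-pixel ray embedding: unit direction plus moment about a reference point.

    Shifting the moment's origin to `ref_point` makes the 6-D embedding aware of
    camera pose relative to that point (e.g., the scene center). Illustrative only.
    """
    d = ray_dir / np.linalg.norm(ray_dir)      # unit ray direction
    m = np.cross(ray_origin - ref_point, d)    # moment of the ray about ref_point
    return np.concatenate([d, m])              # 6-D per-pixel feature
```

A useful property of this embedding is that it depends only on the ray itself, not on where along the ray the origin is placed: sliding the origin by any multiple of the direction leaves the moment unchanged.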
A Roundup of ICCV 2025 Autonomous Driving Scene Reconstruction Work: A Very Promising Direction
自动驾驶之心· 2025-07-29 00:52
Core Viewpoint
- The article emphasizes advances in autonomous driving scene reconstruction, highlighting the integration of multiple technologies and the collaboration among top universities and research institutions in this field [2][12].

Summary by Sections

Section 1: Overview of Autonomous Driving Scene Reconstruction
- The article discusses the importance of dynamic and static scene reconstruction in autonomous driving, focusing on the need for precise color and geometric information obtained by fusing lidar and visual data [2].

Section 2: Research Contributions
- Notable research from institutions such as Tsinghua University, Nankai University, Fudan University, and the University of Illinois Urbana-Champaign is mentioned, showcasing their contributions to the field [5][6][10][11].

Section 3: Educational Initiatives
- The article promotes a comprehensive course on 3D Gaussian Splatting (3DGS), designed in collaboration with leading experts and aimed at providing in-depth knowledge and practical skills in autonomous driving scene reconstruction [15][19].

Section 4: Course Structure
- The course comprises eight chapters covering foundational algorithms, the technical details of 3DGS, static and dynamic scene reconstruction, surface reconstruction, and practical applications in autonomous driving [19][21][23][25][27][29][31][33].

Section 5: Target Audience
- The course targets researchers, students, and professionals interested in 3D reconstruction, and requires a foundational understanding of 3DGS and related technologies [36][37].
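As a taste of the "technical details of 3DGS" that such courses cover, the standard EWA-style projection of a 3D Gaussian's covariance into screen space, Sigma' = J W Sigma W^T J^T, can be sketched as follows. This is the generic textbook formulation, not material from the course itself; names are illustrative.

```python
import numpy as np

def project_covariance(cov3d, W, t, fx, fy):
    """Project a 3D Gaussian covariance to a 2x2 screen-space covariance.

    Sigma' = J W Sigma W^T J^T, where W is the world-to-camera rotation and J is
    the Jacobian of the perspective projection at the Gaussian's camera-space
    mean t = (x, y, z).
    """
    x, y, z = t
    # Jacobian of (fx * x / z, fy * y / z) with respect to (x, y, z).
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    cov_cam = W @ cov3d @ W.T          # rotate covariance into camera space
    return J @ cov_cam @ J.T           # 2x2 screen-space covariance
```

The resulting 2x2 covariance is what determines each splat's elliptical footprint during rasterization; note how the 1/z terms shrink the footprint of distant Gaussians.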
A Diverse, Large-Scale Dataset! SceneSplat++: The First Comprehensive 3DGS-Based Benchmark
自动驾驶之心· 2025-06-20 14:06
Core Insights
- The article introduces SceneSplat-Bench, a comprehensive benchmark for evaluating visual-language scene understanding methods based on 3D Gaussian Splatting (3DGS) [11][30].
- It presents SceneSplat-49K, a large-scale dataset containing approximately 49,000 raw scenes and 46,000 filtered 3DGS scenes, the most extensive open-source dataset for complex, high-quality scene-level 3DGS reconstruction [9][30].
- The evaluation indicates that generalizable methods consistently outperform per-scene optimization methods, establishing a new paradigm for scalable scene understanding through pre-trained models [30].

Evaluation Protocols
- The benchmark evaluates methods on two key metrics in 3D space, foreground mean intersection over union (f-mIoU) and foreground mean accuracy (f-mAcc), which address object-size imbalance and reduce viewpoint dependency compared with 2D evaluations [22][30].
- The evaluation datasets include ScanNet, ScanNet++, and Matterport3D for indoor scenes and HoliCity for outdoor scenes, probing the methods' capabilities across varied object scales and complex environments [22][30].

Dataset Contributions
- SceneSplat-49K is compiled from multiple sources, including SceneSplat-7K, DL3DV-10K, HoliCity, and Aria Synthetic Environments, ensuring a diverse range of indoor and outdoor environments [9][10].
- Dataset preparation took approximately 891 GPU-days and extensive human effort, highlighting the significant resources invested in creating a high-quality dataset [7][9].

Methodological Insights
- The article categorizes methods into three types: per-scene optimization methods, per-scene optimization-free methods, and generalizable methods, with SceneSplat representing the last category [23][30].
- Generalizable methods eliminate the need for extensive single-scene computation during inference, processing a 3D scene efficiently in a single forward pass [24][30].
Performance Results
- The results on SceneSplat-Bench demonstrate that SceneSplat excels in both performance and efficiency, often surpassing the pseudo-label methods used for its pre-training [24][30].
- The performance of various methods varies significantly with dataset complexity, indicating the importance of challenging benchmarks for revealing the limitations of competing methods [28][30].
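The f-mIoU metric used throughout the benchmark can be illustrated with a small sketch. The exact benchmark implementation may differ (for instance, in how classes absent from a scene are handled), so treat this as a plausible reading of the metric, not SceneSplat-Bench's code.

```python
import numpy as np

def foreground_mean_iou(pred, gt, foreground_ids):
    """Mean IoU over foreground classes only (an f-mIoU-style sketch).

    `pred` and `gt` are per-point integer label arrays. Restricting the mean to
    foreground classes counters the object-size imbalance that background-heavy
    classes (walls, floors) would otherwise introduce.
    """
    ious = []
    for c in foreground_ids:
        gt_c, pred_c = (gt == c), (pred == c)
        if gt_c.sum() == 0:
            continue                                  # class absent from this scene
        inter = np.logical_and(gt_c, pred_c).sum()
        union = np.logical_or(gt_c, pred_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

Because the metric is computed directly on 3D points rather than rendered 2D views, it also avoids the viewpoint dependency the article attributes to 2D evaluation.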