3D Reconstruction
Meta's "Segment Anything" Enters the 3D Era! Image Segmentation Results Convert Directly to 3D, Even Recovering Occluded Objects
QbitAI (量子位) · 2025-11-20 07:01
Core Viewpoint - Meta's new 3D modeling paradigm allows image segmentation results to be converted directly into 3D models, enhancing 3D reconstruction from 2D images [1][4][8].

Summary by Sections

3D Reconstruction Models
- Meta's MSL lab has released SAM 3D, which includes two models: SAM 3D Objects for object and scene reconstruction, and SAM 3D Body for human modeling [4][8].
- SAM 3D Objects can reconstruct 3D models and estimate object poses from a single natural image, overcoming challenges such as occlusion and small objects [10][11].
- SAM 3D Objects outperforms existing methods, achieving a win rate at least five times higher than leading models in direct user comparisons [13][14].

Performance Metrics
- SAM 3D Objects shows significant improvements in 3D shape and scene reconstruction, with an F1 score of 0.2339 and a 3D IoU of 0.4254 [15].
- SAM 3D Body also achieves state-of-the-art (SOTA) results in human modeling, with an MPJPE of 61.7 and a PCK of 75.4 across various datasets [18].

Semantic Understanding
- SAM 3 introduces a concept segmentation feature that allows flexible object segmentation based on user-defined prompts, overcoming the limitations of fixed label sets [21][23].
- The model can identify and segment objects from textual descriptions or selected examples, significantly enhancing its usability [26][31].

Benchmarking and Results
- SAM 3 sets a new SOTA in promptable segmentation, achieving 47.0% accuracy in zero-shot segmentation on the LVIS dataset, surpassing the previous SOTA of 38.5% [37].
- On the new SA-Co benchmark, SAM 3's performance is at least twice as strong as baseline methods [38].

Technical Architecture
- SAM 3's architecture is built on a shared Perception Encoder, which improves the consistency and efficiency of feature extraction for both detection and tracking tasks [41][43].
- SAM 3D Objects uses a two-stage generative approach, with a 1.2-billion-parameter flow-matching transformer for geometric prediction [49][50].
- SAM 3D Body uses a Momentum Human Rig representation that decouples skeletal pose from body shape, enhancing detail in human modeling [55][60].
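The MPJPE figure cited above for SAM 3D Body is a standard pose-estimation metric: the mean Euclidean distance between predicted and ground-truth 3D joint positions, typically in millimetres. A minimal sketch with toy data (the function and the example joints are illustrative, not from the SAM 3D evaluation code):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: mean Euclidean distance (e.g. in mm)
    between predicted and ground-truth 3D joint positions."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: 4 joints, prediction offset by 60 mm along x.
gt = np.zeros((4, 3))
pred = gt + np.array([60.0, 0.0, 0.0])
print(mpjpe(pred, gt))  # 60.0
```

Lower is better; SAM 3D Body's reported 61.7 would mean its predicted joints land, on average, about 6 cm from the ground truth.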
How Does Gaode (Amap) Help the Museum and Cultural Heritage Sector "Break Through Limits"?
21st Century Business Herald (21世纪经济报道) · 2025-09-30 11:56
Core Viewpoint - The article emphasizes the importance of cultural relic protection and the urgent need for digital transformation in the museum sector to enhance preservation and public engagement [1][3].

Digital Transformation in Cultural Heritage
- Recent policies have highlighted the need to accelerate the digitization of collections and improve database accessibility, with a focus on integrating digital technology into cultural heritage management [1][3].
- The digitalization of cultural heritage has advanced significantly, with many museums adopting 3D online exhibitions and AI-guided tours, making digital experiences the norm [3][4].

Challenges in Digitalization
- The sector faces substantial challenges, including the large volume of collections, the fragility of artifacts, and the high costs and lengthy processes of traditional 3D modeling [5][6].
- Three main pain points have been identified: limited physical access to popular museums, bottlenecks in large-scale digitization, and ongoing operational pressure to balance artifact protection with visitor capacity [5][6].

Technological Solutions
- Companies like Gaode are leveraging their expertise in digital twin technology and AI to address these challenges, aiming to lower costs and improve operational efficiency in cultural heritage management [6][8].
- Gaode's "Cloud Realm" platform improves the efficiency of 3D reconstruction and digitalization, enabling a more interactive and engaging public experience with cultural artifacts [8][9].

Innovative Collaborations
- Gaode has partnered with institutions such as the Palace Museum to establish an AI 3D reconstruction innovation lab, focusing on data collection challenges in complex environments [8][12].
- The collaboration aims to create a replicable model for digital management and services across cultural institutions, strengthening the role of intelligent technology in museums [11][12].

Future Directions
- Gaode's approach to cultural heritage digitalization could serve as a model for smaller museums, underscoring the role of technology in making cultural knowledge more accessible [12][13].
- Gaode's commitment to a technology-focused platform aims to provide a neutral, serious approach to cultural heritage management, distinguishing it from content-driven organizations [13][14].
World Robot Conference Ignites a 3D Vision Revolution; Spatial Intelligence Takes Center Stage
Heart of Autonomous Driving (自动驾驶之心) · 2025-08-11 05:45
Core Viewpoint - The 2025 World Robot Conference (WRC) in Beijing highlights 3D perception technology as a key focus, showcasing advancements in spatial memory modules and multi-modal sensors that enhance robotic capabilities across industries [2][4].

Group 1: 3D Reconstruction Technology
- The ultimate goal of 3D reconstruction technology is to enable robots to understand, navigate, and operate in any environment [4].
- The latest handheld laser scanner, D-H100, achieves centimeter-level scanning precision at a range of 120 meters and improves efficiency by 300% in complex environments [4].
- Integrating laser scanning with robots can enable real-time mapping of disaster areas and improve operational efficiency in industrial settings [4][5].

Group 2: GeoScan S1 Laser Scanner
- GeoScan S1 is presented as the most cost-effective handheld 3D laser scanner in China, featuring a lightweight design and one-button operation for efficient 3D workflows [7][12].
- The device supports real-time 3D scene reconstruction with centimeter-level accuracy and can cover areas exceeding 200,000 square meters [7][25].
- It integrates multiple sensors and offers high-bandwidth connectivity, making it suitable for a range of research and industrial applications [7][9].

Group 3: Technical Specifications and Features
- GeoScan S1 runs Ubuntu 20.04 and supports data export formats including PCD, LAS, and PLY, with relative accuracy better than 3 cm and absolute accuracy better than 5 cm [25][28].
- The scanner measures 14.2 cm x 9.5 cm x 45 cm, weighs 1.3 kg without the battery, and provides roughly 3 to 4 hours of battery life [25][27].
- It includes advanced multi-sensor synchronization technology for precise mapping in complex indoor and outdoor environments [33][34].

Group 4: Market Position and Pricing
- GeoScan S1 is available in multiple versions, with prices ranging from 19,800 yuan for the basic model to 67,800 yuan for the offline version [60].
- The product is backed by research and validation from teams at Tongji University and Northwestern Polytechnical University [14][18].
- The scanner is designed for cross-platform integration and is compatible with drones, unmanned vehicles, and humanoid robots for automated operations [45][48].
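A relative-accuracy figure like the quoted "better than 3 cm" is commonly checked by comparing a scan against a trusted reference cloud and averaging nearest-neighbour distances. A minimal numpy sketch of that kind of check with toy data (the function name, grid, and offset are illustrative, not from the vendor's validation procedure):

```python
import numpy as np

def mean_cloud_to_cloud(scan, reference):
    """Mean nearest-neighbour distance (metres) from each scan point to a
    reference cloud -- a rough proxy for relative scan accuracy."""
    # Brute-force pairwise distances: fine for toy data, use a KD-tree at scale.
    d = np.linalg.norm(scan[:, None, :] - reference[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

# Toy clouds: a 5x5 reference grid on the ground plane, and a "scan"
# of the same points offset vertically by 2 cm.
ref = np.stack(np.meshgrid(np.arange(5.0), np.arange(5.0), [0.0]), -1).reshape(-1, 3)
scan = ref + np.array([0.0, 0.0, 0.02])
err = mean_cloud_to_cloud(scan, ref)
print(err <= 0.03)  # True: this toy scan is within a 3 cm tolerance
```

Real evaluations would first register the two clouds (e.g. via ICP) and use control points surveyed with higher-accuracy equipment.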
"Heart of Autonomous Driving" Project and Paper Mentoring Is Here
Heart of Autonomous Driving (自动驾驶之心) · 2025-08-07 12:00
Core Viewpoint - The article announces the launch of the "Heart of Autonomous Driving" project and paper mentoring program, aimed at students facing challenges in autonomous-driving research and development [1].

Group 1: Project and Guidance Overview
- The program supports students who encounter difficulties in their research, such as environment-configuration issues and debugging challenges [1].
- Last year's outcomes were positive, with several students publishing papers at top conferences such as CVPR and ICRA [1].

Group 2: Guidance Directions
- **Direction 1**: Multi-modal perception and computer vision, end-to-end autonomous driving, large models, and BEV perception. The mentor has published over 30 papers at top AI conferences, with more than 6,000 citations [3].
- **Direction 2**: 3D object detection, semantic segmentation, occupancy prediction, and multi-task learning based on images or point clouds. The mentor is a top-tier PhD with multiple publications in ECCV and CVPR [5].
- **Direction 3**: End-to-end autonomous driving, OCC, BEV, and world models. The mentor is also a top-tier PhD who has contributed to several mainstream perception solutions [6].
- **Direction 4**: NeRF / 3DGS neural rendering and 3D reconstruction. The mentor has published four CCF-A papers, including two in CVPR and two in IEEE Transactions [7].
Goodbye Artifacts! HKU Open-Sources GS-SDF: SDF-Based Gaussian Initialization Can Be This Stable
Heart of Autonomous Driving (自动驾驶之心) · 2025-07-24 06:46
Core Viewpoint - The article presents a unified LiDAR-visual system that addresses geometric inconsistencies in Gaussian splatting for robotic applications, combining Gaussian splatting with a Neural Signed Distance Field (NSDF) to achieve geometrically consistent rendering and reconstruction [52].

Group 1: Unified LiDAR-Visual System
- The proposed system uses registered images and low-cost LiDAR data to reconstruct both the appearance and surface structure of scenes under arbitrary trajectories [5][6].
- Gaussian initialization is emphasized as critical to achieving good structure, given its role in the optimization process [22].

Group 2: Geometric Regularization
- The article introduces geometric regularization into the 3D Gaussian Splatting (3DGS) framework to address geometric inconsistencies that manifest as rendering distortions [3][6].
- Depth cameras and LiDAR can provide direct structural priors, which can be integrated into the 3DGS framework for improved geometric regularization [3].

Group 3: Methodology
- The pipeline has three stages: training a Neural Signed Distance Field (NSDF) from point clouds, initializing Gaussian primitives from the NSDF, and jointly optimizing the Gaussian primitives and the NSDF through SDF-assisted shape regularization [8][6].
- 2D Gaussian splatting is used to represent 3D scenes, with each disk defined by a center point, orthogonal tangent vectors, a scaling factor, opacity, and view-dependent color [10].

Group 4: Experimental Results
- Extensive experiments show superior reconstruction accuracy and rendering quality across various trajectories [52].
- Quantitative results indicate the method outperforms existing techniques on metrics such as C-L1, F-Score, SSIM, and PSNR across multiple datasets [46][49].

Group 5: Limitations and Future Work
- The method is limited in extrapolated novel-view synthesis, suggesting a need for further exploration of advanced neural rendering techniques [53].
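The core idea behind SDF-assisted shape regularization, as summarized above, is that a signed distance field tells each Gaussian primitive how far it sits from the surface and in which direction. A toy numpy sketch of that coupling, using an analytic sphere SDF as a stand-in for the learned NSDF (the function names, the projection step, and the loss are illustrative, not the GS-SDF implementation):

```python
import numpy as np

def sphere_sdf(p, r=1.0):
    # Signed distance to a sphere of radius r at the origin
    # (stand-in for a trained NSDF).
    return np.linalg.norm(p, axis=-1) - r

def sdf_grad(p, eps=1e-4):
    # Finite-difference SDF gradient: points along the surface normal.
    g = np.zeros_like(p)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        g[..., i] = (sphere_sdf(p + d) - sphere_sdf(p - d)) / (2 * eps)
    return g

def sdf_shape_loss(centers):
    # Penalize Gaussian centers that drift off the SDF zero level set.
    return float(np.abs(sphere_sdf(centers)).mean())

# Two perturbed Gaussian centers: one outside, one inside the surface.
centers = np.array([[1.2, 0.0, 0.0], [0.0, 0.8, 0.0]])
# One projection step: move each center along the normal back to the surface,
# mimicking SDF-guided initialization.
centers -= sphere_sdf(centers)[..., None] * sdf_grad(centers)
print(sdf_shape_loss(centers))  # close to 0: centers snapped to the surface
```

In the actual method this penalty would be one term in a joint objective alongside the photometric splatting loss, with the NSDF itself also being refined.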
Results Are Out! Latest ICCV 2025 Roundup (Autonomous Driving / Embodied AI / 3D Vision / LLM / CV, etc.)
Heart of Autonomous Driving (自动驾驶之心) · 2025-06-28 13:34
Core Insights - The article rounds up newly announced ICCV 2025 acceptances, highlighting works related to autonomous driving and recent advances in the field [2].

Group 1: Autonomous Driving Innovations
- DriveArena is a controllable generative simulation platform aimed at enhancing autonomous driving capabilities [4].
- Epona presents an autoregressive diffusion world model designed for autonomous driving applications [4].
- SynthDrive offers a scalable Real2Sim2Real sensor simulation pipeline for high-fidelity asset generation and driving-data synthesis [4].
- StableDepth targets scene-consistent, scale-invariant monocular depth estimation, which is crucial for perception in autonomous vehicles [4].
- CoopTrack explores end-to-end learning for efficient cooperative sequential perception, enhancing collaboration among autonomous systems [4].

Group 2: Image and Vision Technologies
- CycleVAR repurposes autoregressive models for unsupervised one-step image translation, which can benefit visual recognition tasks in autonomous driving [5].
- CoST pursues efficient collaborative perception from a unified spatiotemporal perspective, essential for real-time decision-making in autonomous vehicles [5].
- Hi3DGen generates high-fidelity 3D geometry from images via normal bridging, improving spatial understanding of environments for autonomous systems [5].
- GS-Occ3D scales vision-only occupancy reconstruction for autonomous driving using Gaussian splatting techniques [5].

Group 3: Large Model Applications
- ETA introduces a dual approach to self-driving with large models, improving the efficiency and effectiveness of autonomous driving systems [5].
- Taming the Untamed discusses graph-based knowledge retrieval and reasoning for multimodal large language models (MLLMs), which can significantly improve decision-making in autonomous driving [7].
Xinjiang Corps Eighth Division: Technological Innovation Drives a New Leap for the "Gobi Pearl"
China News Network (Zhongguo Xinwen Wang) · 2025-06-03 04:12
Core Viewpoint - Shihezi City is emerging as a national-level innovation hub, leveraging its strategic platforms and rich resources to drive technological advancement and economic growth in Xinjiang [1][2][3].

Group 1: Innovation and Development
- Shihezi City is recognized as a national-level innovative pilot city, hosting strategic platforms such as the Wuchang-Shi National Independent Innovation Demonstration Zone and the Silk Road Economic Belt Innovation-Driven Development Experimental Zone [1].
- The city has become a gathering area for agricultural technology, high-tech industries, and emerging industries, with 18 research institutions and numerous national and provincial innovation platforms [1][2].
- Shihezi has 113 high-tech enterprises and 174 technology-based SMEs, showing continuous innovation vitality [2].

Group 2: Achievements and Recognition
- The city has received multiple awards for its technological advancements, including six second-class National Science and Technology Progress Awards and over 100 provincial awards [2].
- Shihezi High-tech Zone has been selected as a pilot city for "Science and Technology Innovation China" and recognized as a national advanced county (city) for scientific progress [2].

Group 3: Green Energy and Agriculture
- The "Photovoltaic + Sand Control and Ecological Agriculture" project by China New Energy Group integrates ecological and economic benefits, and is expected to generate approximately 150 million kWh of green electricity annually while cutting carbon emissions by about 78,000 tons [2].
- Xinjiang Tianye Group's modern water-saving agricultural demonstration base uses smart technology and data platforms to improve agricultural efficiency, addressing the challenges of cultivating saline-alkali land [3].

Group 4: Policy Support and Talent Development
- Shihezi City has implemented supportive policies for technological innovation, including rewards of up to 5 million yuan for technology transformation projects and a 1-billion-yuan industrial fund to support leading enterprises [3].
- The city is strengthening talent recruitment by establishing special funds and providing green channels for education and medical services for innovative talents [3].