A Survey of 3D Reconstruction: The Evolution from Multi-View Geometry to NeRF and 3DGS
自动驾驶之心· 2025-09-22 23:34
Core Viewpoint
- 3D reconstruction sits at a critical intersection of computer vision and graphics, serving as the digital foundation for applications such as virtual reality, augmented reality, autonomous driving, and digital twins. Recent advances in novel view synthesis, represented by Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have significantly improved reconstruction quality, speed, and dynamic adaptability [5][6].

Group 1: Introduction and Demand
- The resurgence of interest in 3D reconstruction is driven by new application demands: city-scale digital twins requiring kilometer-level coverage at centimeter-level accuracy, autonomous driving simulation needing dynamic traffic flow and real-time semantics, and AR/VR social applications demanding over 90 FPS at photo-realistic quality [6].
- Traditional reconstruction pipelines cannot meet these requirements, prompting the integration of geometry, texture, and lighting through differentiable rendering [6].

Group 2: Traditional Multi-View Geometry Reconstruction
- The traditional multi-view geometry pipeline (SfM followed by MVS) has inherent limitations in quality, efficiency, and adaptability to dynamic scenes, which successive iterations of NeRF and 3DGS have addressed [7].
- A comprehensive comparison of methods highlights the evolution of the field and the challenges that remain [7].

Group 3: NeRF and Its Innovations
- NeRF models a scene as a continuous 5D function, enabling advanced volume-rendering techniques that evolved rapidly from 2020 to 2024 to address data requirements, texture limitations, lighting sensitivity, and dynamic scene handling [13][15].
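The 5D formulation mentioned in Group 3 maps a 3D position and a 2D viewing direction to a color and a volume density, and a pixel is rendered by compositing samples along a camera ray. A minimal NumPy sketch of that compositing step (an illustration of the general technique only, not any specific paper's implementation):

```python
import numpy as np

# NeRF models a scene as a continuous 5D function:
#   F(x, y, z, theta, phi) -> (r, g, b, sigma)
# A pixel's color is obtained by volume rendering: sample the
# function along the camera ray and alpha-composite front to back.

def volume_render(colors, sigmas, deltas):
    """Composite per-sample colors along one ray.

    colors: (N, 3) RGB emitted at each sample
    sigmas: (N,)   volume density at each sample
    deltas: (N,)   distance between adjacent samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)       # opacity of each segment
    trans = np.cumprod(1.0 - alphas + 1e-10)      # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])   # shift so T_i = prod over j < i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# Toy example: 4 samples along one ray (empty space, then three surfaces).
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)
sigmas = np.array([0.0, 5.0, 5.0, 5.0])
deltas = np.full(4, 0.1)
pixel = volume_render(colors, sigmas, deltas)
```

The first sample has zero density, so it contributes nothing; the nearest dense sample (green) dominates the composited pixel, with later samples attenuated by transmittance.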
- Various methods have been developed to enhance quality and efficiency, including Mip-NeRF, NeRF-W, and InstantNGP, each contributing improved rendering speed and reduced memory usage [17][18].

Group 4: 3DGS and Its Advancements
- 3DGS represents a scene as a collection of 3D Gaussians, allowing efficient rendering and high-quality output. Recent methods have focused on optimizing rendering quality and efficiency, achieving significant improvements in memory usage and frame rates [22][26].
- Comparisons of 3DGS with other methods show its superiority in rendering speed and dynamic scene reconstruction [31].

Group 5: Future Trends and Conclusion
- The next five years are expected to bring hybrid representations, real-time processing on mobile devices, generative reconstruction techniques, and multi-modal fusion for robust reconstruction [33].
- The ultimate goal is real-time 3D reconstruction accessible to everyone, marking a shift toward ubiquitous computing [34].
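The Gaussian collection described above can be sketched in a few lines. The class fields and the depth-sort-then-blend loop below are illustrative assumptions, not the reference 3DGS rasterizer (which projects anisotropic Gaussians to screen space and blends per pixel on the GPU):

```python
import numpy as np

# 3DGS represents a scene as a set of 3D Gaussians, each with a mean,
# a per-axis scale (the covariance, here simplified), an opacity, and a
# color. Rendering depth-sorts the splats and alpha-blends front to back.

class Gaussian:
    def __init__(self, mean, scale, opacity, color):
        self.mean = np.asarray(mean, float)     # 3D center
        self.scale = np.asarray(scale, float)   # per-axis extent
        self.opacity = float(opacity)           # base alpha in [0, 1]
        self.color = np.asarray(color, float)   # RGB

def blend(gaussians, cam_z=0.0):
    """Front-to-back alpha compositing of splats ordered by depth."""
    ordered = sorted(gaussians, key=lambda g: g.mean[2] - cam_z)
    out, trans = np.zeros(3), 1.0
    for g in ordered:
        out += trans * g.opacity * g.color      # contribution weighted by T
        trans *= 1.0 - g.opacity                # remaining transmittance
    return out, trans

splats = [
    Gaussian([0, 0, 2.0], [0.1] * 3, 0.8, [1, 0, 0]),  # near, red
    Gaussian([0, 0, 5.0], [0.1] * 3, 1.0, [0, 0, 1]),  # far, opaque blue
]
color, transmittance = blend(splats)
```

The near red splat contributes 0.8 of the pixel, the far opaque blue splat fills the remaining 0.2 of transmittance, and nothing is left for the background.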
After months of hard work, our new end-to-end closed-loop simulation system is finally up and running.
自动驾驶之心· 2025-07-03 12:41
Core Viewpoint
- The article discusses the development and implementation of the Street Gaussians algorithm for dynamic scene representation in autonomous driving, highlighting its efficiency in training and rendering compared to previous methods [2][3].

Group 1: Background and Challenges
- Previous methods suffered from slow training and rendering speeds, as well as inaccuracies in vehicle pose tracking [3].
- Street Gaussians aims to generate realistic images for view synthesis in dynamic urban street scenes by modeling them as a combination of foreground moving vehicles and a static background [3][4].

Group 2: Technical Implementation
- The background model is a set of points in world coordinates, each assigned a 3D Gaussian representing geometry and color, with parameters optimized to avoid invalid values [8].
- The object model for moving vehicles consists of optimizable tracking poses and point clouds, with Gaussian attributes similar to the background model but defined in local object coordinates [11].
- A 4D spherical harmonic model encodes temporal information into the appearance of moving vehicles without high storage cost [12].

Group 3: Initialization and Data Handling
- Street Gaussians is initialized from aggregated LiDAR point clouds, addressing the limitations of traditional SfM point clouds in urban environments [17].
- For objects with fewer than 2,000 LiDAR points, random sampling is employed to ensure sufficient data for model initialization [17].

Group 4: Course and Learning Opportunities
- The article promotes a specialized course on 3D Gaussian Splatting (3DGS), covering various subfields and practical applications in autonomous driving, aimed at enhancing understanding and implementation skills [26][35].
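The initialization rule mentioned above (objects with fewer than 2,000 LiDAR points are topped up by random sampling) can be sketched as follows. The function name, the bounding-box arguments, and the choice of uniform sampling inside the object's 3D box are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def init_object_points(lidar_pts, bbox_min, bbox_max, min_pts=2000, rng=None):
    """Keep an object's aggregated LiDAR points; if there are fewer
    than `min_pts`, top up with uniform random samples inside the
    object's 3D bounding box so the Gaussian model has enough points
    to initialize from. (Sketch; names and box sampling are assumed.)
    """
    rng = np.random.default_rng() if rng is None else rng
    pts = np.asarray(lidar_pts, float).reshape(-1, 3)
    if len(pts) >= min_pts:
        return pts
    extra = rng.uniform(bbox_min, bbox_max, size=(min_pts - len(pts), 3))
    return np.concatenate([pts, extra], axis=0)

# A sparse object: only 50 LiDAR hits inside a 4m x 2m x 1.5m vehicle box.
sparse = np.random.default_rng(0).uniform([0, 0, 0], [4, 2, 1.5], (50, 3))
pts = init_object_points(sparse, [0, 0, 0], [4, 2, 1.5])
```

After initialization the object always has at least `min_pts` candidate Gaussian centers, all of which lie inside its bounding box.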
More than 40,000 authors vying for acceptance: CVPR 2025 officially reveals its three hottest topics. Are you working in the right direction?
机器之心· 2025-05-28 03:02
Core Insights
- The article discusses the latest trends in computer vision, highlighting three major research directions gaining traction as of 2025 [3][4].

Group 1: Major Research Directions
- The three prominent areas identified are:
  1. Multi-view and sensor 3D technology, which has evolved from 2D rendering to more complex 3D evaluation, significantly influenced by the introduction of NeRF in 2020 [5].
  2. Image and video synthesis, which has become a focal point for presenting environmental information more accurately, reflecting advances in analyzing and generating multimedia content [6].
  3. Multimodal learning, which integrates visual, linguistic, and reasoning capabilities, indicating a trend toward more interactive and comprehensive AI systems [7][8].

Group 2: Conference Insights
- CVPR 2025 saw a 13% increase in paper submissions, with 13,008 submissions in total and an acceptance rate of 22.1%, indicating a highly competitive environment [3].
- The conference emphasizes the importance of diverse voices in the research community, ensuring that every paper, regardless of the author's affiliation, receives equal consideration [8].
Professor Yu Jingyi: The potential of large models lies in spatial intelligence, but we are still far from a consensus | AI&Society 100 People, 100 Questions
腾讯研究院· 2025-05-09 08:20
Core Viewpoint
- The article discusses the transformative impact of generative AI on technology, business, and society, emphasizing the shift from an information society to an intelligent society and the need to explore the new opportunities and challenges AI brings [1].

Group 1: Insights from Experts
- The article features insights from Yu Jingyi, a prominent professor of computer science, who highlights the current bottlenecks of large-model technology and the potential of generative AI for spatial intelligence [5][6].
- Yu emphasizes that the understanding of spatial intelligence is evolving, moving from simple digital reconstruction toward more complex intelligent interpretation of space, aided by advances in generative AI [12][13].

Group 2: Technological Breakthroughs
- Generative AI technologies such as DALL-E 3 and GPT-4o showcase the potential for significant advances in image and video generation, indicating that the visual-generation capabilities of language models are far from fully realized [10][11].
- The CAST project, which incorporates actor-network theory and physical rules, aims to enhance the understanding of spatial relationships among objects, marking a significant step in the evolution of spatial intelligence [16][18].

Group 3: Challenges and Opportunities
- A major challenge in the field is the lack of sufficient 3D scene data, particularly real-world data, which hampers the development of robust AI models for spatial understanding [18][19].
- Cross-modal methods may address data scarcity in 3D environments by leveraging advances in text-to-image technology to infer spatial relationships [19][20].

Group 4: Future Applications
- Short-term applications of spatial intelligence are expected in art creation, gaming, and film production, where generative AI can significantly enhance efficiency and creativity [42][43].
- In the medium to long term, spatial intelligence is anticipated to become a core component of embodied intelligence, potentially transforming industries such as smart devices and robotics [43][44].

Group 5: Ethical Considerations
- The rise of AI companionship raises ethical questions about emotional dependency and human-robot interaction, necessitating ongoing discussion of ethical frameworks in technology development [50][51].