3D Student Splatting and Scooping
Saining Xie and Hao Su win CVPR 2025 awards! Chinese PhD student Jianyuan Wang takes Best Paper as first author
量子位· 2025-06-13 16:44
Core Viewpoint
- The CVPR 2025 awards have been announced, recognizing outstanding contributions to computer vision and highlighting both young scholars and innovative research papers [1][2].

Group 1: Young Scholar Awards
- The awards recognize early-career researchers who obtained their PhD within the last seven years and have made significant contributions to computer vision [2].
- Notable recipients include Hao Su, a PhD student of Fei-Fei Li who contributed to the renowned ImageNet project [3].
- Saining Xie, recognized for his work on ResNeXt and MAE, has also made impactful contributions to the field [4].

Group 2: Best Paper Award
- The Best Paper award went to "VGGT: Visual Geometry Grounded Transformer," co-authored by researchers from Meta and the University of Oxford, with Jianyuan Wang as first author [5].
- VGGT is the first large Transformer model that predicts complete 3D scene information end-to-end in a single feedforward pass, outperforming existing geometry-based and deep learning methods [5].

Group 3: Best Student Paper
- The Best Student Paper award went to "Neural Inverse Rendering from Propagating Light," a collaboration between the University of Toronto and Carnegie Mellon University [7].
- The paper introduces a physics-based neural inverse rendering method that reconstructs scene geometry and materials from multi-view, time-resolved measurements of propagating light [9][25].

Group 4: Honorable Mentions
- Four papers received Honorable Mentions:
  - "MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos," a system for estimating camera parameters and depth maps from casually captured videos of dynamic scenes [10][32].
  - "Navigation World Models," a controllable video generation model that predicts future visual observations from past observations and navigation actions [10][38].
  - "Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models," a new family of open-source vision-language models [10][45].
  - "3D Student Splatting and Scooping," a new 3D representation that improves on existing Gaussian splatting techniques [10][52].

Group 5: Technical Innovations
- VGGT alternates frame-wise and global self-attention layers, integrating multi-frame scene information while keeping memory usage manageable (see the sketch after this list) [13][18].
- The "Neural Inverse Rendering" method uses a time-resolved radiance cache to model light propagation, improving scene reconstruction [25][27].
- MegaSaM improves the accuracy of depth estimation and camera parameters in dynamic environments, outperforming traditional methods [32][35].
- The Navigation World Model adapts to new constraints in navigation tasks, demonstrating flexibility in unfamiliar environments [38][42].
- The Molmo model family is trained from scratch without relying on closed-source data, advancing the open understanding of high-performance vision-language models [45][46].
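The alternating-attention pattern described above can be illustrated in a few lines. The snippet below is a minimal sketch, not the released VGGT code: the `FrameGlobalBlock` module, its layer layout, and all dimensions are hypothetical. It simply interleaves self-attention restricted to each frame's tokens with self-attention over the tokens of all frames.

```python
# Minimal sketch of alternating frame-wise / global self-attention
# (illustrative only; not the actual VGGT implementation).
import torch
import torch.nn as nn

class FrameGlobalBlock(nn.Module):
    """One block: attend within each frame, then across all frames."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.frame_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens_per_frame, dim)
        b, f, t, d = x.shape

        # Frame-wise attention: each frame's tokens only see each other.
        xf = x.reshape(b * f, t, d)
        h = self.norm1(xf)
        xf = xf + self.frame_attn(h, h, h, need_weights=False)[0]

        # Global attention: all tokens from all frames attend jointly.
        xg = xf.reshape(b, f * t, d)
        g = self.norm2(xg)
        xg = xg + self.global_attn(g, g, g, need_weights=False)[0]
        return xg.reshape(b, f, t, d)

if __name__ == "__main__":
    block = FrameGlobalBlock(dim=64)
    frames = torch.randn(2, 4, 196, 64)  # 2 scenes, 4 frames, 196 patch tokens each
    print(block(frames).shape)           # torch.Size([2, 4, 196, 64])
```

The point of the alternation is that frame-wise attention keeps the quadratic cost local to each image, while the interleaved global layers still let information flow across the whole image set.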
Just announced: CVPR 2025 awards are out. Oxford & Meta PhD student Jianyuan Wang wins Best Paper; Saining Xie takes the Young Researcher Award
机器之心· 2025-06-13 15:45
Core Insights
- The CVPR 2025 conference in Nashville, Tennessee, awarded five papers: one Best Paper and four Honorable Mentions, along with one Best Student Paper and one Honorable Mention for student papers [1][2].

Submission and Acceptance Statistics
- This year, over 40,000 authors submitted 13,008 papers, a 13% increase from last year's 11,532 submissions. A total of 2,872 papers were accepted, for an overall acceptance rate of approximately 22.1%. Of the accepted papers, 96 were oral presentations (3.3%) and 387 were highlights (13.7%) [3][5]. (A quick arithmetic check of these figures appears after this digest.)

Conference Attendance
- The conference drew more than 9,000 attendees from over 70 countries and regions [7].

Paper Acceptance by Field
- Image and video generation had the highest number of accepted papers, while the highest acceptance rates were in 3D from multi-view and sensor data, as well as single-image 3D [8].

Best Paper Award
- The Best Paper, "VGGT: Visual Geometry Grounded Transformer," was presented by researchers from the University of Oxford and Meta AI. It introduces a general-purpose 3D vision model built on a pure feedforward Transformer architecture, capable of inferring core geometric information from one or more images [13][14].

Notable Research Contributions
- The Best Paper demonstrated significant performance improvements over traditional optimization methods and existing state-of-the-art models across various 3D tasks, achieving inference in seconds without post-processing optimization [17].

Best Student Paper
- The Best Student Paper, "Neural Inverse Rendering from Propagating Light," proposes a physics-based neural inverse rendering system for multi-view, time-resolved light propagation, achieving state-of-the-art 3D reconstruction under strong indirect lighting [53][55].

Awards and Recognitions
- Two Young Researcher Awards went to Hao Su and Saining Xie for their outstanding contributions to computer vision research [68][72]. The Longuet-Higgins Award was presented to two papers that have significantly influenced the field: the Inception architecture and fully convolutional networks for semantic segmentation [75][78][80].
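As a quick sanity check of the submission statistics quoted above, the short snippet below recomputes the headline ratios from the raw counts reported in the digest. This is only a sketch over the numbers as given; the original article's rounding (and the base used for the 13.7% highlight figure) may differ.

```python
# Recompute the CVPR 2025 acceptance ratios from the counts quoted above.
submitted = 13_008
accepted = 2_872
orals = 96
highlights = 387
prev_year_submissions = 11_532

print(f"acceptance rate: {accepted / submitted:.1%}")               # ~22.1%, as reported
print(f"oral share of accepted: {orals / accepted:.1%}")             # ~3.3%, as reported
print(f"highlight share of accepted: {highlights / accepted:.1%}")   # ~13.5%; the digest reports 13.7%
print(f"YoY submission growth: {submitted / prev_year_submissions - 1:.1%}")  # ~12.8%, reported as 13%
```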