Xie Saining and Su Hao Win CVPR 2025 Awards! Chinese PhD Wang Jianyuan Takes Best Paper as First Author
量子位 (QbitAI) · 2025-06-13 16:44

Core Viewpoint
- The CVPR 2025 awards have been announced, recognizing outstanding contributions in computer vision and highlighting young scholars and innovative research papers [1][2].

Group 1: Young Scholar Awards
- The awards honor early-career researchers who obtained their PhD within the last seven years, acknowledging significant contributions to computer vision [2].
- Notable recipients include Su Hao, a former PhD student of Fei-Fei Li who contributed to the renowned ImageNet project [3].
- Xie Saining, recognized for his work on ResNeXt and MAE, has also made impactful contributions to the field [4].

Group 2: Best Paper Award
- The Best Paper award went to "VGGT: Visual Geometry Grounded Transformer," co-authored by researchers from Meta and Oxford University with Wang Jianyuan as first author [5].
- VGGT is the first large Transformer model to predict complete 3D scene information end-to-end in a single feedforward pass, outperforming existing geometric and deep learning methods [5].

Group 3: Best Student Paper
- The Best Student Paper award went to "Neural Inverse Rendering from Propagating Light," a collaboration between the University of Toronto and Carnegie Mellon University [7].
- The paper introduces a physics-based neural inverse rendering method that reconstructs scene geometry and materials from multi-view, time-resolved measurements of propagating light [9][25].

Group 4: Honorable Mentions
- Four papers received Honorable Mentions:
- "MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos," a system for estimating camera parameters and depth maps from casually captured dynamic scenes [10][32].
- "Navigation World Models," a controllable video generation model that predicts future visual observations conditioned on past observations and actions [10][38].
- "Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models," a new family of open-source vision-language models [10][45].
- "3D Student Splatting and Scooping," a new 3D representation that improves upon existing Gaussian splatting techniques [10][52].

Group 5: Technical Innovations
- VGGT employs an alternating attention mechanism, interleaving frame-wise and global self-attention layers to integrate multi-frame scene information while keeping memory usage efficient [13][18].
- The "Neural Inverse Rendering" method uses a time-resolved radiance cache to model light propagation, enhancing scene reconstruction capabilities [25][27].
- The "MegaSaM" system improves depth estimation and camera parameter accuracy in dynamic environments, outperforming traditional methods [32][35].
- The "Navigation World Model" can incorporate new constraints into navigation planning, demonstrating flexibility in unfamiliar environments [38][42].
- The "Molmo" model family is built from scratch without relying on closed-source data, advancing open understanding of high-performance vision-language models [45][46].
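The alternating-attention idea behind VGGT can be illustrated with a minimal sketch: even layers restrict attention to tokens within a single frame, odd layers attend across all tokens of all frames. This is a toy NumPy illustration of the alternation pattern only, not the actual VGGT architecture (which uses multi-head attention with learned projections); the function names and single-head, projection-free attention here are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (seq_len, dim). Single head, no learned Q/K/V projections,
    # purely to show where attention is applied.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def alternating_attention(tokens, num_layers=4):
    # tokens: (num_frames, tokens_per_frame, dim)
    f, t, d = tokens.shape
    x = tokens
    for layer in range(num_layers):
        if layer % 2 == 0:
            # Frame-wise attention: each frame attends only to its own tokens.
            x = np.stack([self_attention(x[i]) for i in range(f)])
        else:
            # Global attention: all tokens across all frames attend jointly,
            # integrating multi-frame scene information.
            x = self_attention(x.reshape(f * t, d)).reshape(f, t, d)
    return x
```

The frame-wise layers keep the attention matrix small (tokens_per_frame squared per frame), while the periodic global layers fuse information across the whole sequence, which is the memory/expressivity trade-off the paper's alternating design targets.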