Breaking: CVPR 2025 awards announced: Oxford & Meta PhD student Jianyuan Wang wins Best Paper, Saining Xie takes Young Researcher Award
机器之心· 2025-06-13 15:45
Core Insights - The CVPR 2025 conference in Nashville, Tennessee, awarded five papers: one best paper and four honorable mentions, along with one best student paper and one honorable mention for student papers [1][2].

Submission and Acceptance Statistics - This year, over 40,000 authors submitted 13,008 papers, a 13% increase over last year's 11,532 submissions. A total of 2,872 papers were accepted, for an overall acceptance rate of approximately 22.1%. Among the accepted papers, 96 were oral presentations (3.3%) and 387 were highlights (13.7%) [3][5].

Conference Attendance - The conference attracted over 9,000 attendees from more than 70 countries and regions [7].

Paper Acceptance by Field - Image and video generation had the highest number of accepted papers, while the highest acceptance rates were in 3D from multi-view and sensor data, as well as single-image 3D [8].

Best Paper Award - The best paper, "VGGT: Visual Geometry Grounded Transformer," was presented by researchers from the University of Oxford and Meta AI. It introduces a universal 3D vision model built on a pure feedforward Transformer architecture, capable of inferring core geometric information from one or more images [13][14].

Notable Research Contributions - The best paper demonstrated significant performance improvements over traditional optimization methods and existing state-of-the-art models across a range of 3D tasks, achieving inference in seconds without post-processing optimization [17].

Best Student Paper - The best student paper, "Neural Inverse Rendering from Propagating Light," proposed a physics-based multi-view dynamic light-propagation neural inverse rendering system, achieving state-of-the-art 3D reconstruction under strong indirect lighting [53][55].
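The submission figures quoted above are internally consistent and can be checked with a few lines of arithmetic (the 13.7% highlight share is not re-derived here, since the summary does not state which base it uses):

```python
# CVPR 2025 figures as quoted in the article
submitted, accepted = 13_008, 2_872
prev_submitted = 11_532
orals = 96

acceptance_rate = accepted / submitted        # ~0.221 -> ~22.1%
yoy_growth = submitted / prev_submitted - 1   # ~0.128 -> ~13%
oral_share = orals / accepted                 # ~0.033 -> ~3.3% of accepted papers

print(f"acceptance rate: {acceptance_rate:.1%}")   # 22.1%
print(f"submission growth: {yoy_growth:.0%}")      # 13%
print(f"oral share: {oral_share:.1%}")             # 3.3%
```

Note that the 3.3% oral figure is a share of accepted papers, not of submissions.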
Awards and Recognitions - Two Young Researcher Awards were given to Hao Su and Saining Xie for their outstanding contributions to computer vision research [68][72]. The Longuet-Higgins Award was presented to two papers that have significantly influenced the field: the paper introducing the Inception architecture and the paper on fully convolutional networks for semantic segmentation [75][78][80].
Does the world change when you turn around? WorldMem uses memory to give AI-generated worlds consistency
机器之心· 2025-05-11 03:20
Core Insights - The article discusses WorldMem, an innovative world generation model that addresses the long-term consistency problem in interactive world generation using a memory mechanism [1][8][38]

Group 1: Research Background
- Recent advances in world generation models have come from companies such as Google, Alibaba, and Meta, but the long-term consistency problem remains unresolved [5]
- Traditional methods often produce significantly changed scene content when a location is revisited, highlighting the need for improved consistency [7][26]

Group 2: Methodology
- WorldMem introduces a memory mechanism that enhances long-term consistency in world generation, allowing agents to explore diverse scenes while maintaining geometric coherence [11][18]
- The model consists of three core modules: conditional generation, memory read/write, and memory fusion [15]
- The memory bank stores key historical information, while a greedy matching algorithm efficiently retrieves relevant historical frames to enhance generation quality [18][20]

Group 3: Experimental Results
- In experiments on the Minecraft dataset, WorldMem outperformed traditional methods in both short-term and long-term generation consistency, achieving a PSNR of 27.01 within the context window and 25.32 beyond it [24][26]
- The model demonstrated superior long-term modeling capability, maintaining stability and consistency even after generating over 300 frames [27]

Group 4: Applications and Future Outlook
- WorldMem supports interactive world generation, allowing users to place objects that influence future scenes, showcasing its dynamic modeling capabilities [31]
- The article emphasizes the potential of interactive video generation models for virtual simulation and intelligent interaction, positioning WorldMem as a key step toward building realistic, persistent virtual worlds [38]
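The summary says WorldMem's memory-read module uses a greedy matching algorithm to retrieve relevant historical frames from the memory bank. The paper's exact scoring function is not given here, so the following is only a minimal sketch under assumptions: memory entries are `(pose, frame_id)` pairs, and "matching" is greedy nearest-pose selection with a toy Euclidean distance (`pose_distance`, the pose format, and the bank contents are all hypothetical).

```python
import math

def pose_distance(p, q):
    # Toy metric: Euclidean distance over (x, y, z, yaw) pose tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def greedy_retrieve(memory, query_pose, k=2):
    """Greedily pick the k stored frames whose recorded pose is closest
    to the query pose; `memory` is a list of (pose, frame_id) pairs."""
    candidates = list(memory)
    selected = []
    while candidates and len(selected) < k:
        best = min(candidates, key=lambda e: pose_distance(e[0], query_pose))
        candidates.remove(best)
        selected.append(best)
    return selected

# Hypothetical memory bank: frames recorded at different agent poses.
bank = [((0, 0, 0, 0), "frame_0"),
        ((10, 0, 0, 0), "frame_1"),
        ((0.5, 0, 0, 90), "frame_2"),
        ((0.2, 0, 0, 5), "frame_3")]

hits = greedy_retrieve(bank, query_pose=(0, 0, 0, 0), k=2)
print([frame_id for _, frame_id in hits])  # ['frame_0', 'frame_3']
```

The retrieved frames would then condition the generator, which is how revisited locations can stay geometrically consistent.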
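The consistency numbers above (27.01 dB within the context window, 25.32 dB beyond it) are PSNR values, a standard reconstruction metric defined as 10·log10(MAX²/MSE). A minimal reference implementation over flat pixel lists, for readers unfamiliar with the metric:

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

# A one-level error on every 8-bit pixel gives MSE = 1, i.e. ~48.13 dB.
print(round(psnr([255, 128, 0, 64], [254, 127, 1, 65]), 2))  # 48.13
```

Higher is better, so WorldMem degrading only from 27.01 to 25.32 dB outside the context window indicates that retrieved memories preserve most of the scene's fidelity.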