Context as Memory

Context memory on par with Genie 3, and released earlier: HKU and Kling propose a scene-consistent interactive video world model
机器之心· 2025-08-21 01:03
Core Insights
- The article covers video generation models that keep a scene consistent over long durations, addressing a critical problem in interactive long video generation: stable scene memory [2][10][17]
- Google DeepMind's Genie 3 is highlighted as a significant advance in this field. It demonstrates strong scene consistency, although its technical details remain undisclosed [2][10]
- The Context as Memory paper, from a research team at the University of Hong Kong and Kuaishou (Kling), is presented as a leading academic work that closely aligns with Genie 3's principles. It emphasizes learning 3D priors implicitly from video data, without explicit 3D modeling [2][10][17]

Context as Memory Methodology
- The Context as Memory approach uses the previously generated context as memory, enabling scene-consistent long video generation without explicit 3D modeling [10][17]
- A Memory Retrieval mechanism makes a theoretically unbounded history of frames usable in practice: it selects the relevant frames based on camera trajectory and field of view (FOV), which significantly improves computational efficiency and reduces training costs [3][10][12]

Experimental Results
- In experimental comparisons, Context as Memory outperforms existing state-of-the-art methods at maintaining scene memory during long video generation [15][17]
- The model retains static scene memory over time better than the compared methods and generalizes well across different scenes [6][15]

Broader Research Context
- The research team has published multiple studies on world models and interactive video generation, and proposes a framework of five foundational capabilities: Generation, Control, Memory, Dynamics, and Intelligence [18]
- This framework serves as a guiding direction for future research on foundational world models, with Context as Memory as a focused contribution on the Memory capability [18]
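
The Memory Retrieval idea described above (select historical frames by camera trajectory and FOV) can be illustrated with a small sketch. This is not the paper's implementation: it assumes a simplified 2D top-down world, and `CameraPose`, `in_view`, `overlap_score`, `retrieve_context`, and all default parameters are hypothetical names and values chosen for illustration. The score here is simply the fraction of sample points in the target camera's view wedge that a historical camera could also see.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical 2D camera pose: ground-plane position and viewing direction."""
    x: float
    y: float
    yaw_deg: float


def in_view(cam, px, py, fov_deg, max_depth):
    """Return True if point (px, py) lies inside cam's view wedge."""
    dx, dy = px - cam.x, py - cam.y
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_depth:
        return False
    angle = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped into [-180, 180).
    diff = (angle - cam.yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


def overlap_score(target, candidate, fov_deg=90.0, max_depth=10.0,
                  n_angles=9, n_depths=5):
    """Fraction of sample points in target's wedge that candidate also sees."""
    visible = 0
    total = 0
    for i in range(n_angles):
        # Sample directions strictly inside the target's field of view.
        a = target.yaw_deg - fov_deg / 2.0 + fov_deg * (i + 0.5) / n_angles
        for j in range(n_depths):
            # Sample depths strictly inside the viewing range.
            d = max_depth * (j + 0.5) / n_depths
            px = target.x + d * math.cos(math.radians(a))
            py = target.y + d * math.sin(math.radians(a))
            total += 1
            if in_view(candidate, px, py, fov_deg, max_depth):
                visible += 1
    return visible / total


def retrieve_context(target, history, k, **kwargs):
    """Pick up to k historical frame indices whose view overlaps the target's."""
    scored = []
    for idx, pose in enumerate(history):
        score = overlap_score(target, pose, **kwargs)
        if score > 0.0:  # frames with no shared view carry no useful memory
            scored.append((score, idx))
    # Highest overlap first; ties go to the more recent frame.
    scored.sort(key=lambda t: (-t[0], -t[1]))
    return sorted(idx for _, idx in scored[:k])
```

Under these assumptions, only the returned frames would be passed to the generator as context, so the cost per step depends on `k` rather than on the full length of the history.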