SIGGRAPH Asia 2025 | When Video Generation Truly "Sees" a Person: A Unified Framework for Multi-View Identity Consistency, Realistic Lighting, and Controllable Cameras
机器之心 · 2025-12-27 04:01
Core Viewpoint
- The article emphasizes that understanding a person's identity in video generation requires capturing their appearance across viewing angles and lighting conditions, treating identity as a stable 4D representation rather than a static 2D attribute [9][36].

Group 1: Multi-View Identity Preservation
- Recent work on video generation has focused on character customization, typically assuming that if a character looks like themselves from one angle, their identity is preserved. This assumption breaks down in real video and film contexts [4][5].
- Identity is strongly view-dependent: facial features and body posture change systematically with viewing angle, so a single image or a few images cannot capture the full range of a person's appearance [5][6].
- The article argues that true identity preservation in videos with real 3D camera movements is fundamentally a multi-view consistency problem, not merely a single-frame similarity problem [6][7].

Group 2: Methodology Overview
- To address the long-ignored issue of multi-view identity, the article proposes redesigning the character customization pipeline from a data perspective [11].
- Rather than relying on single-view references, the approach captures multi-view performances and uses 4D Gaussian Splatting (4DGS) to generate training data [12][14].
- A two-phase training strategy is employed: the first phase pre-trains camera perception so the model learns how camera movements change the view, and the second phase fine-tunes the model for multi-view identity customization [18][19].

Group 3: Lighting Realism
- Lighting is a critical dimension for recognizing identity in real films, since characters are seen under varied lighting conditions. The article introduces HDR-based relighting data to improve lighting realism in generated videos [21][23].
- Experimental results indicate that videos generated with relighting data are perceived as more natural and realistic, with 83.9% of users preferring the enhanced lighting conditions [23].

Group 4: Multi-Person Generation
- In multi-person video generation, multi-view identity preservation matters even more: the model must maintain stable identity modeling for each character across different angles and lighting to produce natural interactions [25][26].
- The article describes two components of multi-person generation: capturing performances for 4DGS reconstruction, and rendering videos with precise 3D camera parameters to ensure identity consistency [27].

Group 5: Experimental Conclusions
- Systematic experiments show that models trained with multi-view data significantly outperform those trained only on frontal-view data in identity consistency and realism [31][32].
- User studies confirm a clear preference for generated results with stable multi-view identities, underscoring the importance of this approach in video generation [32].
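To make "rendering with precise 3D camera parameters" concrete, the sketch below builds world-to-camera extrinsics for an evenly spaced orbit around a subject, the kind of known trajectory one could use to render multi-view training views from a 4DGS reconstruction. This is an illustrative assumption, not the paper's code; the function names, orbit radius, and look-at construction are hypothetical.

```python
import numpy as np

def look_at_extrinsics(cam_pos, target, up=(0.0, 1.0, 0.0)):
    """World-to-camera rotation R and translation t for a camera at
    `cam_pos` looking at `target` (camera looks down its -z axis)."""
    cam_pos = np.asarray(cam_pos, dtype=float)
    target = np.asarray(target, dtype=float)
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    # Rows of R are the camera's axes expressed in world coordinates.
    R = np.stack([right, true_up, -forward])
    t = -R @ cam_pos  # so that x_cam = R @ x_world + t
    return R, t

def orbit_trajectory(n_views, radius=2.5, height=1.6,
                     target=(0.0, 1.6, 0.0)):
    """Evenly spaced camera poses circling the subject: a simple example
    of an exactly known 3D trajectory for multi-view data rendering."""
    poses = []
    for k in range(n_views):
        theta = 2.0 * np.pi * k / n_views
        cam_pos = (radius * np.cos(theta), height, radius * np.sin(theta))
        poses.append(look_at_extrinsics(cam_pos, target))
    return poses
```

Because every pose is known exactly, renderings from the reconstructed 4DGS scene come paired with ground-truth camera parameters, which is what makes the camera-perception pre-training described above possible.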