NeurIPS 2025 Spotlight | Is That Video You Scrolled Past Real? Using Physical Laws to Expose Sora's Lies
机器之心 · 2025-11-05 06:30
Core Viewpoint
- The article presents a physics-driven spatiotemporal modeling framework for detecting AI-generated videos, arguing that robust detection should rest on physical consistency rather than superficial features [6][47].

Group 1: Research Background
- The rise of generative AI has driven major advances in video synthesis, but detecting such videos faces new challenges due to the complex spatial and temporal dependencies inherent in video data [7].
- Existing detection methods often target superficial inconsistencies, which become ineffective against high-quality generated videos that suppress these artifacts [7][8].
- The core dilemma in AI video detection is how to build a detection framework that remains robust to unknown generative models by understanding the physical evolution laws of natural videos [8].

Group 2: Proposed Methodology
- The article introduces Normalized Spatiotemporal Gradient (NSG) statistics, which quantify the physical inconsistencies of generated videos by comparing the NSG distributions of real and generated videos [3][18].
- The NSG-VD method is proposed as a universal video detection approach that models the distribution of natural videos without relying on any specific generative model, delivering strong detection performance across diverse scenarios [3][28].

Group 3: Experimental Validation
- The NSG-VD framework was evaluated on the GenVideo benchmark, which covers 10 different generative models, and outperformed existing baseline methods [40].
- Trained on mixed data from Kinetics-400 (real videos) and Pika (generated videos), NSG-VD achieved an average recall of 88.02% and an F1 score of 90.87%, significantly surpassing the previous best method, DeMamba [40].
- Even with a training set of only 1,000 generated videos, NSG-VD maintained robust performance, reaching a recall of 82.14% on Sora-generated videos and demonstrating high data efficiency [41].

Group 4: Theoretical Foundations
- The theoretical framework of NSG-VD is grounded in probability flow conservation and the continuity equation, which describe how conserved quantities are transported in physical systems [13][14].
- The NSG statistic captures the relationship between spatial probability gradients and temporal density changes, providing a unified consistency measure across different video scenarios [20][28].

Group 5: Future Directions
- Future work will focus on refining the physical models underlying NSG-VD, optimizing computational efficiency, and exploring the feasibility of real-time detection applications [48].
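The core idea described above, relating spatial gradients to temporal density changes in the spirit of the continuity equation, can be sketched in a few lines of numpy. This is a hypothetical illustration only, not the paper's implementation: treating per-frame intensity as a pseudo-density, the exact normalization in `nsg_statistic`, and the threshold calibration in `detect` are all assumptions made for the sketch.

```python
import numpy as np

def nsg_statistic(frames, eps=1e-6):
    """Hypothetical sketch of a normalized spatiotemporal gradient statistic.

    frames: array of shape (T, H, W), a grayscale video with values in [0, 1].
    Treats per-frame intensity as a pseudo-density field, then relates its
    temporal change to its spatial gradient magnitude, loosely mirroring the
    continuity-equation intuition behind NSG. The real NSG-VD formulation
    may differ substantially.
    """
    frames = frames.astype(np.float64)
    # Temporal change between consecutive frames: shape (T-1, H, W).
    dt = np.diff(frames, axis=0)
    # Spatial gradients of each frame (excluding the last, to align with dt).
    gy, gx = np.gradient(frames[:-1], axis=(1, 2))
    grad_mag = np.sqrt(gx**2 + gy**2)
    # Normalize temporal change by spatial gradient magnitude (eps avoids 0/0).
    nsg = np.abs(dt) / (grad_mag + eps)
    # Aggregate to a single scalar per video.
    return float(nsg.mean())

def detect(nsg_value, threshold):
    """Flag a video as generated if its NSG statistic exceeds a threshold
    calibrated on real videos (the calibration step is assumed here)."""
    return nsg_value > threshold
```

In practice a detector of this kind would compare the full NSG distribution of a test video against that of natural videos (e.g., via a distributional distance) rather than thresholding a single mean, but the scalar version keeps the sketch self-contained.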