Self-supervised video forensics by audio-visual anomaly detection
Feng et al., 2023
- Document ID
- 4355110763763402769
- Author
- Feng C
- Chen Z
- Owens A
- Publication year
- 2023
- Publication venue
- Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Snippet
Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G06K9/00711—Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Feng et al. | | Self-supervised video forensics by audio-visual anomaly detection |
Wu et al. | | Exploring heterogeneous clues for weakly-supervised audio-visual video parsing |
Alwassel et al. | | TSP: Temporally-sensitive pretraining of video encoders for localization tasks |
Roth et al. | | AVA active speaker: An audio-visual dataset for active speaker detection |
Khalid et al. | | FakeAVCeleb: A novel audio-video multimodal deepfake dataset |
Yang et al. | | LRW-1000: A naturally-distributed large-scale benchmark for lip reading in the wild |
Haliassos et al. | | Leveraging real talking faces via self-supervision for robust forgery detection |
Morgado et al. | | Learning representations from audio-visual spatial alignment |
Chung et al. | | Lip reading in the wild |
Chung et al. | | Learning to lip read words by watching videos |
Chung et al. | | Out of time: automated lip sync in the wild |
Serrano Gracia et al. | | Fast fight detection |
Shang et al. | | A multimodal misinformation detector for COVID-19 short videos on TikTok |
Zeng et al. | | Contrastive learning of global and local video representations |
Kamoona et al. | | Multiple instance-based video anomaly detection using deep temporal encoding–decoding |
Cai et al. | | Do you really mean that? Content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization |
CN108307229B (en) | | Video and audio data processing method and device |
Chen et al. | | Audio-visual synchronisation in the wild |
Motiian et al. | | Online human interaction detection and recognition with multiple cameras |
Ellis et al. | | Why we watch the news: a dataset for exploring sentiment in broadcast video news |
Le et al. | | Learning multimodal temporal representation for dubbing detection in broadcast media |
Korshunov et al. | | Tampered speaker inconsistency detection with phonetically aware audio-visual features |
Ul Haq et al. | | An effective video summarization framework based on the object of interest using deep learning |
Wu et al. | | Binaural audio-visual localization |
Mo et al. | | A unified audio-visual learning framework for localization, separation, and recognition |