Feng et al., 2023 - Google Patents

Self-supervised video forensics by audio-visual anomaly detection

Document ID: 4355110763763402769
Authors: Feng C; Chen Z; Owens A
Publication year: 2023
Publication venue: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Snippet

Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We …

Full text: openaccess.thecvf.com (PDF)
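
The snippet's core idea is to flag a video as suspicious when it looks unlikely under a model fitted only to real footage. That idea can be illustrated with a generic density-based anomaly detector. This is a minimal sketch under stated assumptions, not the method of Feng et al. Everything in it is hypothetical: the per-video feature vectors (an estimated audio-visual offset and its spread across clips), the diagonal Gaussian standing in for whatever density model the paper uses, the 95th-percentile threshold, and the helper names `fit_gaussian`, `anomaly_score`, and `calibrate_threshold`.

```python
import math
from statistics import fmean
from typing import List, Tuple

# Hypothetical per-video features: [mean estimated audio-visual offset (frames),
# spread of that offset across clips]. Illustrative only, not the paper's features.
Features = List[float]
Model = Tuple[List[float], List[float]]  # (per-dimension means, per-dimension variances)


def fit_gaussian(real_features: List[Features]) -> Model:
    """Fit a diagonal Gaussian to features from real, unlabeled videos."""
    dims = len(real_features[0])
    means = [fmean(f[d] for f in real_features) for d in range(dims)]
    variances = [
        fmean((f[d] - means[d]) ** 2 for f in real_features) + 1e-6  # floor avoids division by zero
        for d in range(dims)
    ]
    return means, variances


def anomaly_score(features: Features, model: Model) -> float:
    """Negative log-likelihood under the fitted Gaussian; higher means more anomalous."""
    means, variances = model
    nll = 0.0
    for x, mu, var in zip(features, means, variances):
        nll += 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return nll


def calibrate_threshold(real_features: List[Features], model: Model, quantile: float = 0.95) -> float:
    """Pick the score at the given quantile of real-video scores as the decision threshold."""
    scores = sorted(anomaly_score(f, model) for f in real_features)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


# Train on real videos only: no manipulated examples are needed.
# (In practice the threshold would be calibrated on held-out real videos.)
real = [[0.0, 0.5], [0.1, 0.4], [-0.1, 0.6], [0.05, 0.5], [-0.05, 0.45]]
model = fit_gaussian(real)
threshold = calibrate_threshold(real, model)

# A hypothetical manipulated video whose audio drifts several frames off the visuals.
fake = [3.0, 2.5]
print(anomaly_score(fake, model) > threshold)  # True
```

The threshold is set from real-video scores alone, so the quantile controls the expected false-positive rate on authentic footage (about 5% at 0.95, given enough calibration videos). No labeled fakes are required at any stage.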

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G06K9/00711—Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier

Similar Documents

Publication	Publication Date	Title
Feng et al.	2023	Self-supervised video forensics by audio-visual anomaly detection
Wu et al.	2021	Exploring heterogeneous clues for weakly-supervised audio-visual video parsing
Chen et al.	2021	Localizing visual sounds the hard way
Alwassel et al.	2021	TSP: Temporally-sensitive pretraining of video encoders for localization tasks
Roth et al.	2020	AVA Active Speaker: An audio-visual dataset for active speaker detection
Khalid et al.	2021	FakeAVCeleb: A novel audio-video multimodal deepfake dataset
Haliassos et al.	2022	Leveraging real talking faces via self-supervision for robust forgery detection
Yang et al.	2019	LRW-1000: A naturally-distributed large-scale benchmark for lip reading in the wild
Morgado et al.	2020	Learning representations from audio-visual spatial alignment
Mo et al.	2022	Localizing visual sounds the easy way
Chung et al.	2017	Out of time: automated lip sync in the wild
Chung et al.	2017	Lip reading in the wild
Zeng et al.	2021	Contrastive learning of global and local video representations
Serrano Gracia et al.	2015	Fast fight detection
Cai et al.	2022	Do you really mean that? Content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization
CN108307229B (en)	2023-12-22	Video and audio data processing method and device
Xu et al.	2022	AVA-AVD: Audio-visual speaker diarization in the wild
Chen et al.	2021	Audio-visual synchronisation in the wild
Korshunov et al.	2019	Tampered speaker inconsistency detection with phonetically aware audio-visual features
Ellis et al.	2014	Why we watch the news: a dataset for exploring sentiment in broadcast video news
Wu et al.	2021	Binaural audio-visual localization
Mo et al.	2023	A unified audio-visual learning framework for localization, separation, and recognition
Hao et al.	2022	Deepfake detection using multiple data modalities
Zhang et al.	2023	UMMAFormer: A universal multimodal-adaptive transformer framework for temporal forgery localization
Tapu et al.	2019	DEEP-HEAR: A multimodal subtitle positioning system dedicated to deaf and hearing-impaired people