Feng et al., 2023 - Google Patents

Self-supervised video forensics by audio-visual anomaly detection

Document ID
4355110763763402769
Author
Feng C
Chen Z
Owens A
Publication year
2023
Publication venue
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Snippet

Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We …
Continue reading at openaccess.thecvf.com

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00624 Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G06K9/00711 Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G06F17/30799 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier

Similar Documents

Publication Publication Date Title
Feng et al. Self-supervised video forensics by audio-visual anomaly detection
Wu et al. Exploring heterogeneous clues for weakly-supervised audio-visual video parsing
Alwassel et al. TSP: Temporally-sensitive pretraining of video encoders for localization tasks
Roth et al. AVA active speaker: An audio-visual dataset for active speaker detection
Khalid et al. FakeAVCeleb: A novel audio-video multimodal deepfake dataset
Yang et al. LRW-1000: A naturally-distributed large-scale benchmark for lip reading in the wild
Haliassos et al. Leveraging real talking faces via self-supervision for robust forgery detection
Morgado et al. Learning representations from audio-visual spatial alignment
Chung et al. Lip reading in the wild
Chung et al. Learning to lip read words by watching videos
Chung et al. Out of time: automated lip sync in the wild
Serrano Gracia et al. Fast fight detection
Shang et al. A multimodal misinformation detector for COVID-19 short videos on TikTok
Zeng et al. Contrastive learning of global and local video representations
Kamoona et al. Multiple instance-based video anomaly detection using deep temporal encoding–decoding
Cai et al. Do you really mean that? Content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization
CN108307229B (en) Video and audio data processing method and device
Chen et al. Audio-visual synchronisation in the wild
Motiian et al. Online human interaction detection and recognition with multiple cameras
Ellis et al. Why we watch the news: a dataset for exploring sentiment in broadcast video news
Le et al. Learning multimodal temporal representation for dubbing detection in broadcast media
Korshunov et al. Tampered speaker inconsistency detection with phonetically aware audio-visual features
Ul Haq et al. An effective video summarization framework based on the object of interest using deep learning
Wu et al. Binaural audio-visual localization
Mo et al. A unified audio-visual learning framework for localization, separation, and recognition