Dinkel et al., 2018
Investigating raw wave deep neural networks for end-to-end speaker spoofing detection
- Document ID
- 6253518142894034012
- Author
- Dinkel H
- Qian Y
- Yu K
- Publication year
- 2018
- Publication venue
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
Snippet
Recent advances in automatic speaker verification (ASV) lead to an increased interest in securing these systems for real-world applications. Malicious spoofing attempts against ASV systems can lead to serious security breaches. A spoofing attack within the context of ASV is …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Dinkel et al. | Investigating raw wave deep neural networks for end-to-end speaker spoofing detection | |
Kabir et al. | A survey of speaker recognition: Fundamental theories, recognition methods and opportunities | |
Hanifa et al. | A review on speaker recognition: Technology and challenges | |
Kamble et al. | Advances in anti-spoofing: from the perspective of ASVspoof challenges | |
Wu et al. | Spoofing and countermeasures for speaker verification: A survey | |
Aljasem et al. | Secure automatic speaker verification (SASV) system through sm-ALTP features and asymmetric bagging | |
Yoon et al. | A new replay attack against automatic speaker verification systems | |
Biagetti et al. | An investigation on the accuracy of truncated DKLT representation for speaker identification with short sequences of speech frames | |
Joshi et al. | A Study of speech emotion recognition methods | |
Agrawal et al. | Prosodic feature based text dependent speaker recognition using machine learning algorithms | |
Mamyrbayev et al. | Development of security systems using DNN and i & x-vector classifiers | |
Zhu et al. | Source tracing: detecting voice spoofing | |
Babu Rao et al. | Automatic Speech Recognition Design Modeling | |
Neelima et al. | Mimicry voice detection using convolutional neural networks | |
Reimao | Synthetic speech detection using deep neural networks | |
Raghib et al. | Emotion analysis and speech signal processing | |
Shitov et al. | Learning acoustic word embeddings with dynamic time warping triplet networks | |
Rupesh Kumar et al. | Generative and discriminative modelling of linear energy sub-bands for spoof detection in speaker verification systems | |
Gao | Audio deepfake detection based on differences in human and machine generated speech | |
Praksah et al. | Analysis of emotion recognition system through speech signal using KNN, GMM & SVM classifier | |
Dennis et al. | Generalized Hough transform for speech pattern classification | |
Trabelsi et al. | Comparison of several acoustic modeling techniques for speech emotion recognition | |
Khonglah et al. | Exploration of deep belief networks for vowel-like regions detection | |
Manzo-Martínez et al. | Analysis of the Impact Using Pre-emphasis Filter, Unvoiced Sounds, Frame Size and Feature Vector Size on Human Emotion Recognition by Voice and Machine Learning | |
Dulhare et al. | A Novel Approach for Speech Emotion Recognition with Facial Expression Analysis |