Dinkel et al., 2018 - Google Patents

Investigating raw wave deep neural networks for end-to-end speaker spoofing detection

Document ID: 6253518142894034012
Author: Dinkel H; Qian Y; Yu K
Publication year: 2018
Publication venue: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Snippet

Recent advances in automatic speaker verification (ASV) lead to an increased interest in securing these systems for real-world applications. Malicious spoofing attempts against ASV systems can lead to serious security breaches. A spoofing attack within the context of ASV is …

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
        - G10L17/06—Decision making techniques; Pattern matching strategies
          - G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
        - G10L17/04—Training, enrolment or model building
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L2015/088—Word spotting
          - G10L15/18—Speech classification or search using natural language modelling
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
        - G10L25/78—Detection of presence or absence of voice signals
        - G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
          - G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks

Similar Documents

Publication	Publication Date	Title
Dinkel et al.	2018	Investigating raw wave deep neural networks for end-to-end speaker spoofing detection
Kabir et al.	2021	A survey of speaker recognition: Fundamental theories, recognition methods and opportunities
Hanifa et al.	2021	A review on speaker recognition: Technology and challenges
Kamble et al.	2020	Advances in anti-spoofing: from the perspective of ASVspoof challenges
Wu et al.	2015	Spoofing and countermeasures for speaker verification: A survey
Aljasem et al.	2021	Secure automatic speaker verification (SASV) system through sm-ALTP features and asymmetric bagging
Yoon et al.	2020	A new replay attack against automatic speaker verification systems
Biagetti et al.	2016	An investigation on the accuracy of truncated DKLT representation for speaker identification with short sequences of speech frames
Joshi et al.	2013	A Study of speech emotion recognition methods
Agrawal et al.	2010	Prosodic feature based text dependent speaker recognition using machine learning algorithms
Mamyrbayev et al.	2021	Development of security systems using DNN and i & x-vector classifiers
Zhu et al.	2022	Source tracing: detecting voice spoofing
Babu Rao et al.	2024	Automatic Speech Recognition Design Modeling
Neelima et al.	2020	Mimicry voice detection using convolutional neural networks
Reimao	2019	Synthetic speech detection using deep neural networks
Raghib et al.	2017	Emotion analysis and speech signal processing
Shitov et al.	2020	Learning acoustic word embeddings with dynamic time warping triplet networks
Rupesh Kumar et al.	2022	Generative and discriminative modelling of linear energy sub-bands for spoof detection in speaker verification systems
Gao	2022	Audio deepfake detection based on differences in human and machine generated speech
Praksah et al.	2015	Analysis of emotion recognition system through speech signal using KNN, GMM & SVM classifier
Dennis et al.	2015	Generalized Hough transform for speech pattern classification
Trabelsi et al.	2016	Comparison of several acoustic modeling techniques for speech emotion recognition
Khonglah et al.	2014	Exploration of deep belief networks for vowel-like regions detection
Manzo-Martínez et al.	2024	Analysis of the Impact Using Pre-emphasis Filter, Unvoiced Sounds, Frame Size and Feature Vector Size on Human Emotion Recognition by Voice and Machine Learning
Dulhare et al.	2022	A Novel Approach for Speech Emotion Recognition with Facial Expression Analysis