DNN controlled adaptive front-end for replay attack detection systems
- Conventional methods fall short in detecting replay spoofing attacks effectively.
- Auditory-based dynamic filters can detect artefacts in high-quality replayed signals.
- Deep neural networks can adaptively learn filter traits based ...
Developing robust countermeasures to protect automatic speaker verification systems against replay spoofing attacks is a well-recognized challenge. Current approaches to spoofing detection are generally based on a fixed front-end, typically a ...
Acoustic properties of non-native clear speech: Korean speakers of English
- Non-native clear speech is acoustically distinct from casual speech.
- The nature of modifications is the same in native and non-native clear speech.
- The magnitude of modifications is different in native and non-native clear speech. ...
The present study examined the acoustic properties of clear speech produced by non-native speakers of English (L1 Korean), in comparison to native clear speech. L1 Korean speakers of English (N=30) and native speakers of English (N=20) read an ...
Speech emotion recognition approaches: A systematic review
The speech emotion recognition (SER) field has been active since SER became a crucial feature in advanced Human–Computer Interaction (HCI), and it is now used in a wide range of real-life applications. In recent years, numerous SER systems have been covered by ...
Highlights
- Speech emotion recognition (SER) has become crucial in advanced Human–Computer Interaction (HCI).
- Numerous SER systems have been proposed by researchers using Machine Learning (ML) and Deep Learning (DL).
- This survey aims to ...
Model predictive PESQ-ANFIS/FUZZY C-MEANS for image-based speech signal evaluation
- Eder Pereira Neves,
- Marco Aparecido Queiroz Duarte,
- Jozue Vieira Filho,
- Caio Cesar Enside de Abreu,
- Bruno Rodrigues de Oliveira
This paper presents a new method to evaluate the quality of speech signals through images generated from a psychoacoustic model to estimate PESQ (ITU-T P.862) values using a first-order Fuzzy Sugeno approach implemented in the Adaptive Neuro-Fuzzy ...
Highlights
- Extraction of speech signal factors using image processing techniques.
- Signal image extracted from a psychoacoustic model.
- Non-intrusive measurement based on PESQ values trained by ANFIS.
- Configuration of ANFIS with fuzzy c-...
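For readers unfamiliar with the first-order Sugeno model named above, the sketch below shows only its inference step: each rule's firing strength is the product of Gaussian membership degrees, each rule outputs a linear function of the inputs, and the final estimate is the firing-strength-weighted average of those outputs. This is a generic illustration, not the authors' system. The input features, membership parameters, and the `Rule` and `sugeno_first_order` names are hypothetical. In ANFIS these parameters would be learned from data (the highlights indicate fuzzy c-means was involved in configuring the model), whereas here they are set by hand.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def gaussian_mf(x: float, center: float, sigma: float) -> float:
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


@dataclass
class Rule:
    # Hypothetical rule parameters; in ANFIS these would be trained, not hand-set.
    centers: Sequence[float]  # one membership-function center per input
    sigmas: Sequence[float]   # one membership-function width per input
    coeffs: Sequence[float]   # linear consequent: p_1..p_n followed by bias r


def sugeno_first_order(x: Sequence[float], rules: Sequence[Rule]) -> float:
    """First-order Sugeno inference: weighted average of linear rule outputs."""
    firing, outputs = [], []
    for rule in rules:
        w = 1.0
        for xi, c, s in zip(x, rule.centers, rule.sigmas):
            w *= gaussian_mf(xi, c, s)  # product t-norm acts as the fuzzy AND
        *p, r = rule.coeffs
        z = sum(pi * xi for pi, xi in zip(p, x)) + r  # linear consequent
        firing.append(w)
        outputs.append(z)
    total = sum(firing)
    if total == 0.0:
        raise ValueError("no rule fired for this input")
    return sum(w * z for w, z in zip(firing, outputs)) / total


if __name__ == "__main__":
    # Two hypothetical image-derived features mapped to a PESQ-like score.
    rules = [
        Rule(centers=(0.2, 0.3), sigmas=(0.2, 0.2), coeffs=(1.0, 0.5, 1.5)),
        Rule(centers=(0.8, 0.7), sigmas=(0.2, 0.2), coeffs=(2.0, 1.0, 2.0)),
    ]
    print(sugeno_first_order((0.5, 0.5), rules))
```

Because the output is a normalized weighted average, it always lies between the smallest and largest rule outputs for a given input, which is one reason this model form is convenient for regressing a bounded quality score.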
Determining spectral stability in vowels: A comparison and assessment of different metrics
- Different metrics for spectral stability identification in vowels are discussed.
- A new metric is introduced.
- The different metrics are assessed both on synthesized and natural speech.
- Higher-dimensional metrics capture spectral ...
This study investigated the performance of several metrics used to evaluate spectral stability in vowels. Four metrics suggested in the literature and a newly developed one were tested and compared to the traditional method of associating the ...
Toward enriched decoding of Mandarin spontaneous speech
- Enriched decoding of spontaneous speech achieves better recognition performance.
- Part-of-speech features help to reduce the perplexity of the language model.
- Hierarchical prosodic model enriches the recognition output with break type ...
A deep neural network (DNN)-based automatic speech recognition (ASR) method for enriched decoding of Mandarin spontaneous speech is proposed. It adopts an enhanced approach over the baseline model built with factored time delay neural networks (...
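The highlights report that part-of-speech features reduce language-model perplexity. As a reminder of what that metric measures, the sketch below computes perplexity as the exponential of the average negative log-probability a model assigns to each token of a test sequence; lower values mean the model is less "surprised" by the text. The `perplexity` function and the probabilities in the usage lines are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum(log p_i)) over N token probabilities.

    Each p_i is the probability the language model assigned to the i-th
    observed token given its history; all values must be in (0, 1].
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


if __name__ == "__main__":
    # Hypothetical per-token probabilities from two language models.
    baseline = [0.10, 0.05, 0.20, 0.08]
    with_pos = [0.15, 0.09, 0.25, 0.12]
    print(perplexity(baseline), perplexity(with_pos))
```

A model that assigns uniform probability 1/K to every token has perplexity exactly K, so perplexity can be read as an effective branching factor.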