SPCO: Vol 154, No C

Volume 154, Issue C, Oct 2023

Publisher:

Elsevier Science Publishers B. V.
PO Box 211, 1000 AE Amsterdam
Netherlands

ISSN: 0167-6393

editorial

Editorial Board

https://doi.org/10.1016/S0167-6393(23)00129-2

research-article

DNN controlled adaptive front-end for replay attack detection systems

https://doi.org/10.1016/j.specom.2023.102973

Highlights

Conventional methods fall short in detecting replay spoofing attacks effectively.
Auditory-based dynamic filters can detect artefacts in high-quality replayed signals.
Deep neural networks can adaptively learn filter traits based ...

Abstract

Developing robust countermeasures to protect automatic speaker verification systems against replay spoofing attacks is a well-recognized challenge. Current approaches to spoofing detection are generally based on a fixed front-end, typically a ...

research-article

Acoustic properties of non-native clear speech: Korean speakers of English

https://doi.org/10.1016/j.specom.2023.102982

Highlights

Non-native clear speech is acoustically distinct from casual speech.
The nature of modifications is the same in native and non-native clear speech.
The magnitude of modifications is different in native and non-native clear speech. ...

Abstract

The present study examined the acoustic properties of clear speech produced by non-native speakers of English (L1 Korean), in comparison to native clear speech. L1 Korean speakers of English (N=30) and native speakers of English (N=20) read an ...

review-article

Speech emotion recognition approaches: A systematic review

https://doi.org/10.1016/j.specom.2023.102974

Abstract

The speech emotion recognition (SER) field has been active since it became a crucial feature in advanced Human–Computer Interaction (HCI), and wide real-life applications use it. In recent years, numerous SER systems have been covered by ...

Highlights

The speech emotion recognition (SER) field became crucial in advanced Human–Computer Interaction (HCI).
Numerous SER systems have been proposed by researchers using Machine Learning (ML) and Deep Learning (DL).
This survey aims to ...

research-article

Model predictive PESQ-ANFIS/FUZZY C-MEANS for image-based speech signal evaluation

https://doi.org/10.1016/j.specom.2023.102972

Abstract

This paper presents a new method to evaluate the quality of speech signals through images generated from a psychoacoustic model to estimate PESQ (ITU-T P.862) values using a first-order Fuzzy Sugeno approach implemented in the Adaptive Neuro-Fuzzy ...

Highlights

Extraction of speech signal factors using image processing techniques.
Signal image extracted from a psychoacoustic model.
Non-intrusive measurement based on PESQ values trained by ANFIS.
Configuration of ANFIS with fuzzy c-...

research-article

Determining spectral stability in vowels: A comparison and assessment of different metrics

https://doi.org/10.1016/j.specom.2023.102984

Highlights

Different metrics for spectral stability identification in vowels are discussed.
A new metric is introduced.
The different metrics are assessed both on synthesized and natural speech.
Higher-dimensional metrics capture spectral ...

Abstract

This study investigated the performance of several metrics used to evaluate spectral stability in vowels. Four metrics suggested in the literature and a newly developed one were tested and compared to the traditional method of associating the ...

research-article

Toward enriched decoding of Mandarin spontaneous speech

https://doi.org/10.1016/j.specom.2023.102983

Highlights

Enriched decoding of spontaneous speech achieves better recognition performance.
Part-of-speech features help to reduce the perplexity of the language model.
Hierarchical prosodic model enriches the recognition output with break type ...

Abstract

A deep neural network (DNN)-based automatic speech recognition (ASR) method for enriched decoding of Mandarin spontaneous speech is proposed. It adopts an enhanced approach over the baseline model built with factored time delay neural networks (...

Speech Communication
