
Emotional speaker identification using PCAFCM-deepforest with fuzzy logic

Published: 30 July 2024

Abstract

Voice is a form of biometrics that conveys rich information about an individual, such as identity, gender, accent, age and emotion. Speaker identification is the task of identifying speakers from their intrinsic voice characteristics. This study proposes a text-independent speaker identification system that combines principal component analysis (PCA), fuzzy C-means (FCM) and deepForest, called PCAFCM-deepForest. The proposed approach is evaluated in both neutral and adverse talking environments on five benchmark corpora: a private Arabic Emirati-accented speech dataset; the public English Crowd-sourced Emotional Multimodal Actors dataset (CREMA); the public German Berlin Database of Emotional Speech (EmoDB); the public Chinese and English Emotional Speech Database (ESD); and the public Canadian French Emotional (CaFE) speech dataset. Our analysis shows that speaker identification performance improves greatly when fuzzy logic and PCA are both applied to the extracted mel-frequency cepstral coefficients (MFCC). The performance achieved by the proposed PCAFCM-deepForest is superior to that of deepForest alone, FCM-deepForest and a convolutional neural network (CNN), and it also surpasses the conventional random forest and support vector machine (SVM) models. The attained average speaker identification accuracy reaches 98.20% on the Emirati database, outperforming existing frameworks. Moreover, PCAFCM-deepForest is fine-tuned using the grid search algorithm, and its complexity is much lower than that of CNN.
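To make the described pipeline concrete, the following is a minimal Python sketch of the feature flow (MFCC -> PCA -> FCM memberships -> cascade forest). It is not the authors' implementation: the cluster count, the PCA variance threshold, the placeholder names wav_paths and speaker_labels, and the choice of the third-party librosa, scikit-fuzzy and deep-forest (CascadeForestClassifier) packages are all illustrative assumptions.

import numpy as np
import librosa
import skfuzzy as fuzz
from sklearn.decomposition import PCA
from deepforest import CascadeForestClassifier

def utterance_mfcc(path, sr=16000, n_mfcc=40):
    """Load one utterance and return an utterance-level MFCC vector."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)  # average over frames -> (n_mfcc,)

# wav_paths / speaker_labels are placeholders for whichever corpus is used.
X = np.stack([utterance_mfcc(p) for p in wav_paths])
y = np.array(speaker_labels)  # integer speaker IDs

# 1) PCA: keep the components explaining ~95% of the MFCC variance (assumed threshold).
X_pca = PCA(n_components=0.95).fit_transform(X)

# 2) Fuzzy C-means: soft cluster memberships are appended as extra features.
n_clusters = 8  # assumed value; such settings would be tuned, e.g. by grid search
cntr, u, *_ = fuzz.cmeans(X_pca.T, c=n_clusters, m=2.0, error=1e-5, maxiter=300)
X_fused = np.hstack([X_pca, u.T])  # (n_utterances, n_components + n_clusters)

# 3) Cascade ("deep") forest classifier trained on the fused features.
clf = CascadeForestClassifier(random_state=1)
clf.fit(X_fused, y)
pred = clf.predict(X_fused)  # in practice, evaluate on a held-out split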



Information

Published In

Neural Computing and Applications, Volume 36, Issue 30
Oct 2024
717 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 30 July 2024
Accepted: 27 June 2024
Received: 31 December 2023

Author Tags

  1. DeepForest
  2. Emotional speech
  3. Fuzzy C-means
  4. Principal component analysis
  5. Speaker identification

Qualifiers

  • Research-article

Funding Sources

  • University of Sharjah

