
Emotional speaker identification using PCAFCM-deepforest with fuzzy logic

Published: 30 July 2024

Abstract

Voice is a form of biometrics that conveys rich information about an individual, such as identity, gender, accent, age and emotion. Speaker identification is the task of identifying speakers from their intrinsic voice characteristics. This study proposes a text-independent speaker identification system that combines principal component analysis (PCA), fuzzy C-means (FCM) and deepForest, called PCAFCM-deepForest. The proposed approach is evaluated in both neutral and adverse talking environments on five benchmark corpora: a private Arabic Emirati-accented speech dataset; the public English Crowd-sourced Emotional Multimodal Actors dataset (CREMA); the public German Berlin Database of Emotional Speech (EmoDB); the public Chinese and English Emotional Speech Database (ESD); and the public Canadian French Emotional (CaFE) speech dataset. Our analysis shows that speaker identification performance improves greatly when fuzzy logic and PCA are both applied to the extracted mel-frequency cepstral coefficients (MFCC). The performance achieved by the proposed PCAFCM-deepForest is superior to that of deepForest alone, FCM-deepForest and a convolutional neural network (CNN), and it also surpasses the conventional random forest and support vector machine (SVM) models. The attained average speaker identification accuracy reaches 98.20% on the Emirati database, outperforming existing frameworks. Moreover, PCAFCM-deepForest is fine-tuned using the grid search algorithm, and its complexity is much lower than that of CNN.
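To make the described pipeline concrete, the following is a minimal Python sketch of the feature flow (MFCC -> PCA -> FCM memberships -> cascade forest). It is not the authors' implementation: the cluster count, the PCA variance threshold, the placeholder names wav_paths and speaker_labels, and the choice of the third-party librosa, scikit-fuzzy and deep-forest (CascadeForestClassifier) packages are all illustrative assumptions.

import numpy as np
import librosa
import skfuzzy as fuzz
from sklearn.decomposition import PCA
from deepforest import CascadeForestClassifier

def utterance_mfcc(path, sr=16000, n_mfcc=40):
    """Load one utterance and return an utterance-level MFCC vector."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)  # average over frames -> (n_mfcc,)

# wav_paths / speaker_labels are placeholders for whichever corpus is used.
X = np.stack([utterance_mfcc(p) for p in wav_paths])
y = np.array(speaker_labels)  # integer speaker IDs

# 1) PCA: keep the components explaining ~95% of the MFCC variance (assumed threshold).
X_pca = PCA(n_components=0.95).fit_transform(X)

# 2) Fuzzy C-means: soft cluster memberships are appended as extra features.
n_clusters = 8  # assumed value; such settings would be tuned, e.g. by grid search
cntr, u, *_ = fuzz.cmeans(X_pca.T, c=n_clusters, m=2.0, error=1e-5, maxiter=300)
X_fused = np.hstack([X_pca, u.T])  # (n_utterances, n_components + n_clusters)

# 3) Cascade ("deep") forest classifier trained on the fused features.
clf = CascadeForestClassifier(random_state=1)
clf.fit(X_fused, y)
pred = clf.predict(X_fused)  # in practice, evaluate on a held-out split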



Information

Published In

Neural Computing and Applications, Volume 36, Issue 30
Oct 2024
717 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 30 July 2024
Accepted: 27 June 2024
Received: 31 December 2023

Author Tags

  1. DeepForest
  2. Emotional speech
  3. Fuzzy C-means
  4. Principal component analysis
  5. Speaker identification

Qualifiers

  • Research-article

Funding Sources

  • University of Sharjah

