Empirical Comparison between Deep and Classical Classifiers for Speaker Verification in Emotional Talking Environments
Figure 1. The Emirati dataset and its English version.
Figure 2. Schematic configuration of the basic structure of a node in an ANN [43].
Figure 3. Histograms of scores for true and false speakers of neutral and emotional speech for the GMM model in the Emirati database.
Figure 4. Architecture used for the DNN models in speaker verification.
Figure 5. Background DNN topology for learning speaker-specific features [33].
Figure 6. ROC curves, each based on the GMM, KNN, SVM, ANN, CNN, LSTM, and GRU models, for the different emotional states using the Emirati database.
Figure 7. DET curves, each based on the GMM, KNN, SVM, ANN, CNN, LSTM, and GRU models, for the different emotional states using the Emirati database.
Figure 8. ROC plots using the CREMA database based on the CNN, LSTM, GRU, i-vector, and GMM models.
Figure 9. DET curves, each based on the CNN, LSTM, GRU, i-vector, and GMM models, for the different emotional states using the CREMA database.
Figure 10. ROC curves for the RAVDESS database, each based on the CNN, LSTM, GRU, and i-vector models.
Figure 11. DET curves for the RAVDESS database based on the CNN, LSTM, GRU, i-vector, and GMM models.
Abstract
1. Introduction
- Development/Training: internal representations are learned from the corresponding speaker’s acoustic frames.
- Enrollment: voiceprints are derived from voice samples.
- Evaluation: verification is achieved by comparing the test utterance's speaker representation against the enrolled speaker models [3]; a minimal sketch of this pipeline follows this list.
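The same three-phase protocol can be expressed as a short sketch. The code below is illustrative only: it assumes frame-level embeddings come from an already trained background network passed in as `embed_fn`, that a speaker model is the length-normalized mean d-vector over enrollment utterances, and that verification uses cosine scoring against a pre-tuned threshold; the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def d_vector(frames, embed_fn):
    """Average the frame-level embeddings from a trained background network
    (embed_fn) and length-normalize the result into a single d-vector."""
    emb = np.stack([embed_fn(f) for f in frames])   # (n_frames, dim)
    v = emb.mean(axis=0)
    return v / np.linalg.norm(v)

def enroll(utterances, embed_fn):
    """Enrollment: the speaker model is the normalized mean d-vector over the
    speaker's enrollment utterances."""
    model = np.mean([d_vector(u, embed_fn) for u in utterances], axis=0)
    return model / np.linalg.norm(model)

def verify(test_frames, speaker_model, embed_fn, threshold=0.5):
    """Evaluation: accept the identity claim if the cosine similarity between
    the test d-vector and the enrolled model exceeds the decision threshold."""
    score = float(np.dot(d_vector(test_frames, embed_fn), speaker_model))
    return score >= threshold
```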
2. Literature Review
2.1. Speaker Verification Using Classical Classifiers
2.2. Speaker Verification Using Deep Learning
2.3. Contribution
- Unlike previous studies, the d-vector approach implemented in this work uses CNN layers as well as recurrent (LSTM and GRU) layers to extract speakers' intrinsic voice characteristics from unbiased utterances, rather than the CNNs and locally connected networks (LCNs) of [33], the fully connected maxout layers of [34], or the LSTM-only architecture of [35].
- Optimal values of the CNN, LSTM, and GRU hyperparameters are computed using the Grid Search (GS) tuning approach (a sketch of this tuning procedure follows this list).
- In addition, previous state-of-the-art studies examined verification performance with the d-vector and i-vector methods on neutrally uttered speech only, whereas this paper addresses neutral speech as well as speech expressed under the emotions of anger, sadness, happiness, disgust, and fear.
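As a concrete illustration of the grid-search tuning mentioned above, a minimal sketch is given below. The search space, the `build_fn`/`evaluate_fn` callables, and the scoring convention are assumptions for illustration; they are not the paper's actual grid or code.

```python
from itertools import product

# Hypothetical search space; the paper's actual grid is not reproduced here.
GRID = {
    "units": [64, 128, 256],
    "dropout": [0.0, 0.2, 0.5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def grid_search(build_fn, evaluate_fn, grid=GRID):
    """Train and evaluate one model per hyperparameter combination and keep
    the configuration with the best validation score (e.g., negative EER)."""
    best_score, best_cfg = float("-inf"), None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        model = build_fn(**cfg)        # builds a CNN/LSTM/GRU with this config
        score = evaluate_fn(model)     # higher is better
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```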
3. Datasets
3.1. Arabic Emirati Speech Dataset
3.2. Crowd-Sourced Emotional Multimodal Actors Dataset
3.3. Ryerson Audio–Visual Database of Emotional Speech and Song Dataset
3.4. Feature Extraction
4. Classical Classifiers
4.1. Gaussian Mixture Models
4.2. Support Vector Machines
4.3. K-Nearest Neighbors
4.4. Artificial Neural Networks
4.5. Model Configuration and Verification
4.5.1. The GMM Model
4.5.2. SVM, KNN and ANN Models
5. Deep Neural Networks
5.1. System Overview
5.2. CNN Model
5.2.1. Development Phase
5.2.2. Enrollment Phase
5.2.3. Evaluation Phase
5.3. LSTM Model
5.3.1. Development Phase
5.3.2. Enrollment Phase
5.3.3. Evaluation Phase
5.4. GRU Model
5.4.1. Development Phase
5.4.2. Enrollment Phase
5.4.3. Evaluation Phase
5.5. Enrollment Phase
5.6. Evaluation Phase
6. Decision Threshold and Verification Process
7. Results and Discussion
7.1. CREMA Database
7.2. RAVDESS Database
7.3. Comparison with Other Related Work
7.4. Computation Performance Study
8. Conclusions, Limitations, and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Nassif, A.B.; Shahin, I.; Attili, I.; Azzeh, M.; Shaalan, K. Speech Recognition Using Deep Neural Networks: A Systematic Review. IEEE Access 2019, 7, 19143–19165. [Google Scholar] [CrossRef]
- Reynolds, D.A. An Overview of Automatic Speaker Recognition Technology. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, 13–17 May 2002; Volume IV, pp. 4072–4075. [Google Scholar]
- Salehghaffari, H. Speaker Verification using Convolutional Neural Networks. arXiv 2018, arXiv:abs/1803.0. [Google Scholar]
- Baldominos, A.; Cervantes, A.; Saez, Y.; Isasi, P. A Comparison of Machine Learning and Deep Learning Techniques for Activity Recognition using Mobile Devices. Sensors 2019, 19, 521. [Google Scholar] [CrossRef] [PubMed]
- Zappone, A.; Di Renzo, M.; Debbah, M. Wireless Networks Design in the Era of Deep Learning: Model-Based, AI-Based, or Both? IEEE Trans. Commun. 2019, 67, 7331–7376. [Google Scholar] [CrossRef]
- Wan, V.; Campbell, W.M. Support vector machines for speaker verification and identification. In Proceedings of the Neural Networks for Signal Processing X, 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501), Sydney, NSW, Australia, 11–13 December 2000; Volume 2, pp. 775–784. [Google Scholar]
- Vivaracho-Pascual, C.; Ortega-Garcia, J.; Alonso, L.; Moro-Sancho, Q.I. A comparative study of MLP-based artificial neural networks in text-independent speaker verification against GMM-based systems. In Proceedings of the Eurospeech, Aalborg, Denmark, 3–7 September 2001; pp. 1753–1757. [Google Scholar]
- Campbell, W.M.; Sturim, D.E.; Reynolds, D.A. Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett. 2006, 13, 308–311. [Google Scholar] [CrossRef]
- Chen, S.-H.; Luo, Y. Speaker Verification Using MFCC and Support Vector Machine. In Proceedings of the International MultiConference of Engineers and Computer Scientists, Hong Kong, China, 18–20 March 2009. [Google Scholar]
- Alarifi, A. Arabic text-dependent speaker verification for mobile devices using artificial neural networks. Int. J. Phys. Sci. 2012, 7, 1073–1082. [Google Scholar] [CrossRef]
- Mahmood, A.; Alsulaiman, M.; Muhammad, G. Automatic Speaker Recognition Using Multi-Directional Local Features (MDLF). Arab. J. Sci. Eng. 2014, 39, 3799–3811. [Google Scholar] [CrossRef]
- Taylor, S.; Hanani, A.; Basha, H.; Sharaf, Y. Palestinian Arabic regional accent recognition. In Proceedings of the 2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania, 14–17 October 2015; pp. 1–6. [Google Scholar] [CrossRef]
- Chauhan, N.; Chandra, M. Speaker recognition and verification using artificial neural network. In Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, 22–24 March 2017; pp. 1147–1149. [Google Scholar]
- Wu, W.; Zheng, T.F.; Xu, M.-X.; Bao, H.-J. Study on Speaker Verification on Emotional Speech. In Proceedings of the INTERSPEECH, Pittsburgh, PA, USA, 17–21 September 2006. [Google Scholar]
- Pillay, S.G.; Ariyaeeinia, A.; Pawlewski, M.; Sivakumaran, P. Speaker verification under mismatched data conditions. Signal Process. IET 2009, 3, 236–246. [Google Scholar] [CrossRef]
- Shahin, I.; Nassif, A.B. Three-stage speaker verification architecture in emotional talking environments. Int. J. Speech Technol. 2018, 21, 915–930. [Google Scholar] [CrossRef]
- Mittal, A.; Dua, M. Automatic speaker verification systems and spoof detection techniques: Review and analysis. Int. J. Speech Technol. 2022, 25, 105–134. [Google Scholar] [CrossRef]
- Ferrer, L.; McLaren, M.; Brümmer, N. A speaker verification backend with robust performance across conditions. Comput. Speech Lang. 2022, 71, 101258. [Google Scholar] [CrossRef]
- Liu, T.; Das, R.K.; Lee, K.A.; Li, H. Neural Acoustic-Phonetic Approach for Speaker Verification with Phonetic Attention Mask. IEEE Signal Process. Lett. 2022, 29, 782–786. [Google Scholar] [CrossRef]
- Bhattacharya, G.; Alam, J.; Kenny, P. Deep speaker embeddings for short-duration speaker verification. In Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, Stockholm, Sweden, 20–24 August 2017; Volume 2017, pp. 1517–1521. [Google Scholar] [CrossRef]
- Reynolds, D.A.; Quatieri, T.F.; Dunn, R.B. Speaker verification using adapted Gaussian mixture models. Digit. Signal Process. A Rev. J. 2000, 10, 19–41. [Google Scholar] [CrossRef]
- Dehak, N.; Kenny, P.J.; Dehak, R.; Dumouchel, P.; Ouellet, P. Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 788–798. [Google Scholar] [CrossRef]
- Kenny, P.; Ouellet, P.; Dehak, N.; Gupta, V.; Dumouchel, P. A Study of Inter-Speaker Variability in Speaker Verification. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 980–988. [Google Scholar] [CrossRef]
- Garcia-Romero, D.; Espy-Wilson, C. Analysis of i-vector Length Normalization in Speaker Recognition Systems. In Proceedings of the Interspeech, Florence, Italy, 28–31 August 2011; pp. 249–252. [Google Scholar]
- Rupesh Kumar, S.; Bharathi, B. Generative and Discriminative Modelling of Linear Energy Sub-bands for Spoof Detection in Speaker Verification Systems. Circuits Syst. Signal Process. 2022, 41, 3811–3831. [Google Scholar] [CrossRef]
- Alam, M.J.; Kinnunen, T.; Kenny, P.; Ouellet, P.; O’Shaughnessy, D. Multi-taper MFCC Features for Speaker Verification using I-vectors. In Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, Waikoloa, HI, USA, 11–15 December 2011; pp. 547–552. [Google Scholar]
- Chen, L.; Lee, K.A.; Chng, E.; Ma, B.; Li, H.; Dai, L.-R. Content-aware local variability vector for speaker verification with short utterance. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 5485–5489. [Google Scholar]
- Zhu, Y.; Ko, T.; Snyder, D.; Mak, B.; Povey, D. Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, Hyderabad, India, 2–6 September 2018; pp. 3573–3577. [Google Scholar]
- Mobiny, A.; Najarian, M. Text-Independent Speaker Verification Using Long Short-Term Memory Networks. arXiv 2018, arXiv:1805.00604. [Google Scholar]
- Hourri, S.; Nikolov, N.S.; Kharroubi, J. Convolutional neural network vectors for speaker recognition. Int. J. Speech Technol. 2021, 24, 389–400. [Google Scholar] [CrossRef]
- Shahin, I.; Nassif, A.B.; Nemmour, N.; Elnagar, A.; Alhudhaif, A.; Polat, K. Novel hybrid DNN approaches for speaker verification in emotional and stressful talking environments. Neural Comput. Appl. 2021, 33, 16033–16055. [Google Scholar] [CrossRef]
- Mohammed, T.S.; Aljebory, K.M.; Abdul Rasheed, M.A.; Al-Ani, M.S.; Sagheer, A.M. Analysis of Methods and Techniques Used for Speaker Identification, Recognition, and Verification: A Study on Quarter-Century Research Outcomes. Iraqi J. Sci. 2021, 62, 3256–3281. [Google Scholar] [CrossRef]
- Chen, Y.H.; Lopez-Moreno, I.; Sainath, T.N.; Visontai, M.; Alvarez, R.; Parada, C. Locally-connected and convolutional neural networks for small footprint speaker recognition. In Proceedings of the Interspeech, Dresden, Germany, 6–10 September 2015. [Google Scholar]
- Variani, E.; Lei, X.; McDermott, E.; Moreno, I.L.; Gonzalez-Dominguez, J. Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 4052–4056. [Google Scholar] [CrossRef]
- Heigold, G.; Moreno, I.; Bengio, S.; Shazeer, N. End-to-end text-dependent speaker verification. In Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 20–25 March 2016. [Google Scholar]
- Cao, H.; Cooper, D.G.; Keutmann, M.K.; Gur, R.C.; Nenkova, A.; Verma, R. CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affect. Comput. 2014, 5, 377–390. [Google Scholar] [CrossRef] [PubMed]
- Livingstone, S.R.; Russo, F.A. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 2018, 13, e0196391. [Google Scholar] [CrossRef] [PubMed]
- Kumar, D.S.P. Feature Normalisation for Robust Speech Recognition. arXiv 2015, arXiv:abs/1507.0. [Google Scholar]
- Li, L.; Wang, D.; Zhang, Z.; Zheng, T.F. Deep Speaker Vectors for Semi Text-independent Speaker Verification. arXiv 2015, arXiv:1505.06427. [Google Scholar]
- McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.; McVicar, M.; Battenberg, E.; Nieto, O. Librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; pp. 18–25. [Google Scholar]
- Reynolds, D.A.; Rose, R.C. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 1995, 3, 72–83. [Google Scholar] [CrossRef]
- Pulgar, F.J.; Charte, F.; Rivera, A.J.; del Jesus, M.J. AEkNN: An AutoEncoder kNN-Based Classifier With Built-in Dimensionality Reduction. Int. J. Comput. Intell. Syst. 2018, 12, 436. [Google Scholar] [CrossRef]
- Arce-Medina, E.; Paz-Paredes, J.I. Artificial neural network modeling techniques applied to the hydrodesulfurization process. Math. Comput. Model. 2009, 49, 207–214. [Google Scholar] [CrossRef]
- Saez, Y.; Baldominos, A.; Isasi, P. A Comparison Study of Classifier Algorithms for Cross-Person Physical Activity Recognition. Sensors 2016, 17, 66. [Google Scholar] [CrossRef]
- Shahin, I. Emirati speaker verification based on HMM1s, HMM2s, and HMM3s. In Proceedings of the 2016 IEEE 13th International Conference on Signal Processing (ICSP), Chengdu, China, 6–10 November 2016; pp. 562–567. [Google Scholar]
| DNN Model | Layer | #Layers | Units | Other Params. |
|---|---|---|---|---|
| CNN | Conv2D | 1 | 128 | ReLU, kernel = 7, strides = 2 |
| | MaxPool2D | 1 | – | pool_size = 2, strides = 2 |
| | Dense | 1 | 128 | – |
| | Dense (output layer) | 1 | 24 | SoftMax |
| LSTM | LSTM | 1 | 64 | ReLU |
| | Dense | 1 | 64 | – |
| | Dense (output layer) | 1 | 24 | SoftMax |
| GRU | GRU | 1 | 64 | – |
| | Dense | 1 | 64 | – |
| | Dense (output layer) | 1 | 24 | SoftMax |
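Read literally, the topology table above maps onto the Keras sketch below. The input shapes, the Flatten layer needed to bridge the convolutional and dense parts, and the 24-speaker output size exposed as a constructor argument are assumptions for illustration rather than the authors' exact code.

```python
from tensorflow.keras import Sequential, layers

def build_cnn(input_shape=(40, 40, 1), n_speakers=24):
    # Conv2D(128, kernel 7, stride 2) -> MaxPool2D -> Dense(128) -> SoftMax(24)
    return Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(128, kernel_size=7, strides=2, activation="relu"),
        layers.MaxPool2D(pool_size=2, strides=2),
        layers.Flatten(),
        layers.Dense(128),                              # d-vector layer
        layers.Dense(n_speakers, activation="softmax"),
    ])

def build_lstm(input_shape=(100, 40), n_speakers=24):
    # LSTM(64, ReLU) -> Dense(64) -> SoftMax(24)
    return Sequential([
        layers.Input(shape=input_shape),
        layers.LSTM(64, activation="relu"),
        layers.Dense(64),
        layers.Dense(n_speakers, activation="softmax"),
    ])

def build_gru(input_shape=(100, 40), n_speakers=24):
    # GRU(64) -> Dense(64) -> SoftMax(24)
    return Sequential([
        layers.Input(shape=input_shape),
        layers.GRU(64),
        layers.Dense(64),
        layers.Dense(n_speakers, activation="softmax"),
    ])
```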
Equal Error Rate (EER) (%) and AUC for the collected Emirati dataset (each cell: EER / AUC).

| Emotion | GMM | KNN | SVM | ANN | i-vector | CNN | LSTM | GRU |
|---|---|---|---|---|---|---|---|---|
| Neutral | 1.43 / 0.99 | 19.00 / 0.16 | 9.00 / 0.09 | 10.00 / 0.09 | 8.55 / 0.97 | 12.83 / 0.93 | 9.13 / 0.95 | 8.91 / 0.96 |
| Anger | 12.49 / 0.94 | 42.00 / 0.24 | 29.00 / 0.21 | 37.00 / 0.23 | 12.83 / 0.94 | 13.89 / 0.91 | 12.70 / 0.94 | 14.77 / 0.92 |
| Happy | 5.32 / 0.98 | 35.00 / 0.23 | 21.00 / 0.17 | 23.00 / 0.18 | 10.1 / 0.95 | 14.86 / 0.92 | 11.64 / 0.94 | 12.79 / 0.94 |
| Sad | 2.63 / 0.98 | 45.00 / 0.25 | 25.00 / 0.19 | 25.00 / 0.19 | 9.08 / 0.97 | 15.34 / 0.91 | 12.74 / 0.95 | 10.54 / 0.94 |
| Fear | 3.70 / 0.99 | 45.00 / 0.25 | 24.00 / 0.18 | 23.00 / 0.18 | 9.18 / 0.97 | 13.89 / 0.91 | 12.26 / 0.94 | 11.77 / 0.95 |
| Disgust | 2.27 / 0.99 | 29.00 / 0.20 | 15.00 / 0.13 | 16.00 / 0.13 | 10.1 / 0.96 | 16.58 / 0.91 | 10.11 / 0.95 | 11.66 / 0.94 |
| Average | 4.64 / 0.97 | 35.83 / 0.22 | 20.5 / 0.16 | 22.33 / 0.16 | 9.97 / 0.96 | 14.56 / 0.92 | 11.43 / 0.94 | 11.74 / 0.94 |
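For reference, EER and AUC figures of the kind tabulated here can be computed from per-trial verification scores as sketched below; the label and score arrays are assumed inputs, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def eer_and_auc(labels, scores):
    """labels: 1 for genuine (true-speaker) trials, 0 for impostor trials;
    scores: the corresponding verification scores."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    # EER: the operating point where false acceptance equals false rejection.
    idx = np.nanargmin(np.abs(fnr - fpr))
    eer = (fpr[idx] + fnr[idx]) / 2
    return eer * 100, auc(fpr, tpr)   # EER in %, AUC in [0, 1]
```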
Wilcoxon Test

| Emotion | KNN | SVM | ANN | GMM | CNN | GRU | LSTM | i-vector |
|---|---|---|---|---|---|---|---|---|
| Neutral | 0.003 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Anger | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Happy | 0.000 | 0.000 | 0.000 | 0.009 | 0.000 | 0.000 | | |
| Sad | 0.000 | 0.000 | 0.000 | 0.000 | 0.005 | 0.000 | | |
| Fear | 0.000 | 0.000 | 0.000 | 0.588 | 0.000 | 0.000 | | |
| Disgust | 0.000 | 0.000 | 0.000 | 0.059 | 0.000 | 0.000 | | |
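The values above are p-values from Wilcoxon tests. A minimal sketch of a paired Wilcoxon signed-rank test (one common variant) is shown below; the two score arrays are synthetic placeholders, and the exact paired quantity the authors compared is not restated here.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
scores_model_a = rng.normal(0.80, 0.05, size=200)   # per-trial scores of one model (placeholder)
scores_model_b = rng.normal(0.78, 0.05, size=200)   # paired scores of a second model (placeholder)

stat, p_value = wilcoxon(scores_model_a, scores_model_b)
print(f"p = {p_value:.3f}")   # p < 0.05 -> reject the null of no paired difference
```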
Equal Error Rate (EER) (%) and AUC for the CREMA dataset (each cell: EER / AUC).

| Emotion | GMM | KNN | SVM | ANN | i-vector | CNN | LSTM | GRU |
|---|---|---|---|---|---|---|---|---|
| Neutral | 21.00 / 0.84 | 44.00 / 0.07 | 22.00 / 0.11 | 30.00 / 0.15 | 12.54 / 0.94 | 21.94 / 0.86 | 18.75 / 0.9 | 17.19 / 0.91 |
| Anger | 35.00 / 0.70 | 50.00 / 0.02 | 40.00 / 0.08 | 47.00 / 0.05 | 25.28 / 0.80 | 38.54 / 0.66 | 36.25 / 0.69 | 40.62 / 0.65 |
| Happy | 33.00 / 0.74 | 53.00 / 0.05 | 35.00 / 0.10 | 47.00 / 0.09 | 23.61 / 0.84 | 32.29 / 0.73 | 32.66 / 0.72 | 32.81 / 0.74 |
| Sad | 29.00 / 0.76 | 54.00 / 0.10 | 32.00 / 0.16 | 47.00 / 0.17 | 16.66 / 0.89 | 32.33 / 0.74 | 21.88 / 0.86 | 22.03 / 0.86 |
| Fear | 37.00 / 0.69 | 53.00 / 0.09 | 38.00 / 0.14 | 50.00 / 0.14 | 20.87 / 0.85 | 38.54 / 0.69 | 25.31 / 0.8 | 37.19 / 0.72 |
| Disgust | 28.00 / 0.78 | 52.00 / 0.09 | 37.00 / 0.14 | 46.00 / 0.14 | 23.52 / 0.85 | 34.38 / 0.72 | 26.61 / 0.8 | 28.07 / 0.81 |
| Average | 30.5 / 0.75 | 51 / 0.07 | 34 / 0.12 | 44.5 / 0.12 | 20.41 / 0.86 | 33.00 / 0.73 | 26.91 / 0.8 | 29.65 / 0.78 |
Wilcoxon Test

| Emotion | KNN | SVM | ANN | GMM | CNN | GRU | LSTM | i-vector |
|---|---|---|---|---|---|---|---|---|
| Neutral | 0.000 | 0.000 | 0.000 | 0.045 | 0.000 | 0.000 | | |
| Anger | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Happy | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Sad | 0.000 | 0.000 | 0.000 | 0.814 | 0.000 | 0.000 | | |
| Fear | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Disgust | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
Equal Error Rate (EER) (%) and AUC for the RAVDESS dataset (each cell: EER / AUC).

| Emotion | GMM | KNN | SVM | ANN | i-vector | CNN | LSTM | GRU |
|---|---|---|---|---|---|---|---|---|
| Neutral | 2.13 / 0.98 | 4.00 / 0.04 | 7.00 / 0.07 | 2.00 / 0.02 | 12.5 / 0.89 | 25.00 / 0.85 | 12.50 / 0.89 | 12.50 / 0.91 |
| Anger | 23.40 / 0.81 | 62.00 / 0.24 | 52.00 / 0.25 | 61.00 / 0.24 | 28.65 / 0.72 | 36.98 / 0.74 | 25.00 / 0.74 | 43.23 / 0.63 |
| Happy | 27.13 / 0.83 | 46.00 / 0.25 | 48.00 / 0.25 | 47.00 / 0.25 | 28.13 / 0.79 | 28.12 / 0.78 | 24.48 / 0.80 | 30.73 / 0.69 |
| Sad | 17.02 / 0.91 | 40.00 / 0.24 | 40.00 / 0.24 | 39.00 / 0.24 | 21.88 / 0.77 | 21.88 / 0.85 | 25.00 / 0.83 | 18.75 / 0.88 |
| Fear | 20.48 / 0.83 | 63.00 / 0.23 | 57.00 / 0.25 | 59.00 / 0.24 | 28.13 / 0.75 | 31.77 / 0.71 | 25.52 / 0.81 | 31.25 / 0.68 |
| Disgust | 22.40 / 0.80 | 64.00 / 0.23 | 65.00 / 0.23 | 60.00 / 0.24 | 19.79 / 0.89 | 30.21 / 0.82 | 31.25 / 0.77 | 37.50 / 0.70 |
| Average | 18.76 / 0.86 | 46.50 / 0.21 | 44.83 / 0.21 | 44.67 / 0.20 | 23.18 / 0.80 | 28.99 / 0.79 | 23.96 / 0.81 | 28.99 / 0.75 |
Wilcoxon Test

| Emotion | KNN | SVM | ANN | GMM | CNN | GRU | LSTM | i-vector |
|---|---|---|---|---|---|---|---|---|
| Neutral | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Anger | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Happy | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Sad | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Fear | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | |
| Disgust | 0.000 | 0.000 | 0.000 | 0.004 | 0.586 | 0.000 | | |
| Models | Neutral (EER %) |
|---|---|
| HMM1, HMM2, HMM3 [45] | 11.5, 9.6, 4.9 |
| GMM (our winning model) | 1.43 |
| Category | Model | Emirati | RAVDESS | CREMA |
|---|---|---|---|---|
| Classical classifiers | GMM | 94.530 | 13.149 | 66.375 |
| | KNN | 35.365 | 3.446 | 11.898 |
| | SVM | 6.949 | 1.153 | 5.161 |
| | ANN | 19.231 | 2.455 | 7.203 |
| Deep classifiers | CNN | 0.963 | 1.482 | 0.767 |
| | LSTM | 1.054 | 2.058 | 2.269 |
| | GRU | 0.980 | 1.526 | 2.203 |
| | i-vector | 90.850 | 6.4542 | 34.6124 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).