
The amalgamation of wavelet packet information gain entropy tuned source and system parameters for improved speech emotion recognition

Published: 01 April 2023

Highlights

Prominent wavelet packet (WP) coefficients are extracted from emotional voice samples using eigenvalue decomposition.
Several cepstrum-based feature vectors are derived from the prominent WP coefficients.
A three-stage feature selection algorithm is proposed, integrating WP decomposition, statistical analysis, and an Information Gain (IG) feature ranking algorithm based on high IG entropy.
Different intelligent amalgamations of WP-based source and system features are explored to increase the available information.
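The pipeline in the highlights — splitting the speech signal into WP sub-bands and ranking the resulting features by Information Gain entropy — can be sketched as below. This is an illustrative reconstruction, not the paper's implementation: the Haar filter pair, the band-energy feature, and the equal-width discretisation used by `information_gain` are all assumptions made for the example.

```python
import numpy as np

def haar_wp_split(band):
    """One Haar wavelet packet split: a band is divided into a
    low-pass (approximation) and a high-pass (detail) half-band."""
    band = np.asarray(band, dtype=float)
    if len(band) % 2:
        band = np.append(band, 0.0)        # zero-pad to even length
    low = (band[0::2] + band[1::2]) / np.sqrt(2.0)
    high = (band[0::2] - band[1::2]) / np.sqrt(2.0)
    return low, high

def wp_decompose(signal, levels):
    """Full wavelet packet tree: 2**levels leaf sub-bands."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        bands = [half for b in bands for half in haar_wp_split(b)]
    return bands

def information_gain(feature, labels, bins=8):
    """IG = H(labels) - H(labels | feature), with the continuous
    feature discretised into equal-width bins."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)

    def entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    edges = np.histogram_bin_edges(feature, bins=bins)
    cells = np.digitize(feature, edges[1:-1])
    h_cond = sum((cells == c).mean() * entropy(labels[cells == c])
                 for c in np.unique(cells))
    return entropy(labels) - h_cond
```

In the paper's setting, a per-utterance feature (e.g. the energy `np.sum(b**2)` of each leaf band) would be ranked by `information_gain` across labelled samples, and only the high-IG sub-bands retained.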

Abstract

This paper proposes a three-stage feature selection algorithm that combines wavelet packet (WP) decomposition, statistical analysis, and an Information Gain (IG) feature ranking algorithm based on high IG entropy for the automatic classification of speech emotions (SE). Effective frame-level vocal-tract system and excitation-source parameters are first extracted from a few significant WP sub-bands that contain emotionally relevant information, selected by eigenvalue decomposition. Several intelligent amalgamations of the derived optimal feature vectors are then formed for improved performance in a low-dimensional feature space. The fundamental argument is that if the recognition errors a system makes on individual feature streams occur at different points, complementary information can cancel some of these errors by increasing the available information. Cost-Sensitive Decision Tree (CS-DT), Support Vector Machine (SVM), and Decision Tree (DT) models have been validated and tested with the proposed setup. The results show that the proposed algorithms outperform other published results in the literature, with the CS-DT performing best. The proposed low-dimensional amalgamation vectors achieve more than a 20% improvement in recognition performance, together with a faster response, cost savings, and a lower Friedman (F) rank, which is a significant achievement in this direction.
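The cost-sensitive idea behind the CS-DT can be illustrated with a minimal, hypothetical decision stump that picks a threshold by total misclassification cost rather than raw error count; the cost values and the single-feature setting are assumptions for this sketch, not the paper's full cost-sensitive tree.

```python
import numpy as np

def cost_sensitive_stump(feature, labels, fn_cost=5.0, fp_cost=1.0):
    """Choose the threshold on one feature that minimises total
    misclassification cost.  A false negative (missing the target
    emotion, labels == 1) is charged fn_cost, a false alarm fp_cost,
    so the split shifts toward catching the costly class."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    best_thr, best_cost = None, np.inf
    for thr in np.unique(feature):
        pred = (feature >= thr).astype(int)
        fn = int(np.sum((pred == 0) & (labels == 1)))
        fp = int(np.sum((pred == 1) & (labels == 0)))
        cost = fn * fn_cost + fp * fp_cost
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr, best_cost
```

A full cost-sensitive tree applies this criterion recursively at every split instead of accuracy or Gini impurity.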
The link to compute the Hurst parameter for the feature amalgamation is available at (https://www.mathworks.com/matlabcentral/fileexchange/9842-hurst-exponent).
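For readers without MATLAB, a rescaled-range (R/S) estimate of the Hurst exponent — the quantity the linked file computes — can be sketched in Python as follows; the doubling block sizes and the classical block-wise R/S procedure are assumptions for this illustration.

```python
import numpy as np

def hurst_rs(series, min_block=8):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis:
    fit the slope of log(mean R/S) against log(block size)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    log_sizes, log_rs = [], []
    size = min_block
    while size <= n // 2:
        ratios = []
        for start in range(0, n - size + 1, size):
            block = x[start:start + size]
            dev = np.cumsum(block - block.mean())  # cumulative deviation
            r = dev.max() - dev.min()              # range
            s = block.std()                        # scale
            if s > 0:
                ratios.append(r / s)
        if ratios:
            log_sizes.append(np.log(size))
            log_rs.append(np.log(np.mean(ratios)))
        size *= 2
    slope, _ = np.polyfit(log_sizes, log_rs, 1)
    return float(slope)
```

White noise gives an estimate near 0.5, while strongly persistent (trending) series approach 1 — the kind of long-range-dependence information the amalgamated features are meant to capture.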




Published In

Speech Communication  Volume 149, Issue C
Apr 2023
108 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Speech emotion recognition
  2. Wavelet packet decomposition
  3. Source and system parameters
  4. Information gain
  5. Cost savings
  6. Friedman ranking

Qualifiers

  • Research-article
