Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques
Knowledge-Based Systems, 2021, Elsevier
Abstract
Speech emotion recognition is one of the challenging research problems in knowledge-based systems, and various methods have been proposed to reach high classification capability. To achieve high classification performance in speech emotion recognition, a nonlinear multi-level feature generation model is presented using a cryptographic structure. The novelty of this work is the use of a cryptographic structure called the shuffle box for feature generation, together with iterative neighborhood component analysis for feature selection. The proposed method has three main stages: (i) multi-level decomposition using the tunable Q wavelet transform (TQWT), (ii) feature generation with the twine shuffle pattern (twine-shuf-pat), and (iii) feature selection with iterative neighborhood component analysis (INCA), followed by classification. TQWT is a multi-level wavelet transformation method used to generate high-level, medium-level, and low-level wavelet coefficients. The proposed twine-shuf-pat technique extracts features from the decomposed wavelet coefficients, and the INCA feature selector then retains the clinically significant features. The performance of the model is validated on four public speech emotion databases (RAVDESS Speech, Emo-DB (Berlin), SAVEE, and EMOVO). Our developed twine-shuf-pat and INCA based method yielded classification accuracies of 87.43%, 90.09%, 84.79%, and 79.08% on the RAVDESS, Emo-DB (Berlin), SAVEE, and EMOVO corpora, respectively, with a 10-fold cross-validation strategy. A mixed database created from the four public speech emotion databases yielded 80.05% classification accuracy. Our speech emotion model is ready to be tested on larger databases and can be used in healthcare applications.
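The abstract outlines a three-stage pipeline (TQWT decomposition, twine-shuf-pat feature generation, INCA selection with a classifier). The sketch below mirrors only that structure; the helpers decompose, shuffle_pattern_features, and inca_select are hypothetical stand-ins: a crude Haar-style cascade replaces the true TQWT, the 4-sample shuffle-box coding is an illustrative guess at an LBP-style pattern (the paper defines twine-shuf-pat precisely), and per-feature NCA importance is approximated here by the column norms of scikit-learn's learned NCA transform rather than the per-feature weights the paper's INCA uses.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.model_selection import cross_val_score

def decompose(signal, levels=5):
    # Crude Haar-style cascade standing in for the paper's TQWT:
    # one detail (high-pass) band per level plus a final approximation band.
    bands, cur = [], np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(cur) % 2:
            cur = cur[:-1]                               # keep an even length
        bands.append((cur[0::2] - cur[1::2]) / 2.0)      # detail band
        cur = (cur[0::2] + cur[1::2]) / 2.0              # approximation band
    bands.append(cur)
    return bands

def shuffle_pattern_features(band, box=(3, 0, 2, 1)):
    # Hypothetical shuffle-box coding: permute each 4-sample window with a
    # fixed box, binarize against the window mean, and histogram the 4-bit
    # codes (LBP-style). The real twine-shuf-pat coding differs in detail.
    codes = []
    for i in range(0, len(band) - 3, 4):
        w = band[i:i + 4][list(box)]                     # apply the shuffle box
        bits = (w > w.mean()).astype(int)                # binarize around mean
        codes.append(int(bits @ (1 << np.arange(4))))    # 4-bit code
    hist, _ = np.histogram(codes, bins=16, range=(0, 16))
    return hist / max(len(codes), 1)                     # normalized histogram

def extract_features(signal):
    # Stages (i) + (ii): pattern features from every decomposition band.
    return np.concatenate([shuffle_pattern_features(b) for b in decompose(signal)])

def inca_select(X, y, k_values=(8, 16, 32, 64)):
    # INCA-style selection: rank features by an NCA-derived importance
    # (column norms of the learned transform, an approximation of per-feature
    # NCA weights), then keep the top-k with the best 10-fold kNN accuracy.
    nca = NeighborhoodComponentsAnalysis(random_state=0).fit(X, y)
    order = np.argsort(-np.linalg.norm(nca.components_, axis=0))
    best_idx, best_acc = order[:k_values[0]], -1.0
    for k in k_values:
        idx = order[:k]
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                              X[:, idx], y, cv=10).mean()
        if acc > best_acc:
            best_idx, best_acc = idx, acc
    return best_idx, best_acc
```

With per-clip feature vectors from extract_features stacked into a matrix X and emotion labels y, inca_select(X, y) returns the selected feature indices and the corresponding 10-fold kNN accuracy, mirroring the iterative top-k search that gives INCA its name.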