Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition

SG Koolagudi, YVS Murthy, SP Bhaskar - International Journal of Speech …, 2018 - Springer
International Journal of Speech Technology, 2018, Springer
Abstract
This paper designs a process for selecting a classifier based on the properties of a dataset, since it is impractical to run the data through n classifiers. Speech emotion recognition is considered as a case study. Different combinations of spectral and prosodic features relevant to emotions are explored. The best subset of the chosen feature set is recommended for each classifier based on the properties of the chosen dataset. Various statistical tests are used to estimate the properties of the dataset. The nature of the dataset indicates which classifier is relevant. To make the selection more precise, three other clustering and classification techniques, namely K-means clustering, vector quantization and artificial neural networks, are used for experimentation, and their results are compared with those of the selected classifier. Prosodic features such as pitch, intensity, jitter and shimmer, and spectral features such as mel frequency cepstral coefficients (MFCCs) and formants, are considered in this work. Statistical parameters of prosody such as minimum, maximum, mean (μ) and standard deviation (σ) are extracted from speech and combined with the basic spectral (MFCC) features to get better performance. Five basic emotions, namely anger, fear, happiness, neutral and sadness, are considered. To analyse the performance of different datasets on different classifiers, content- and speaker-independent emotional data collected from Telugu movies is used. The mean opinion score of fifty users is collected to label the emotional data. To generalize the conclusions, one benchmark database, the IIT-Kharagpur emotional database, is also used.
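The combination of prosodic statistics with MFCC features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `prosody_stats` and `utterance_features` are hypothetical, the pitch and intensity contours and the MFCC frames are assumed to have been extracted already, and averaging the MFCC frames over time is an assumption made only to obtain a fixed-length vector.

```python
import numpy as np

def prosody_stats(contour):
    """Return [min, max, mean, std] of a 1-D prosodic contour (e.g. pitch in Hz)."""
    c = np.asarray(contour, dtype=float)
    return np.array([c.min(), c.max(), c.mean(), c.std()])

def utterance_features(pitch, intensity, mfcc_frames):
    """Concatenate prosodic statistics with frame-averaged MFCCs.

    mfcc_frames has shape (n_frames, n_coefficients). Averaging over
    frames is an illustrative assumption, not taken from the paper.
    """
    mfcc_mean = np.asarray(mfcc_frames, dtype=float).mean(axis=0)
    return np.concatenate(
        [prosody_stats(pitch), prosody_stats(intensity), mfcc_mean]
    )
```

With 13 MFCCs per frame, this yields a 21-dimensional vector per utterance (4 pitch statistics, 4 intensity statistics, 13 averaged MFCCs) that could then be passed to any of the classifiers under comparison.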