Emotion recognition from speech: putting ASR in the loop

B Schuller, A Batliner, S Steidl… - 2009 IEEE International …, 2009 - ieeexplore.ieee.org
2009 IEEE International Conference on Acoustics, Speech and Signal …, 2009 - ieeexplore.ieee.org
This paper investigates the automatic recognition of emotion from spoken words, comparing vector space modeling with string kernels, which have not yet been investigated for this task. Beyond the spoken content itself, we integrate part-of-speech and higher semantic tagging into our analyses. Unlike most work in the field, we evaluate performance with an ASR engine in the loop. Extensive experiments on the FAU Aibo Emotion Corpus of 4 k spontaneous emotional child-robot interactions show surprisingly low performance degradation with real ASR compared with transcription-based emotion recognition. Overall, bag-of-words modeling outperforms all other modeling forms based on the spoken content.
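
The two families of representation named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which string kernel was used, so a character p-spectrum kernel stands in as one simple example, and the two utterances are invented for illustration rather than taken from the FAU Aibo Emotion Corpus. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_words(utterance):
    """Map an utterance to word counts (a simple vector space model)."""
    return Counter(utterance.lower().split())


def spectrum(utterance, p=3):
    """Count every character substring of length p (p-spectrum features).

    Assumption: the p-spectrum kernel is used here only as a simple
    stand-in for "string kernels"; the source does not name a kernel.
    """
    text = utterance.lower()
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def dot(a, b):
    """Inner product of two sparse count vectors."""
    return sum(count * b[key] for key, count in a.items())


def cosine(a, b):
    """Normalized similarity; returns 0.0 if either vector is empty."""
    norm = sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm else 0.0


# Hypothetical pair: a reference transcript and an ASR hypothesis in which
# one word was misrecognized.
transcript = "aibo stop that now"
asr_output = "aibo stop at now"

bow_sim = cosine(bag_of_words(transcript), bag_of_words(asr_output))
spec_sim = cosine(spectrum(transcript), spectrum(asr_output))
print(f"bag-of-words similarity: {bow_sim:.2f}")
print(f"3-spectrum similarity:   {spec_sim:.2f}")
```

In this toy pair a single recognition error changes one of four word counts, while the substring features still overlap around the misrecognized word. This only illustrates how the two representations react to an ASR error; it says nothing about the accuracies reported in the paper.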