Long short term memory recurrent neural network based multimodal dimensional emotion recognition
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015 (dl.acm.org)
This paper presents our contribution to the Audio/Visual+ Emotion Challenge (AV+EC2015), whose goal is to predict continuous values of the emotion dimensions arousal and valence from the audio, visual and physiological modalities. We use the long short-term memory recurrent neural network (LSTM-RNN), the state-of-the-art classifier for dimensional recognition. Beyond the regular LSTM-RNN prediction architecture, we investigate two techniques for the dimensional emotion recognition problem. The first is to use the ε-insensitive loss as the loss function to optimize. Compared with the squared loss, which is the most widely used loss function for dimensional emotion recognition, the ε-insensitive loss is more robust to label noise, and it can ignore small errors to obtain a stronger correlation between predictions and labels. The second is temporal pooling. This technique enables temporal modeling in the input features and increases the diversity of the features fed into the forward prediction architecture. Experimental results show the effectiveness of the key components of the proposed method, and competitive results are obtained.
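The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the value of `eps`, the choice of mean pooling, and the trailing-window scheme are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the general idea that the ε-insensitive loss charges nothing for errors within ε of the label, and that pooling over neighboring frames gives each input frame some temporal context.

```python
from typing import List, Sequence


def eps_insensitive_loss(pred: Sequence[float], label: Sequence[float],
                         eps: float = 0.1) -> float:
    """Mean epsilon-insensitive loss.

    Errors with |pred - label| <= eps cost nothing; larger errors are
    penalized linearly. The default eps is a hypothetical value.
    """
    return sum(max(0.0, abs(p - y) - eps) for p, y in zip(pred, label)) / len(pred)


def squared_loss(pred: Sequence[float], label: Sequence[float]) -> float:
    """Mean squared loss, shown for comparison."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)


def temporal_mean_pool(frames: Sequence[Sequence[float]],
                       window: int) -> List[List[float]]:
    """Average each feature over a trailing window of frames.

    Assumption: mean pooling over the current frame and the previous
    (window - 1) frames; the window is shorter at the start of the sequence.
    """
    pooled = []
    for t in range(len(frames)):
        chunk = frames[max(0, t - window + 1): t + 1]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


if __name__ == "__main__":
    preds, labels = [0.5, 0.05], [0.0, 0.0]
    # The second error (0.05) is inside the eps tube and contributes nothing.
    print(eps_insensitive_loss(preds, labels, eps=0.1))
    print(squared_loss(preds, labels))
    print(temporal_mean_pool([[1.0], [3.0], [5.0]], window=2))
```

In this sketch a small annotation disagreement (smaller than `eps`) produces no loss at all, which is one way to read the abstract's claim of robustness to label noise. The pooled features could be concatenated with the raw frames before being fed to the LSTM-RNN, although the abstract does not say how they are combined.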