Abstract
People express emotions through multiple modalities. Integrating verbal and non-verbal communication channels yields a message that is easier to understand, and widening the focus to several forms of expression can advance research on emotion recognition as well as human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpus contains recordings registered in studio conditions, acted out by 16 professional actors (8 male and 8 female). The data is labeled with the six basic emotion categories proposed by Ekman. To verify the quality of the performances, all recordings were evaluated by experts and volunteers. The database is available to the academic community and may be useful in studies on audio-visual emotion recognition.
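Since the corpus pairs per-actor speech and video recordings under the six Ekman categories, a short sketch of how such a collection might be indexed can make its structure concrete. The directory layout, file naming, and extensions below (`corpus/actor_NN/<emotion>/*.wav` with matching `.mp4` files) are purely hypothetical, as the abstract does not describe the database's distribution format:

```python
from collections import Counter
from pathlib import Path

# The six Ekman categories used to label the corpus.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def index_corpus(root):
    """Collect (actor, emotion, audio, video) tuples from an assumed layout.

    Assumes corpus/actor_01 ... corpus/actor_16, each holding one
    subdirectory per emotion with .wav speech files and, optionally,
    .mp4 recordings of the matching facial/gesture performance.
    This layout is an assumption for illustration, not the published one.
    """
    samples = []
    for actor_dir in sorted(Path(root).glob("actor_*")):
        for emotion in EMOTIONS:
            for wav in sorted((actor_dir / emotion).glob("*.wav")):
                mp4 = wav.with_suffix(".mp4")  # paired video, if present
                samples.append((actor_dir.name, emotion, wav,
                                mp4 if mp4.exists() else None))
    return samples

if __name__ == "__main__":
    samples = index_corpus("corpus")
    # Per-emotion counts; a balanced acted corpus should be roughly uniform.
    print(Counter(emotion for _, emotion, _, _ in samples))
```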
References
Baltrušaitis, T., et al.: Real-time inference of mental states from facial expressions and upper body gestures. In: 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011), pp. 909–914. IEEE (2011)
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B.: A database of German emotional speech. In: Ninth European Conference on Speech Communication and Technology (2005)
Camras, L.A., Oster, H., Campos, J.J., Miyake, K., Bradshaw, D.: Japanese and American infants’ responses to arm restraint. Dev. Psychol. 28(4), 578 (1992)
Daneshmand, M., et al.: 3D scanning: a comprehensive survey. arXiv preprint arXiv:1801.08863 (2018)
Douglas-Cowie, E., Cowie, R., Schröder, M.: A new emotion database: considerations, sources and scope. In: ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion (2000)
Efron, D.: Gesture and Environment (1941)
Ekman, P.: Universals and cultural differences in facial expressions of emotion. Nebr. Symp. Motiv. 19, 207–283 (1971)
Gavrilescu, M.: Recognizing emotions from videos by studying facial expressions, body postures and hand gestures. In: 2015 23rd Telecommunications Forum Telfor (TELFOR), pp. 720–723. IEEE (2015)
de Gelder, B.: Why bodies? Twelve reasons for including bodily expressions in affective neuroscience. Philos. Trans. R. Soc. B: Biol. Sci. 364, 3475–3484 (2009)
Goswami, G., Vatsa, M., Singh, R.: RGB-D face recognition with texture and attribute features. IEEE Trans. Inf. Forensics Secur. 9(10), 1629–1640 (2014)
Greco, A., Valenza, G., Citi, L., Scilingo, E.P.: Arousal and valence recognition of affective sounds based on electrodermal activity. IEEE Sens. J. 17(3), 716–725 (2017)
Gupta, R., Khomami Abadi, M., Cárdenes Cabré, J.A., Morreale, F., Falk, T.H., Sebe, N.: A quality adaptive multimodal affect recognition system for user-centric multimedia indexing. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp. 317–320. ACM (2016)
Haamer, R.E., et al.: Changes in facial expression as biometric: a database and benchmarks of identification. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 621–628. IEEE (2018)
Haque, M.A., et al.: Deep multimodal pain recognition: a database and comparison of spatio-temporal visual modalities. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 250–257. IEEE (2018)
Hg, R., Jasek, P., Rofidal, C., Nasrollahi, K., Moeslund, T.B., Tranchet, G.: An RGB-D database using Microsoft’s Kinect for Windows for face detection. In: 2012 Eighth International Conference on Signal Image Technology and Internet Based Systems (SITIS), pp. 42–46. IEEE (2012)
Jenke, R., Peer, A., Buss, M.: Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 5(3), 327–339 (2014)
Jerritta, S., Murugappan, M., Wan, K., Yaacob, S.: Emotion recognition from facial EMG signals using higher order statistics and principal component analysis. J. Chin. Inst. Eng. 37(3), 385–394 (2014)
Kamińska, D., Sapiński, T., Anbarjafari, G.: Efficiency of chosen speech descriptors in relation to emotion recognition. EURASIP J. Audio Speech Music Process. 2017(1), 3 (2017)
Kendon, A.: The study of gesture: some remarks on its history. In: Deely, J.N., Lenhart, M.D. (eds.) Semiotics 1981, pp. 153–164. Springer, Heidelberg (1983). https://doi.org/10.1007/978-1-4615-9328-7_15
Kiforenko, L., Kraft, D.: Emotion recognition through body language using RGB-D sensor. In: 11th International Conference on Computer Vision Theory and Applications (VISAPP), pp. 398–405. SCITEPRESS Digital Library (2016)
Lopes, A.T., de Aguiar, E., De Souza, A.F., Oliveira-Santos, T.: Facial expression recognition with convolutional neural networks: coping with few data and the training sample order. Pattern Recognit. 61, 610–628 (2017)
Lüsi, I., Escalera, S., Anbarjafari, G.: SASE: RGB-depth database for human head pose estimation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 325–336. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_26
Min, R., Kose, N., Dugelay, J.L.: KinectFaceDB: a Kinect database for face recognition. IEEE Trans. Syst. Man Cybern. Syst. 44(11), 1534–1548 (2014)
Noroozi, F., Corneanu, C.A., Kamińska, D., Sapiński, T., Escalera, S., Anbarjafari, G.: Survey on emotional body gesture recognition. arXiv preprint arXiv:1801.07481 (2018)
Noroozi, F., Sapiński, T., Kamińska, D., Anbarjafari, G.: Vocal-based emotion recognition using random forests and decision tree. Int. J. Speech Technol. 20(2), 239–246 (2017)
Pease, B., Pease, A.: The Definitive Book of Body Language. Bantam, New York City (2004)
Pławiak, P., Sośnicki, T., Niedźwiecki, M., Tabor, Z., Rzecki, K.: Hand body language gesture recognition based on signals from specialized glove and machine learning algorithms. IEEE Trans. Ind. Inform. 12(3), 1104–1113 (2016)
Plutchik, R.: The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am. Sci. 89(4), 344–350 (2001)
Psaltis, A., et al.: Multimodal affective state recognition in serious games applications. In: 2016 IEEE International Conference on Imaging Systems and Techniques (IST), pp. 435–439. IEEE (2016)
Ranganathan, H., Chakraborty, S., Panchanathan, S.: Multimodal emotion recognition using deep learning architectures. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–9. IEEE (2016)
Russell, J., Mehrabian, A.: Evidence for a three-factor theory of emotions. J. Res. Pers. 11, 273–294 (1977)
Savran, A., Alyüz, N., Dibeklioğlu, H., Çeliktutan, O., Gökberk, B., Sankur, B., Akarun, L.: Bosphorus database for 3D face analysis. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (eds.) BioID 2008. LNCS, vol. 5372, pp. 47–56. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89991-4_6
Wan, J., et al.: Results and analysis of ChaLearn LAP multi-modal isolated and continuous gesture recognition, and real versus fake expressed emotions challenges. In: ChaLearn LAP, Action, Gesture, and Emotion Recognition Workshop and Competitions: Large Scale Multimodal Gesture Recognition and Real Versus Fake Expressed Emotions, ICCV, vol. 4 (2017)
Yin, L., Chen, X., Sun, Y., Worm, T., Reale, M.: A high-resolution 3D dynamic facial expression database. In: 8th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2008, pp. 1–6. IEEE (2008)
Yin, L., Wei, X., Sun, Y., Wang, J., Rosato, M.J.: A 3D facial expression database for facial behavior research. In: 7th International Conference on Automatic Face and Gesture Recognition, FGR 2006, pp. 211–216. IEEE (2006)
Zhang, K., Huang, Y., Du, Y., Wang, L.: Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 26(9), 4193–4203 (2017)
Zhang, X., et al.: BP4D-spontaneous: a high-resolution spontaneous 3D dynamic facial expression database. Image Vis. Comput. 32(10), 692–706 (2014)
Acknowledgement
The authors would like to thank Michał Wasażnik (psychologist), who participated in the creation of the experimental protocol. This work is supported by the Estonian Research Council Grant (PUT638), the Scientific and Technological Research Council of Turkey (TÜBİTAK) (Project 1001 - 116E097), the Estonian-Polish Joint Research Project, and the Estonian Centre of Excellence in IT (EXCITE), funded by the European Regional Development Fund. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sapiński, T., Kamińska, D., Pelikant, A., Ozcinar, C., Avots, E., Anbarjafari, G. (2019). Multimodal Database of Emotional Speech, Video and Gestures. In: Zhang, Z., Suter, D., Tian, Y., Branzan Albu, A., Sidère, N., Jair Escalante, H. (eds) Pattern Recognition and Information Forensics. ICPR 2018. Lecture Notes in Computer Science, vol. 11188. Springer, Cham. https://doi.org/10.1007/978-3-030-05792-3_15
DOI: https://doi.org/10.1007/978-3-030-05792-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05791-6
Online ISBN: 978-3-030-05792-3
eBook Packages: Computer Science, Computer Science (R0)