
Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-of-distribution Detection

Published: 20 December 2021

Abstract

The rapid development of machine learning for acoustic signal processing has produced many solutions for detecting emotions from speech. Early work targeted clean, acted speech and a fixed set of emotions; importantly, those datasets and solutions assumed that a person exhibits only one of these emotions. More recent work has steadily added realism to emotion detection by considering issues such as reverberation, de-amplification, and background noise, but it typically considers one dataset at a time and still assumes that all emotions are accounted for in the model. We significantly improve the realism of emotion detection by (i) assessing a more comprehensive range of situations, combining five common publicly available datasets into one and enhancing the combined dataset with data augmentation that models reverberation and de-amplification, (ii) incorporating 11 typical home noises into the acoustics, and (iii) recognizing that in real situations a person may exhibit many emotions that are not currently of interest, and these should neither be forced into a pre-fixed category nor be improperly labeled. Our novel solution combines a CNN with out-of-distribution detection. It increases the range of situations in which emotions can be effectively detected and outperforms a state-of-the-art baseline.
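
To make points (i) and (ii) of the abstract concrete, the sketch below shows one common way to synthesize indoor-distorted speech: convolve clean speech with a room impulse response (reverberation), attenuate it to mimic de-amplification with distance, and mix in a household noise clip at a target signal-to-noise ratio. This illustrates the general technique only, not the authors' pipeline; the signals and the attenuation and SNR values are synthetic placeholders.

```python
# Minimal augmentation sketch (assumptions, not the paper's code):
# reverberation + de-amplification + home-noise mixing at a target SNR.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested signal-to-noise ratio."""
    noise = np.resize(noise, speech.shape)        # loop/trim noise to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

def simulate_indoor(speech, rir, noise, atten_db=6.0, snr_db=10.0):
    """Reverberate, de-amplify, then add background noise."""
    reverberant = np.convolve(speech, rir)[: len(speech)]  # room reverberation
    attenuated = reverberant * 10 ** (-atten_db / 20.0)    # distance attenuation
    return mix_at_snr(attenuated, noise, snr_db)

# Toy example with synthetic stand-ins for real recordings.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)                        # 1 s at 16 kHz
rir = np.exp(-np.linspace(0, 8, 800)) * rng.standard_normal(800)
noise = rng.standard_normal(16000)                         # "home noise" clip
augmented = simulate_indoor(speech, rir, noise)
```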
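
For point (iii), the abstract states only that a CNN is combined with out-of-distribution (OOD) detection. The sketch below is a minimal illustration of the rejection idea, assuming a max-softmax confidence test in the spirit of the standard OOD baseline: a clip whose highest class confidence falls below a threshold is reported as a non-targeted emotion instead of being forced into a target label. The emotion set, threshold, and logits are hypothetical.

```python
# Minimal OOD-rejection sketch (an assumption, not the paper's detector).
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical target set
OOD_THRESHOLD = 0.7                              # assumed; tuned on held-out data

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def classify_or_reject(logits: np.ndarray) -> str:
    """Return an emotion label, or 'non-targeted' when the CNN's maximum
    softmax confidence falls below the OOD threshold."""
    probs = softmax(logits)
    if probs.max() < OOD_THRESHOLD:
        return "non-targeted"                    # treat as out-of-distribution
    return EMOTIONS[int(probs.argmax())]

print(classify_or_reject(np.array([4.0, 0.5, 0.2, 0.1])))   # -> "angry"
print(classify_or_reject(np.array([0.4, 0.5, 0.45, 0.5])))  # -> "non-targeted"
```

In practice such a threshold would be tuned on held-out data containing both targeted and non-targeted emotions, trading off false rejections against mislabeled out-of-set clips.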


Published In

ACM Transactions on Computing for Healthcare, Volume 3, Issue 2 (April 2022), 292 pages
EISSN: 2637-8051
DOI: 10.1145/3505188

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 20 December 2021
Accepted: 01 July 2021
Revised: 01 July 2021
Received: 01 June 2020


Author Tags

1. synthetic datasets
2. convolutional neural networks
3. out-of-distribution detection
4. emotion detection
5. noise
6. distance
7. reverberation

Qualifiers

• Research-article
• Refereed

Funding Sources

• NSF Smart and Connected Health
