
Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-of-distribution Detection

Published: 20 December 2021

Abstract

The rapid development of machine learning for acoustic signal processing has produced many solutions for detecting emotions from speech. Early work targeted clean, acted speech and a fixed set of emotions; importantly, those datasets and solutions assumed that a person exhibits only one of these emotions. More recent work has steadily added realism to emotion detection by considering issues such as reverberation, de-amplification, and background noise, but it typically considers one dataset at a time and still assumes that all emotions are accounted for in the model. We significantly improve the realism of emotion detection by (i) assessing a more comprehensive range of situations, combining five common publicly available datasets into one and enhancing the combined dataset with data augmentation that models reverberation and de-amplification, (ii) incorporating 11 typical home noises into the acoustics, and (iii) recognizing that in real situations a person may exhibit many emotions that are not currently of interest, and these should neither be forced into a pre-fixed category nor be improperly labeled. Our novel solution combines a CNN with out-of-distribution detection. It increases the range of situations in which emotions can be effectively detected and outperforms a state-of-the-art baseline.
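
To make points (i) and (ii) of the abstract concrete, the sketch below shows one common way to synthesize indoor-distorted speech: convolve clean speech with a room impulse response (reverberation), attenuate it to mimic de-amplification with distance, and mix in a household noise clip at a target signal-to-noise ratio. This illustrates the general technique only, not the authors' pipeline; the signals and the attenuation and SNR values are synthetic placeholders.

```python
# Minimal augmentation sketch (assumptions, not the paper's code):
# reverberation + de-amplification + home-noise mixing at a target SNR.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested signal-to-noise ratio."""
    noise = np.resize(noise, speech.shape)        # loop/trim noise to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

def simulate_indoor(speech, rir, noise, atten_db=6.0, snr_db=10.0):
    """Reverberate, de-amplify, then add background noise."""
    reverberant = np.convolve(speech, rir)[: len(speech)]  # room reverberation
    attenuated = reverberant * 10 ** (-atten_db / 20.0)    # distance attenuation
    return mix_at_snr(attenuated, noise, snr_db)

# Toy example with synthetic stand-ins for real recordings.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)                        # 1 s at 16 kHz
rir = np.exp(-np.linspace(0, 8, 800)) * rng.standard_normal(800)
noise = rng.standard_normal(16000)                         # "home noise" clip
augmented = simulate_indoor(speech, rir, noise)
```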
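
For point (iii), the abstract states only that a CNN is combined with out-of-distribution (OOD) detection. The sketch below is a minimal illustration of the rejection idea, assuming a max-softmax confidence test in the spirit of the standard OOD baseline: a clip whose highest class confidence falls below a threshold is reported as a non-targeted emotion instead of being forced into a target label. The emotion set, threshold, and logits are hypothetical.

```python
# Minimal OOD-rejection sketch (an assumption, not the paper's detector).
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical target set
OOD_THRESHOLD = 0.7                              # assumed; tuned on held-out data

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def classify_or_reject(logits: np.ndarray) -> str:
    """Return an emotion label, or 'non-targeted' when the CNN's maximum
    softmax confidence falls below the OOD threshold."""
    probs = softmax(logits)
    if probs.max() < OOD_THRESHOLD:
        return "non-targeted"                    # treat as out-of-distribution
    return EMOTIONS[int(probs.argmax())]

print(classify_or_reject(np.array([4.0, 0.5, 0.2, 0.1])))   # -> "angry"
print(classify_or_reject(np.array([0.4, 0.5, 0.45, 0.5])))  # -> "non-targeted"
```

In practice such a threshold would be tuned on held-out data containing both targeted and non-targeted emotions, trading off false rejections against mislabeled out-of-set clips.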


Published In

ACM Transactions on Computing for Healthcare, Volume 3, Issue 2 (April 2022), 292 pages
EISSN: 2637-8051
DOI: 10.1145/3505188

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 20 December 2021
Accepted: 01 July 2021
Revised: 01 July 2021
Received: 01 June 2020


Author Tags

1. synthetic datasets
2. convolutional neural networks
3. out-of-distribution detection
4. emotion detection
5. noise
6. distance
7. reverberation

Qualifiers

• Research-article
• Refereed

Funding Sources

• NSF Smart and Connected Health
