DOI: 10.1007/978-3-030-68790-8_1
Article

Towards Robust Deep Neural Networks for Affect and Depression Recognition from Speech

Published: 10 January 2021

Abstract

Intelligent monitoring systems and affective computing applications have emerged in recent years to enhance healthcare. Examples include the assessment of affective states such as Major Depressive Disorder (MDD). MDD is characterized by the persistent expression of negative emotions (low valence) and a lack of interest (low arousal). High-performing intelligent systems would support MDD diagnosis in its early stages. In this paper, we present a new deep neural network architecture, called EmoAudioNet, for emotion and depression recognition from speech. EmoAudioNet learns jointly from the time-frequency representation of the audio signal and the visual representation of its frequency spectrum. Our model shows very promising results in predicting affect and depression: it matches or outperforms state-of-the-art methods on several evaluation metrics on the RECOLA and DAIC-WOZ datasets for predicting arousal, valence, and depression. The EmoAudioNet code is publicly available on GitHub: https://github.com/AliceOTHMANI/EmoAudioNet.
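The dual-input idea described above — one stream derived from the signal's time-frequency content and one from its spectrum image — can be illustrated with a minimal NumPy sketch. This is not the actual EmoAudioNet architecture (which uses deep convolutional branches; see the GitHub repository); the frame length, hop size, and the two hand-crafted feature streams here are illustrative assumptions only.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a hand-rolled short-time FFT (NumPy only)."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def two_stream_features(signal):
    """Fuse a spectral-shape stream with a temporal-energy stream by
    concatenation, mimicking the paper's two-input fusion idea."""
    spec = spectrogram(signal)
    spectral_stream = spec.mean(axis=0)        # average spectrum over time
    temporal_stream = (spec ** 2).sum(axis=1)  # per-frame energy envelope
    return np.concatenate([spectral_stream, temporal_stream])

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # 1 s of synthetic "audio" at 16 kHz
feats = two_stream_features(audio)
print(feats.shape)  # (253,): 129 spectral bins + 124 frame energies
```

In the paper's actual pipeline, each stream would feed a convolutional branch and the fusion would happen on learned embeddings rather than raw statistics, but the concatenate-two-representations structure is the same.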




Published In

Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part II
Jan 2021
766 pages
ISBN: 978-3-030-68789-2
DOI: 10.1007/978-3-030-68790-8
  • Editors:
  • Alberto Del Bimbo,
  • Rita Cucchiara,
  • Stan Sclaroff,
  • Giovanni Maria Farinella,
  • Tao Mei,
  • Marco Bertini,
  • Hugo Jair Escalante,
  • Roberto Vezzani

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Emotional intelligence
  2. Socio-affective computing
  3. Depression recognition
  4. Speech emotion recognition
  5. Healthcare application
  6. Deep learning

Qualifiers

  • Article


Cited By

  • (2024) Application of Prompt Learning Models in Identifying the Collaborative Problem Solving Skills in an Online Task. Proceedings of the ACM on Human-Computer Interaction 8(CSCW2), 1–23. DOI: 10.1145/3686981
  • (2023) SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing. IEEE/ACM Transactions on Audio, Speech and Language Processing 31, 775–788. DOI: 10.1109/TASLP.2023.3235194
  • (2023) An Ambient Intelligence-Based Approach for Longitudinal Monitoring of Verbal and Vocal Depression Symptoms. In: Predictive Intelligence in Medicine, pp. 206–217. DOI: 10.1007/978-3-031-46005-0_18
  • (2022) A Model of Normality Inspired Deep Learning Framework for Depression Relapse Prediction Using Audiovisual Data. Computer Methods and Programs in Biomedicine 226. DOI: 10.1016/j.cmpb.2022.107132
  • (2021) Depression Detection by Person's Voice. In: Analysis of Images, Social Networks and Texts, pp. 250–262. DOI: 10.1007/978-3-031-16500-9_21
