DOI: 10.1145/2988257.2988266

Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text

Published: 16 October 2016

Abstract

Depression is a major cause of disability worldwide. This paper reports the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, and interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high- and low-level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed, including dynamic characteristics of facial elements (using Landmark Motion History Histograms and Landmark Motion Magnitude), global head motion, and eye blinks. These features were combined with statistically derived features from pre-extracted features (emotions, action units, gaze, and pose). Both speech rate and word-level semantic content were also evaluated. Classification results are reported using four different classification schemes: i) gender-based models for each individual modality, ii) the feature fusion model, iii) the decision fusion model, and iv) the posterior probability classification model. Proposed approaches that outperformed the reference classification accuracy include the one utilizing statistical descriptors of low-level audio features. This approach achieved F1-scores of 0.59 for identifying depressed and 0.87 for identifying non-depressed individuals on the development set, and 0.52/0.81, respectively, on the test set.
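The evaluation above reports a separate F1-score for each class (depressed and non-depressed). A minimal sketch of that per-class computation follows; the label encoding and toy data are illustrative, not taken from the challenge:

```python
def f1_per_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 1 = depressed, 0 = non-depressed (hypothetical labels)
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
f1_dep = f1_per_class(y_true, y_pred, positive=1)     # F1 for the depressed class
f1_nondep = f1_per_class(y_true, y_pred, positive=0)  # F1 for the non-depressed class
```

Reporting both scores matters here because the classes are imbalanced: a classifier can score well on the majority (non-depressed) class while performing poorly on the minority class.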


Cited By

  • (2024) Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors 24(2):348. DOI: 10.3390/s24020348
  • (2024) Detecting Depression With Heterogeneous Graph Neural Network in Clinical Interview Transcript. IEEE Transactions on Computational Social Systems 11(1):1315-1324. DOI: 10.1109/TCSS.2023.3263056
  • (2024) Spatial–Temporal Feature Network for Speech-Based Depression Recognition. IEEE Transactions on Cognitive and Developmental Systems 16(1):308-318. DOI: 10.1109/TCDS.2023.3273614
  • (2024) A Comprehensive Analysis of Speech Depression Recognition Systems. SoutheastCon 2024, pp. 1509-1518. DOI: 10.1109/SoutheastCon52093.2024.10500078
  • (2024) Multimodal depression detection using deep learning in the workplace. 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), pp. 1-8. DOI: 10.1109/ICAECT60202.2024.10468966
  • (2024) A novel study for depression detecting using audio signals based on graph neural network. Biomedical Signal Processing and Control 88:105675. DOI: 10.1016/j.bspc.2023.105675
  • (2024) A Multimodal Approach for Detection and Assessment of Depression Using Text, Audio and Video. Phenomics 4(3):234-249. DOI: 10.1007/s43657-023-00152-8
  • (2024) Reading Between the Frames: Multi-modal Depression Detection in Videos from Non-verbal Cues. Advances in Information Retrieval, pp. 191-209. DOI: 10.1007/978-3-031-56027-9_12
  • (2023) Uni2Mul: A Conformer-Based Multimodal Emotion Classification Model by Considering Unimodal Expression Differences with Multi-Task Learning. Applied Sciences 13(17):9910. DOI: 10.3390/app13179910
  • (2023) A Depression Recognition Method Based on the Alteration of Video Temporal Angle Features. Applied Sciences 13(16):9230. DOI: 10.3390/app13169230


Published In

AVEC '16: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge
October 2016
114 pages
ISBN:9781450345163
DOI:10.1145/2988257
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. AVEC 2016
  2. affective computing
  3. depression assessment
  4. image processing
  5. multimodal fusion
  6. pattern recognition
  7. speech processing

Qualifiers

  • Research-article


Conference

MM '16
Sponsor:
MM '16: ACM Multimedia Conference
October 16, 2016
Amsterdam, The Netherlands

Acceptance Rates

AVEC '16 Paper Acceptance Rate 12 of 14 submissions, 86%;
Overall Acceptance Rate 52 of 98 submissions, 53%

