DOI: 10.1145/2988257.2988266

Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text

Published: 16 October 2016

Abstract

Depression is a major cause of disability worldwide. This paper reports the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, and interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high- and low-level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed, including dynamic characteristics of facial elements (using Landmark Motion History Histograms and Landmark Motion Magnitude), global head motion, and eye blinks. These features were combined with statistically derived features from pre-extracted features (emotions, action units, gaze, and pose). Both speech rate and word-level semantic content were also evaluated. Classification results are reported using four different classification schemes: i) gender-based models for each individual modality, ii) the feature fusion model, iii) the decision fusion model, and iv) the posterior probability classification model. Proposed approaches that outperformed the reference classification accuracy include the one utilizing statistical descriptors of low-level audio features. This approach achieved F1-scores of 0.59 for identifying depressed and 0.87 for identifying non-depressed individuals on the development set, and 0.52/0.81, respectively, on the test set.
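The evaluation above reports a separate F1-score for each class (depressed and non-depressed). A minimal sketch of that per-class computation follows; the label encoding and toy data are illustrative, not taken from the challenge:

```python
def f1_per_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 1 = depressed, 0 = non-depressed (hypothetical labels)
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
f1_dep = f1_per_class(y_true, y_pred, positive=1)     # F1 for the depressed class
f1_nondep = f1_per_class(y_true, y_pred, positive=0)  # F1 for the non-depressed class
```

Reporting both scores matters here because the classes are imbalanced: a classifier can score well on the majority (non-depressed) class while performing poorly on the minority class.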


Cited By

  • (2024) Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors 24(2):348. DOI: 10.3390/s24020348
  • (2024) Detecting Depression With Heterogeneous Graph Neural Network in Clinical Interview Transcript. IEEE Transactions on Computational Social Systems 11(1):1315-1324. DOI: 10.1109/TCSS.2023.3263056
  • (2024) Spatial–Temporal Feature Network for Speech-Based Depression Recognition. IEEE Transactions on Cognitive and Developmental Systems 16(1):308-318. DOI: 10.1109/TCDS.2023.3273614
  • (2024) A Comprehensive Analysis of Speech Depression Recognition Systems. SoutheastCon 2024, pp. 1509-1518. DOI: 10.1109/SoutheastCon52093.2024.10500078
  • (2024) Multimodal depression detection using deep learning in the workplace. 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), pp. 1-8. DOI: 10.1109/ICAECT60202.2024.10468966
  • (2024) A novel study for depression detecting using audio signals based on graph neural network. Biomedical Signal Processing and Control 88:105675. DOI: 10.1016/j.bspc.2023.105675
  • (2024) A Multimodal Approach for Detection and Assessment of Depression Using Text, Audio and Video. Phenomics 4(3):234-249. DOI: 10.1007/s43657-023-00152-8
  • (2024) Reading Between the Frames: Multi-modal Depression Detection in Videos from Non-verbal Cues. Advances in Information Retrieval, pp. 191-209. DOI: 10.1007/978-3-031-56027-9_12
  • (2023) Uni2Mul: A Conformer-Based Multimodal Emotion Classification Model by Considering Unimodal Expression Differences with Multi-Task Learning. Applied Sciences 13(17):9910. DOI: 10.3390/app13179910
  • (2023) A Depression Recognition Method Based on the Alteration of Video Temporal Angle Features. Applied Sciences 13(16):9230. DOI: 10.3390/app13169230


Published In

AVEC '16: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge
October 2016
114 pages
ISBN:9781450345163
DOI:10.1145/2988257
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. AVEC 2016
  2. affective computing
  3. depression assessment
  4. image processing
  5. multimodal fusion
  6. pattern recognition
  7. speech processing

Qualifiers

  • Research-article


Conference

MM '16
Sponsor:
MM '16: ACM Multimedia Conference
October 16, 2016
Amsterdam, The Netherlands

Acceptance Rates

AVEC '16 Paper Acceptance Rate 12 of 14 submissions, 86%;
Overall Acceptance Rate 52 of 98 submissions, 53%

