Sequential fusion of facial appearance and dynamics for depression recognition

Published: 01 October 2021

Highlights

A sequential fusion approach is proposed for facial depression recognition.
The correlation and complementarity between facial appearance and dynamics are explicitly exploited.
Evaluations on a benchmark dataset demonstrate improvements over several competitive solutions.

Abstract

In mental health assessment, nonverbal cues such as facial expressions have been validated as indicators of depressive disorders. Recently, the multimodal fusion of facial appearance and dynamics based on convolutional neural networks has demonstrated encouraging performance in depression analysis. However, prior methods have not thoroughly studied the correlation and complementarity between different visual modalities. In this paper, we propose a sequential fusion method for facial depression recognition. To mine correlated and complementary depression patterns in multimodal learning, a chained-fusion mechanism is introduced that jointly learns facial appearance and dynamics in a unified framework. We show that such sequential fusion offers a probabilistic perspective for modeling the correlation and complementarity between the two data modalities, leading to improved depression recognition. Results on a benchmark dataset demonstrate the superiority of our method over several state-of-the-art alternatives.
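
This page gives only the abstract, but the chained-fusion idea it describes can be sketched briefly. The PyTorch code below is a minimal, hypothetical illustration, not the authors' implementation: the ResNet-18 backbones, the stacked-optical-flow input for the dynamics stream, the single score-regression head, and all names (ChainedFusionNet, flow_channels) are assumptions made for the example.

```python
import torch
import torch.nn as nn
from torchvision import models


class ChainedFusionNet(nn.Module):
    """Two-stream CNN with sequential (chained) fusion.

    Stage 1 encodes facial appearance (an RGB face crop) and makes a
    first score estimate; stage 2 encodes facial dynamics (a stack of
    optical-flow fields) and refines that estimate conditioned on the
    appearance features, i.e. the modalities are fused as a chain
    rather than in parallel.
    """

    def __init__(self, flow_channels: int = 10):
        super().__init__()
        # Appearance stream: ResNet-18 backbone exposing 512-d features.
        self.appearance = models.resnet18(weights=None)
        self.appearance.fc = nn.Identity()
        self.app_head = nn.Linear(512, 1)  # stage-1 depression score

        # Dynamics stream: same backbone, first conv adapted to a
        # stack of optical-flow fields instead of a 3-channel image.
        self.dynamics = models.resnet18(weights=None)
        self.dynamics.conv1 = nn.Conv2d(flow_channels, 64, kernel_size=7,
                                        stride=2, padding=3, bias=False)
        self.dynamics.fc = nn.Identity()

        # Stage 2 sees the dynamics features *and* the appearance
        # features, so the second modality is learned conditioned on
        # the first (the "chain" in chained fusion).
        self.refine = nn.Sequential(
            nn.Linear(512 + 512, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor):
        f_app = self.appearance(rgb)                       # (B, 512)
        score_app = self.app_head(f_app)                   # stage-1 estimate
        f_dyn = self.dynamics(flow)                        # (B, 512)
        delta = self.refine(torch.cat([f_app, f_dyn], 1))  # chained refinement
        return score_app + delta, score_app


# Toy forward pass: batch of 2 face crops and flow stacks
# (5 frames x 2 flow components = 10 channels).
model = ChainedFusionNet(flow_channels=10)
final_score, coarse_score = model(torch.randn(2, 3, 224, 224),
                                  torch.randn(2, 10, 224, 224))
```

The point of the sketch is the ordering: instead of fusing the two streams symmetrically, the appearance stage produces a first estimate and its features then condition the dynamics stage, one plausible reading of the "sequential" and "probabilistic" framing in the abstract.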

Cited By

  • (2024) Loss Relaxation Strategy for Noisy Facial Video-based Automatic Depression Recognition. ACM Transactions on Computing for Healthcare. DOI: 10.1145/3648696. Online publication date: 4-Mar-2024.
  • (2024) A computational model for assisting individuals with suicidal ideation based on context histories. Universal Access in the Information Society 23(3), 1447–1466. DOI: 10.1007/s10209-023-00991-2. Online publication date: 1-Aug-2024.
  • (2024) Explainable AI for Stress and Depression Detection in the Cyberspace and Beyond. Trends and Applications in Knowledge Discovery and Data Mining, 108–120. DOI: 10.1007/978-981-97-2650-9_9. Online publication date: 7-May-2024.
  • (2023) TensorFormer: A Tensor-Based Multimodal Transformer for Multimodal Sentiment Analysis and Depression Detection. IEEE Transactions on Affective Computing 14(4), 2776–2786. DOI: 10.1109/TAFFC.2022.3233070. Online publication date: 1-Oct-2023.
  • (2023) D-ResNet-PVKELM: deep neural network and paragraph vector based kernel extreme machine learning model for multimodal depression analysis. Multimedia Tools and Applications 82(17), 25973–26004. DOI: 10.1007/s11042-023-14351-y. Online publication date: 11-Jan-2023.
  • (2022) Multi-modal Depression Estimation Based on Sub-attentional Fusion. Computer Vision – ECCV 2022 Workshops, 623–639. DOI: 10.1007/978-3-031-25075-0_42. Online publication date: 23-Oct-2022.

Published In

Pattern Recognition Letters, Volume 150, Issue C
October 2021
313 pages

Publisher

Elsevier Science Inc., United States

Author Tags

1. Depression recognition
2. Facial representation
3. Convolutional neural network
4. Multimodal learning
5. Sequential fusion

Qualifiers

• Rapid-communication
