
Paying attention to uncertainty: A stochastic multimodal transformer for post-traumatic stress disorder detection using video

Published: 01 December 2024

Abstract

Background and Objectives:

Post-traumatic stress disorder (PTSD) is a debilitating psychological condition that can manifest following exposure to traumatic events. It affects individuals from diverse backgrounds and is associated with various symptoms, including intrusive thoughts, nightmares, hyperarousal, and avoidance behaviors.

Methods:

To address this challenge, this study proposes a decision support system powered by a novel multimodal deep learning approach based on a stochastic Transformer and video data. The Transformer exploits stochastic activation functions and layers that allow it to learn sparse representations of its inputs. The method leverages low-level features extracted from three modalities: Mel-frequency cepstral coefficients (MFCCs) extracted from audio recordings, Facial Action Units (AUs) captured from facial expressions, and textual data obtained from the audio transcription. By combining these modalities, the proposed model captures a comprehensive range of information related to post-traumatic stress disorder symptoms, including vocal cues, facial expressions, and linguistic content.
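The idea of a stochastic activation that encourages sparse representations can be sketched as follows. This is an illustrative NumPy sketch only, not the paper's exact formulation: the function `stochastic_sparse_activation` and its `keep_prob` parameter are hypothetical names, combining a standard GELU with random sparsification during training.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def stochastic_sparse_activation(x, keep_prob=0.7, rng=None, training=True):
    """Apply GELU, then randomly zero a fraction of units so the layer
    produces sparse activations during training (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    y = gelu(x)
    if not training:
        return y  # deterministic at inference time
    mask = rng.random(y.shape) < keep_prob
    # rescale surviving units so the expected magnitude is preserved
    return np.where(mask, y / keep_prob, 0.0)
```

At inference the stochastic component is disabled, mirroring the usual train/eval asymmetry of dropout-style mechanisms.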

Results:

The deep learning model was trained and evaluated on the eDAIC dataset, which consists of clinical interviews with individuals with and without post-traumatic stress disorder. The model achieved state-of-the-art results in PTSD detection, with a Root Mean Square Error of 1.98 and a Concordance Correlation Coefficient of 0.722, outperforming existing approaches.
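The two reported evaluation metrics have standard definitions, which can be computed as below (a common formulation; the symbols and helper names here are not taken from the paper):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Square Error between predicted and reference scores
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def ccc(y_true, y_pred):
    # Lin's Concordance Correlation Coefficient: agreement that penalizes
    # both poor correlation and systematic bias in mean or variance
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return float(2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2))
```

Unlike plain Pearson correlation, the CCC drops below 1 when predictions are correlated with the target but shifted or rescaled, which is why it is favored for regression-style clinical severity scores.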

Conclusion:

This work introduces a new method for post-traumatic stress disorder detection from videos using a multimodal stochastic Transformer model. The model draws on text, audio, and visual modalities to gather comprehensive and complementary information for detection.

Highlights

A multimodal approach for PTSD detection using audio, facial features, and text transcription.
A Transformer that uses stochastic components for video-based PTSD detection.
A fusion model that combines feature fusion and decision fusion.
An equilibrium-based approach to visual feature extraction from AUs for PTSD pattern recognition.
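The hybrid fusion named in the highlights can be illustrated schematically. The sketch below is an assumption-laden toy: the paper's model is a Transformer, whereas here the per-modality predictors are plain linear heads with hypothetical weights, used only to show feature-level concatenation combined with decision-level averaging.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled per-modality embeddings for one interview segment
audio_feat = rng.normal(size=32)   # e.g. pooled MFCC embedding
visual_feat = rng.normal(size=32)  # e.g. pooled Action Unit embedding
text_feat = rng.normal(size=32)    # e.g. pooled transcript embedding

def predict(w, x):
    # Stand-in for a trained regression head (linear here for illustration)
    return float(w @ x)

# Feature fusion: concatenate modality embeddings, score them jointly
w_joint = rng.normal(size=96)
feature_level = predict(w_joint, np.concatenate([audio_feat, visual_feat, text_feat]))

# Decision fusion: score each modality separately, then average the outputs
w_a, w_v, w_t = (rng.normal(size=32) for _ in range(3))
decision_level = np.mean([predict(w_a, audio_feat),
                          predict(w_v, visual_feat),
                          predict(w_t, text_feat)])

# Hybrid: combine the two fusion strategies (equal weighting assumed here)
hybrid_score = 0.5 * feature_level + 0.5 * decision_level
```

Feature fusion lets the joint head model cross-modal interactions, while decision fusion keeps each modality's prediction robust to failures in the others; combining both trades off these strengths.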



Published In

Computer Methods and Programs in Biomedicine, Volume 257, Issue C, December 2024, 836 pages

Publisher

Elsevier North-Holland, Inc., United States


Author Tags

  1. Post-traumatic stress disorder
  2. Deep learning
  3. Transformer
  4. Video analysis
  5. Multimodal fusion
  6. Decision support systems

Qualifiers

  • Research-article
