Dynamic Confidence-Aware Multi-Modal Emotion Recognition

Published: 08 December 2023

Abstract

Multi-modal emotion recognition has attracted increasing attention in human-computer interaction, as it extracts complementary information from physiological and behavioral features. Compared to single-modal approaches, multi-modal fusion methods are more susceptible to uncertainty in emotion recognition, such as heterogeneity and inconsistent predictions across modalities. Previous multi-modal approaches neglect systematic modeling of uncertainty during fusion and fail to reveal the dynamic variation of the emotion process. In this article, we propose a dynamic confidence-aware fusion network for robust recognition from heterogeneous emotion features, including electroencephalogram (EEG) signals and facial expressions. First, we develop a self-attention-based multi-channel LSTM network to preliminarily align the heterogeneous emotion features. Second, we propose a confidence regression network that estimates the true class probability (TCP) of each modality, which captures uncertainty at the modality level. Then, the different modalities are fused with weights determined by the above two types of uncertainty. Finally, we adopt a self-paced learning (SPL) mechanism to further improve model robustness by alleviating the negative effect of hard training samples. Experimental results on several multi-modal emotion datasets demonstrate that the proposed method outperforms state-of-the-art methods in recognition performance and explicitly reveals the dynamic variation of emotion through uncertainty estimation.
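The abstract describes a pipeline of per-modality TCP estimation, confidence-weighted fusion, and self-paced sample weighting. The following is a minimal PyTorch sketch of those three ingredients; the module names, layer sizes, fusion rule, and SPL scheme are illustrative assumptions, since the abstract does not specify the actual architecture or loss formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceBranch(nn.Module):
    """Per-modality classifier with a confidence head that regresses the
    true class probability (TCP): the softmax probability the classifier
    assigns to the ground-truth class."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.confidence = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())  # predicted TCP in (0, 1)

    def forward(self, feat):
        logits = self.classifier(feat)               # (B, C)
        tcp_hat = self.confidence(feat).squeeze(-1)  # (B,)
        return logits, tcp_hat

def tcp_target(logits, labels):
    """Regression target for the confidence head: softmax probability of
    the true class, detached so gradients only train the confidence head."""
    return F.softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1).detach()

def fuse_by_confidence(feats, tcp_hats):
    """Weight each modality's feature by its predicted TCP (normalized
    across modalities) and concatenate the weighted features."""
    w = torch.stack(tcp_hats, dim=1)                     # (B, M)
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)   # normalize across modalities
    return torch.cat([w[:, m:m + 1] * f for m, f in enumerate(feats)], dim=1)

def spl_weights(sample_losses, threshold):
    """Hard self-paced learning weights: samples whose current loss exceeds
    the threshold are ignored; raising the threshold over epochs gradually
    admits harder samples."""
    return (sample_losses <= threshold).float()
```

In a training loop built on this sketch, each branch's tcp_hat would be regressed onto tcp_target with an MSE loss, a final classifier would act on the fused feature, and the per-sample classification loss would be multiplied by spl_weights before averaging, with the threshold increased on a schedule. This mirrors the confidence regression, uncertainty-weighted fusion, and SPL described in the abstract, but the paper's exact losses, fusion operator, and pacing schedule may differ.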


Published In

IEEE Transactions on Affective Computing, Volume 15, Issue 3 (July-Sept. 2024), 1087 pages

Publisher

IEEE Computer Society Press, Washington, DC, United States
