Abstract
Modeling conversational dependency plays a crucial role in emotion recognition in conversation (ERC). Existing methods often concatenate multimodal information directly and then build a graph neural network over a fixed number of past and future utterances. The former leaves little interaction between modalities, and the latter is less consistent with the logic of the conversation. To model conversational dependency better, we propose a Dependency Graph Neural Network (DGNN) for ERC. First, we present a cross-modal fusion transformer that models the dependency between different modalities of the same utterance. Then, we design a directed graph neural network with an adaptive window that models the dependency between different utterances. Extensive experiments on two benchmark datasets demonstrate the superiority of the proposed model.
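To make the fixed-window baseline concrete, the following is a minimal sketch of how a conversation graph might be built from a fixed number of past and future utterances. It illustrates only the existing approach that the abstract contrasts against, not DGNN's adaptive window, whose details are not given here. The function name `fixed_window_edges` and the parameters `past` and `future` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def fixed_window_edges(
    num_utterances: int, past: int, future: int
) -> List[Tuple[int, int]]:
    """Build directed edges (source, target) for a conversation graph.

    Each target utterance receives an edge from up to `past` preceding
    utterances and up to `future` following utterances. The window size
    is the same for every utterance, regardless of conversation content.
    """
    edges = []
    for target in range(num_utterances):
        # Clip the window at the start and end of the conversation.
        start = max(0, target - past)
        end = min(num_utterances - 1, target + future)
        for source in range(start, end + 1):
            if source != target:  # no self-loops in this sketch
                edges.append((source, target))
    return edges


# Example: a four-utterance conversation with one past and one future neighbor.
edges = fixed_window_edges(num_utterances=4, past=1, future=1)
```

Because `past` and `future` are constants, every utterance is connected to the same number of neighbors whether or not those neighbors are relevant to it, which is the limitation an adaptive window is meant to address.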
Acknowledgements
This research was supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (Grant Nos. 2023C03203, 2023C03180, 2022C03174).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, Z. et al. (2024). DGNN: Dependency Graph Neural Network for Multimodal Emotion Recognition in Conversation. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1963. Springer, Singapore. https://doi.org/10.1007/978-981-99-8138-0_8
DOI: https://doi.org/10.1007/978-981-99-8138-0_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8137-3
Online ISBN: 978-981-99-8138-0
eBook Packages: Computer Science, Computer Science (R0)