Research Article

Dual Learning for Conversational Emotion Recognition and Emotional Response Generation

Published: 14 November 2023

Abstract

Emotion recognition in conversation (ERC) and emotional response generation (ERG) are two important NLP tasks. ERC aims to detect the utterance-level emotion from a dialogue, while ERG focuses on expressing a desired emotion. Essentially, ERC is a classification task, with its input and output domains being the utterance text and emotion labels, respectively. On the other hand, ERG is a generation task whose input and output domains are reversed. The two tasks are highly related, yet prior works address them independently, without exploiting their duality. Therefore, in this article, we propose to solve these two tasks in a dual learning framework. Our contributions are fourfold: (1) We propose a dual learning framework for ERC and ERG. (2) Within the proposed framework, the two models can be trained jointly, so that the duality between them can be utilised. (3) Instead of a symmetric framework that deals with two tasks of the same data domain, we propose a dual learning framework that operates on a pair of asymmetric input and output spaces, i.e., the natural language space and the emotion labels. (4) Experiments are conducted on benchmark datasets to demonstrate the effectiveness of our framework.
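To make the duality concrete, the sketch below shows one way a joint training step over the two models could look under a probabilistic duality constraint in the spirit of dual supervised learning (Xia et al., 2017), which penalizes disagreement between the two factorizations of the joint probability p(x, e). This is a minimal illustration under assumed interfaces: the function and argument names, the squared-gap penalty, and the use of externally estimated marginals log p(e) and log p(x) are all assumptions for exposition, not the paper's reported implementation.

```python
# Minimal PyTorch-style sketch of joint dual training for ERC and ERG.
# The model interfaces and the duality penalty are illustrative
# assumptions, not the authors' implementation.
import torch.nn.functional as F

def dual_step(erc, erg, context, response, emotion,
              log_p_emotion, log_p_response, lambda_dual=0.1):
    """One joint training step over a batch.

    erc(response)         -> (batch, n_emotions) logits   [primal: x -> e]
    erg(context, emotion) -> (batch, seq, vocab) logits   [dual:   e -> x]
    log_p_emotion / log_p_response: per-example marginals, e.g. from
    empirical label frequencies and a pretrained language model.
    """
    # Primal task (ERC): classify the emotion of the target utterance.
    erc_logits = erc(response)
    loss_erc = F.cross_entropy(erc_logits, emotion)

    # Dual task (ERG): generate the utterance conditioned on the emotion.
    erg_logits = erg(context, emotion)
    token_ll = -F.cross_entropy(erg_logits.transpose(1, 2), response,
                                reduction="none")     # (batch, seq)
    log_p_x_given_e = token_ll.sum(dim=1)             # per-example log p(x|e)
    loss_erg = -log_p_x_given_e.mean()

    # Duality regularizer: both factorizations of log p(x, e) should agree,
    #   log p(e) + log p(x|e)  ==  log p(x) + log p(e|x).
    log_p_e_given_x = F.log_softmax(erc_logits, dim=-1) \
                       .gather(1, emotion.unsqueeze(1)).squeeze(1)
    gap = (log_p_emotion + log_p_x_given_e) - (log_p_response + log_p_e_given_x)
    loss_dual = gap.pow(2).mean()

    return loss_erc + loss_erg + lambda_dual * loss_dual
```

Note how the asymmetry the abstract emphasizes surfaces here: the primal direction scores a single label while the dual direction scores an entire token sequence, so the two log-likelihoods live on different scales. How the paper balances them is not specified in the abstract; the fixed lambda_dual above is only a placeholder.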



Published In

IEEE Transactions on Affective Computing, Volume 15, Issue 3 (July-Sept. 2024), 1087 pages.

Publisher: IEEE Computer Society Press, Washington, DC, United States.
