Research Article

Dual Learning for Conversational Emotion Recognition and Emotional Response Generation

Published: 14 November 2023

Abstract

Emotion recognition in conversation (ERC) and emotional response generation (ERG) are two important NLP tasks. ERC aims to detect the utterance-level emotion from a dialogue, while ERG focuses on expressing a desired emotion. Essentially, ERC is a classification task, with its input and output domains being the utterance text and emotion labels, respectively. On the other hand, ERG is a generation task whose input and output domains are reversed. The two tasks are highly related, yet prior works address them independently, without exploiting their duality. Therefore, in this article, we propose to solve these two tasks in a dual learning framework. Our contributions are fourfold: (1) We propose a dual learning framework for ERC and ERG. (2) Within the proposed framework, the two models can be trained jointly, so that the duality between them can be utilised. (3) Instead of a symmetric framework that deals with two tasks of the same data domain, we propose a dual learning framework that operates on a pair of asymmetric input and output spaces, i.e., the natural language space and the emotion labels. (4) Experiments are conducted on benchmark datasets to demonstrate the effectiveness of our framework.
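To make the duality concrete, the sketch below shows one way a joint training step over the two models could look under a probabilistic duality constraint in the spirit of dual supervised learning (Xia et al., 2017), which penalizes disagreement between the two factorizations of the joint probability p(x, e). This is a minimal illustration under assumed interfaces: the function and argument names, the squared-gap penalty, and the use of externally estimated marginals log p(e) and log p(x) are all assumptions for exposition, not the paper's reported implementation.

```python
# Minimal PyTorch-style sketch of joint dual training for ERC and ERG.
# The model interfaces and the duality penalty are illustrative
# assumptions, not the authors' implementation.
import torch.nn.functional as F

def dual_step(erc, erg, context, response, emotion,
              log_p_emotion, log_p_response, lambda_dual=0.1):
    """One joint training step over a batch.

    erc(response)         -> (batch, n_emotions) logits   [primal: x -> e]
    erg(context, emotion) -> (batch, seq, vocab) logits   [dual:   e -> x]
    log_p_emotion / log_p_response: per-example marginals, e.g. from
    empirical label frequencies and a pretrained language model.
    """
    # Primal task (ERC): classify the emotion of the target utterance.
    erc_logits = erc(response)
    loss_erc = F.cross_entropy(erc_logits, emotion)

    # Dual task (ERG): generate the utterance conditioned on the emotion.
    erg_logits = erg(context, emotion)
    token_ll = -F.cross_entropy(erg_logits.transpose(1, 2), response,
                                reduction="none")     # (batch, seq)
    log_p_x_given_e = token_ll.sum(dim=1)             # per-example log p(x|e)
    loss_erg = -log_p_x_given_e.mean()

    # Duality regularizer: both factorizations of log p(x, e) should agree,
    #   log p(e) + log p(x|e)  ==  log p(x) + log p(e|x).
    log_p_e_given_x = F.log_softmax(erc_logits, dim=-1) \
                       .gather(1, emotion.unsqueeze(1)).squeeze(1)
    gap = (log_p_emotion + log_p_x_given_e) - (log_p_response + log_p_e_given_x)
    loss_dual = gap.pow(2).mean()

    return loss_erc + loss_erg + lambda_dual * loss_dual
```

Note how the asymmetry the abstract emphasizes surfaces here: the primal direction scores a single label while the dual direction scores an entire token sequence, so the two log-likelihoods live on different scales. How the paper balances them is not specified in the abstract; the fixed lambda_dual above is only a placeholder.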



Published In

IEEE Transactions on Affective Computing, Volume 15, Issue 3 (July-Sept. 2024), 1087 pages.

Publisher: IEEE Computer Society Press, Washington, DC, United States.
