DOI: 10.1145/3459104.3459169
Research article

Contextualised Word Embeddings Based on Transfer Learning to Dialogue Response Generation: a Proposal and Comparisons

Published: 20 July 2021

Abstract

Contextualised word embeddings have recently become essential elements of Natural Language Processing (NLP) systems, since these embedding models encode not only words but also their contexts, producing context-specific representations. Pre-trained models such as BERT, GPT, and derived architectures are increasingly present on NLP task benchmarks. Several comparative analyses of such models have been performed, but so far none compares the most recent architectures on a dialogue generation dataset using multiple metrics relevant to the task. In this paper, we not only propose an encoder-decoder system that uses transfer learning with pre-trained word embeddings, but also systematically compare several pre-trained contextualised word embedding architectures on the DSTC-7 dataset, using metrics based on mutual information, dialogue length, and variety of answers. We use the word embeddings as the first layer of the encoder, making it possible to encode the texts in a latent space. As a decoder, we use an LSTM layer and byte pair encoding tokenisation, in line with recently published state-of-the-art dialogue systems. The networks are trained for the same number of epochs, with the same optimisers and learning rates. Considering dialogue quality, our results show that no technique is superior on all metrics; however, there are relevant differences in the computational cost of encoding the data.
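The architecture described above can be illustrated with a minimal sketch: a frozen pre-trained embedding table acts as the encoder's first layer, its token vectors are pooled into a latent context, and that context initialises a recurrent (LSTM-style) decoder that emits a distribution over a subword vocabulary. This is not the authors' implementation; the vocabulary size, dimensions, pooling choice, and every name below are hypothetical, chosen only to show the data flow.

```python
# Illustrative sketch (assumptions throughout): frozen "pre-trained"
# embeddings as the encoder's first layer, feeding a small LSTM decoder.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 100   # hypothetical subword (BPE) vocabulary size
EMB = 16      # embedding dimension
HID = 32      # decoder hidden size

# Stand-in for a pre-trained embedding table; in the paper this role is
# played by contextualised models such as BERT or GPT. Frozen: the
# encoder only looks vectors up, it does not update them.
pretrained_emb = rng.normal(size=(VOCAB, EMB))

def encode(token_ids):
    """Map token ids into the latent space via the frozen embeddings,
    then mean-pool into a single context vector."""
    vecs = pretrained_emb[token_ids]   # (seq_len, EMB)
    return vecs.mean(axis=0)           # (EMB,)

# A single LSTM cell in plain NumPy, standing in for the LSTM decoder.
W = rng.normal(scale=0.1, size=(4 * HID, EMB + HID))
b = np.zeros(4 * HID)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)        # input, forget, cell, output gates
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Project the pooled context into the decoder's initial state, then run
# a few greedy decode steps producing token ids from the vocabulary.
W_init = rng.normal(scale=0.1, size=(HID, EMB))
W_out = rng.normal(scale=0.1, size=(VOCAB, HID))

context = encode(np.array([3, 14, 15, 92]))  # hypothetical BPE ids
h, c = np.tanh(W_init @ context), np.zeros(HID)
x = pretrained_emb[0]                        # id 0 assumed to be <bos>
generated = []
for _ in range(5):
    h, c = lstm_step(x, h, c)
    logits = W_out @ h
    next_id = int(np.argmax(logits))
    generated.append(next_id)
    x = pretrained_emb[next_id]

print(generated)
```

In the paper the embedding layer is one of the compared contextualised models and the decoder is a trained LSTM over BPE subwords; here random weights stand in for both, so only the wiring, not the output quality, is representative.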



    Published In

    ISEEIE 2021: 2021 International Symposium on Electrical, Electronics and Information Engineering
    February 2021
    644 pages
    ISBN:9781450389839
    DOI:10.1145/3459104

Publisher

Association for Computing Machinery, New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
    • Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)


