DOI: 10.1145/3373017.3373028

A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP

Published: 04 February 2020

Abstract

Building a dialogue system that can communicate naturally with humans is a challenging yet interesting problem in agent-based computing. Rapid progress in this area is often hindered by the long-standing problem of data scarcity, as these systems are expected to learn syntax, grammar, decision making, and reasoning from insufficient amounts of task-specific data. Recently introduced pre-trained language models have the potential to address this data scarcity and bring considerable advantages by generating contextualized word embeddings. These models are considered the NLP counterpart of ImageNet and have been shown to capture different facets of language, such as hierarchical relations, long-term dependencies, and sentiment. In this short survey paper, we discuss recent progress in the field of pre-trained language models. We also discuss how the strengths of these language models can be leveraged to design more engaging and more eloquent conversational agents. This paper therefore aims to establish whether pre-trained models can overcome the challenges pertinent to dialogue systems, and how their architectures could be exploited to do so. Open challenges in the field of dialogue systems are also discussed.
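The contrast the abstract draws between static and contextualized word embeddings can be shown with a toy, self-contained sketch (our own illustration, not code from the paper): a single self-attention step, the core operation of the Transformer models surveyed, mixes sentence context into each token's vector, so the same word receives different representations in different sentences. The vocabulary and embedding values below are arbitrary placeholders.

```python
import math

# Toy static (context-independent) 3-dim embeddings for a tiny vocabulary.
EMB = {
    "river": [1.0, 0.0, 0.0],
    "bank":  [0.0, 1.0, 0.0],
    "loan":  [0.0, 0.0, 1.0],
}

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(tokens):
    """One self-attention step: each token's new vector is a
    context-weighted mix of all token vectors in the sentence."""
    vecs = [EMB[t] for t in tokens]
    out = []
    for q in vecs:
        weights = softmax([dot(q, k) for k in vecs])
        mixed = [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(3)]
        out.append(mixed)
    return out

# The static embedding of "bank" is identical in both sentences...
ctx1 = self_attention(["river", "bank"])[1]  # "bank" in "river bank"
ctx2 = self_attention(["bank", "loan"])[0]   # "bank" in "bank loan"
# ...but after one attention step its representation depends on context.
print(ctx1 != ctx2)  # True
```

Real pre-trained models stack many such layers with learned query/key/value projections; this sketch only illustrates why the resulting embeddings are context-dependent where word2vec-style static embeddings are not.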



    Published In

    ACSW '20: Proceedings of the Australasian Computer Science Week Multiconference
    February 2020
    367 pages
    ISBN:9781450376976
    DOI:10.1145/3373017

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Agent-based computing
    2. dialogue systems
    3. intelligent agents
    4. natural language processing
    5. pre-trained language models

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACSW '20: Australasian Computer Science Week 2020
    February 4 - 6, 2020
    Melbourne, VIC, Australia

    Acceptance Rates

    Overall Acceptance Rate: 61 of 141 submissions, 43%


    Cited By

    • (2024) GPT (Generative Pre-Trained Transformer) — A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. IEEE Access 12, 54608-54649. https://doi.org/10.1109/ACCESS.2024.3389497
    • (2024) Combating Fake News on Social Media: A Fusion Approach for Improved Detection and Interpretability. IEEE Access 12, 2074-2085. https://doi.org/10.1109/ACCESS.2023.3342843
    • (2024) Unleashing the Power of AI in Communication Technology: Advances, Challenges, and Collaborative Prospects. Artificial General Intelligence (AGI) Security, 211-226. https://doi.org/10.1007/978-981-97-3222-7_10 (31-Aug-2024)
    • (2024) Evolution of ChatGPT and Different Language Models: A Review. Smart Trends in Computing and Communications, 87-97. https://doi.org/10.1007/978-981-97-1313-4_8 (2-Jun-2024)
    • (2023) Enhancing predictive modeling for Indian banking stock trends. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 45(5), 8761-8773. https://doi.org/10.3233/JIFS-231472 (4-Nov-2023)
    • (2023) The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development. Proceedings of the 28th International Conference on Intelligent User Interfaces, 491-514. https://doi.org/10.1145/3581641.3584037 (27-Mar-2023)
    • (2023) An experimental study measuring the generalization of fine-tuned language representation models across commonsense reasoning benchmarks. Expert Systems 40(5). https://doi.org/10.1111/exsy.13243 (10-Feb-2023)
    • (2023) Keeping the Questions Conversational: Using Structured Representations to Resolve Dependency in Conversational Question Answering. 2023 International Joint Conference on Neural Networks (IJCNN), 1-7. https://doi.org/10.1109/IJCNN54540.2023.10191510 (18-Jun-2023)
    • (2023) Comparative Analysis of BERT Models for Sentiment Analysis on Twitter Data. 2023 9th International Conference on Smart Computing and Communications (ICSCC), 658-663. https://doi.org/10.1109/ICSCC59169.2023.10335061 (17-Aug-2023)
    • (2023) Research on Error Handling Techniques for Speech Interaction. 2023 IEEE 3rd International Conference on Computer Systems (ICCS), 16-20. https://doi.org/10.1109/ICCS59700.2023.10335595 (22-Sep-2023)
