Abstract
Recent advances in AI have increased the availability of conversational assistants that can help with tasks such as finding a time to schedule an event and creating a calendar entry at that time, or finding a restaurant and booking a table there at a certain time. However, creating automated agents with human-level intelligence remains one of the most challenging problems in AI. One key component of such systems is conversational language understanding, which has been a holy grail of research for decades, as it is not a clearly defined task but depends heavily on the AI application it is used for. Nevertheless, this chapter attempts to compile the recent deep learning based literature on goal-oriented conversational language understanding, starting with a historical perspective and pre-deep learning era work, and moving toward the most recent advances in the field.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Tur, G., Celikyilmaz, A., He, X., Hakkani-Tür, D., Deng, L. (2018). Deep Learning in Conversational Language Understanding. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_2
DOI: https://doi.org/10.1007/978-981-10-5209-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5208-8
Online ISBN: 978-981-10-5209-5
eBook Packages: Computer Science (R0)