DOI: 10.1145/3340555.3356100

Robust Spoken Language Understanding with Acoustic and Domain Knowledge

Published: 14 October 2019

Abstract

Spoken language understanding (SLU) converts user utterances into structured semantic forms. Two main issues remain for SLU: robustness to ASR errors, and data sparsity in new and extended domains. In this paper, we propose a robust SLU system that leverages both acoustic and domain knowledge. We extract audio features by training ASR models on a large number of utterances without semantic annotations. To exploit domain knowledge, we design lexicon features from the domain ontology and propose an error elimination algorithm that recovers predicted values corrupted by ASR errors. Results on the CATSLU challenge show that our system outperforms all other teams across all four domains.
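The paper's error elimination algorithm is not spelled out in this abstract; as a rough illustration of the idea only, the sketch below maps an ASR-corrupted slot value back to the closest entry in a domain-ontology value lexicon using Levenshtein distance. The function names and the `max_dist` threshold are assumptions for illustration, not the authors' implementation.

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # min of: deletion (above), insertion (left), substitution (diagonal)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]


def recover_value(predicted: str, lexicon: set[str], max_dist: int = 2) -> str:
    """Map an ASR-corrupted predicted value to its nearest ontology entry,
    falling back to the prediction itself when nothing is close enough."""
    if predicted in lexicon:
        return predicted
    best = min(lexicon, key=lambda v: edit_distance(predicted, v))
    return best if edit_distance(predicted, best) <= max_dist else predicted
```

For example, a misrecognized value like "restarant" would be snapped back to the ontology entry "restaurant", while a value far from every lexicon entry is left unchanged so downstream components can handle it.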


Cited By

  • (2022) ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30, 1207–1218. DOI: 10.1109/TASLP.2022.3153268
  • (2021) Spoken Language Understanding with Sememe Knowledge as Domain Knowledge. 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), 1–5. DOI: 10.1109/ISCSLP49672.2021.9362087
  • (2020) Robust Spoken Language Understanding with RL-Based Value Error Recovery. Natural Language Processing and Chinese Computing, 78–90. DOI: 10.1007/978-3-030-60450-9_7


Published In

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019
601 pages
ISBN:9781450368605
DOI:10.1145/3340555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Robustness
  2. Spoken Language Understanding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • China NSFC projects
  • National Key Research and Development Program of China

Conference

ICMI '19

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

