DOI: 10.1145/3340555.3356100

Robust Spoken Language Understanding with Acoustic and Domain Knowledge

Published: 14 October 2019

Abstract

Spoken language understanding (SLU) converts user utterances into structured semantic forms. Two main issues remain for SLU: robustness to ASR errors, and data sparsity in new and extended domains. In this paper, we propose a robust SLU system that leverages both acoustic and domain knowledge. We extract audio features by training ASR models on a large number of utterances without semantic annotations. To exploit domain knowledge, we design lexicon features from the domain ontology and propose an error elimination algorithm that recovers predicted values corrupted by ASR errors. Results on the CATSLU challenge show that our system outperforms all other teams across all four domains.
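The paper's error elimination algorithm is not spelled out in this abstract; as a rough illustration of the idea only, the sketch below maps an ASR-corrupted slot value back to the closest entry in a domain-ontology value lexicon using Levenshtein distance. The function names and the `max_dist` threshold are assumptions for illustration, not the authors' implementation.

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # min of: deletion (above), insertion (left), substitution (diagonal)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]


def recover_value(predicted: str, lexicon: set[str], max_dist: int = 2) -> str:
    """Map an ASR-corrupted predicted value to its nearest ontology entry,
    falling back to the prediction itself when nothing is close enough."""
    if predicted in lexicon:
        return predicted
    best = min(lexicon, key=lambda v: edit_distance(predicted, v))
    return best if edit_distance(predicted, best) <= max_dist else predicted
```

For example, a misrecognized value like "restarant" would be snapped back to the ontology entry "restaurant", while a value far from every lexicon entry is left unchanged so downstream components can handle it.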


Cited By

  • (2022) ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30, 1207–1218. DOI: 10.1109/TASLP.2022.3153268
  • (2021) Spoken Language Understanding with Sememe Knowledge as Domain Knowledge. 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), 1–5. DOI: 10.1109/ISCSLP49672.2021.9362087
  • (2020) Robust Spoken Language Understanding with RL-Based Value Error Recovery. Natural Language Processing and Chinese Computing, 78–90. DOI: 10.1007/978-3-030-60450-9_7


Published In

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019
601 pages
ISBN:9781450368605
DOI:10.1145/3340555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Robustness
  2. Spoken Language Understanding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • China NSFC projects
  • National Key Research and Development Program of China

Conference

ICMI '19

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

