Research Article · DOI: 10.1145/3583780.3615093
Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

Published: 21 October 2023

Abstract

Spoken language understanding (SLU) is a core task in task-oriented dialogue systems that aims to understand a user's current goal by constructing semantic frames. SLU usually consists of two subtasks: intent detection and slot filling. Although some SLU frameworks jointly model the two subtasks and achieve high performance, most of them still overlook the inherent relationships between intents and slots and fail to achieve mutual guidance between the two subtasks. To solve this problem, we propose MMCL, a multi-level multi-grained SLU framework that applies contrastive learning at three levels (utterance, slot, and word) to enable intents and slots to guide each other. At the utterance level, our framework performs coarse-grained and fine-grained contrastive learning simultaneously. We also apply self-distillation to improve the robustness of the model. Experimental results and further analysis demonstrate that our proposed model achieves new state-of-the-art results on two public multi-intent SLU datasets, including a 2.6-point improvement in overall accuracy on the MixATIS dataset over the previous best model.
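The paper's exact multi-level objective is not reproduced on this page, but the InfoNCE-style contrastive loss that such frameworks typically build on can be sketched as follows. This is a toy, dependency-free illustration, not the authors' formulation; the function names, the cosine similarity choice, and the temperature value are our assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the anchor embedding toward its positives
    and push it away from its negatives in the softmax over similarities."""
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average the per-positive terms, as in supervised contrastive setups.
    return -sum(math.log(p / denom) for p in pos) / len(pos)

# A well-separated embedding (anchor near its positive, far from its
# negative) should incur a lower loss than the reversed arrangement.
anchor = [1.0, 0.0]
good = contrastive_loss(anchor, [[0.9, 0.1]], [[-1.0, 0.0]])
bad = contrastive_loss(anchor, [[-1.0, 0.0]], [[0.9, 0.1]])
```

In a multi-level setting like the one the abstract describes, a loss of this shape would be computed separately over utterance-, slot-, and word-level representations, with the positive/negative sets defined by the intent and slot labels at each granularity.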


Cited By

• (2024) Empowering LLMs for Multi-Page Layout Generation via Consistency-Oriented In-Context Learning. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3679-3683. DOI: 10.1145/3627673.3679908. Online publication date: 21-Oct-2024.

    Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023, 5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. contrastive learning
    2. multi-grained
    3. multi-level
    4. self-distillation
    5. spoken language understanding

Conference

CIKM '23

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
