Research Article · DOI: 10.1145/3583780.3615093
Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

Published: 21 October 2023

Abstract

Spoken language understanding (SLU) is a core task in task-oriented dialogue systems that aims to understand a user's current goal by constructing semantic frames. SLU usually consists of two subtasks: intent detection and slot filling. Although some SLU frameworks jointly model the two subtasks and achieve high performance, most of them still overlook the inherent relationships between intents and slots and fail to achieve mutual guidance between the two subtasks. To solve this problem, we propose MMCL, a multi-level multi-grained SLU framework that applies contrastive learning at three levels (utterance, slot, and word) to enable intents and slots to guide each other. At the utterance level, our framework performs coarse-grained and fine-grained contrastive learning simultaneously. We also apply self-distillation to improve the robustness of the model. Experimental results and further analysis demonstrate that our proposed model achieves new state-of-the-art results on two public multi-intent SLU datasets, including a 2.6-point improvement in overall accuracy on the MixATIS dataset over the previous best model.
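The paper's exact multi-level objective is not reproduced on this page, but the InfoNCE-style contrastive loss that such frameworks typically build on can be sketched as follows. This is a toy, dependency-free illustration, not the authors' formulation; the function names, the cosine similarity choice, and the temperature value are our assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the anchor embedding toward its positives
    and push it away from its negatives in the softmax over similarities."""
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average the per-positive terms, as in supervised contrastive setups.
    return -sum(math.log(p / denom) for p in pos) / len(pos)

# A well-separated embedding (anchor near its positive, far from its
# negative) should incur a lower loss than the reversed arrangement.
anchor = [1.0, 0.0]
good = contrastive_loss(anchor, [[0.9, 0.1]], [[-1.0, 0.0]])
bad = contrastive_loss(anchor, [[-1.0, 0.0]], [[0.9, 0.1]])
```

In a multi-level setting like the one the abstract describes, a loss of this shape would be computed separately over utterance-, slot-, and word-level representations, with the positive/negative sets defined by the intent and slot labels at each granularity.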


Cited By

• (2024) Empowering LLMs for Multi-Page Layout Generation via Consistency-Oriented In-Context Learning. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3679-3683. DOI: 10.1145/3627673.3679908. Online publication date: 21-Oct-2024.

    Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023, 5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. contrastive learning
    2. multi-grained
    3. multi-level
    4. self-distillation
    5. spoken language understanding

Conference

CIKM '23

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
