Combining Several ASR Outputs in a Graph-Based SLU System

Marcos Calvo, Lluís-F. Hurtado, Fernando García & Emilio Sanchis

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9423)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition


Abstract

In this paper, we present an approach to Spoken Language Understanding (SLU) that combines multiple hypotheses from several Automatic Speech Recognizers (ASRs) in order to reduce the impact of recognition errors on the SLU module. The combination is performed by a Grammatical Inference algorithm that generalizes the input sentences into a weighted graph of words. We have also developed a specific SLU algorithm that processes these graphs of words according to a stochastic semantic model. The results show that combining several hypotheses from the ASR module outperforms taking just the 1-best transcription.
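To make the idea of generalizing several transcriptions into a weighted graph of words more concrete, here is a minimal sketch in Python. It is not the Grammatical Inference algorithm used in the paper, whose details the abstract does not give. It assumes a much simpler bigram-style merge in which each word is a node and each arc is weighted by relative frequency. The names `build_word_graph` and `accepts`, the `<s>`/`</s>` markers, and the example sentences are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sentence-boundary markers (an assumption of this sketch).
START, END = "<s>", "</s>"


def build_word_graph(hypotheses):
    """Merge whitespace-tokenized hypotheses into a weighted graph of words.

    Each arc u -> v is weighted by the relative frequency with which
    v follows u across all hypotheses (a simplification, not the
    paper's Grammatical Inference algorithm).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for hyp in hypotheses:
        tokens = [START] + hyp.split() + [END]
        for u, v in zip(tokens, tokens[1:]):
            counts[u][v] += 1

    graph = {}
    for u, successors in counts.items():
        total = sum(successors.values())
        graph[u] = {v: c / total for v, c in successors.items()}
    return graph


def accepts(graph, sentence):
    """Return True if the sentence is a START-to-END path in the graph."""
    tokens = [START] + sentence.split() + [END]
    return all(v in graph.get(u, {}) for u, v in zip(tokens, tokens[1:]))


# Made-up outputs from three recognizers for the same utterance.
hypotheses = [
    "i want a ticket to valencia",
    "i want a ticket to palencia",
    "i want two tickets to valencia",
]
graph = build_word_graph(hypotheses)

# The graph generalizes: this sentence was produced by no recognizer,
# yet it is a valid path because it recombines observed segments.
print(accepts(graph, "i want two tickets to palencia"))
```

Under these assumptions, the graph encodes more sentences than the recognizers produced, which is the property the abstract attributes to the combination step. An SLU decoder could then search the graph rather than a single 1-best string.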

This work is partially supported by the Spanish MEC under contract TIN2014-54288-C4-3-R and FPU Grant AP2010-4193.

Author information

Authors and Affiliations

Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Valencia, Spain
Marcos Calvo, Lluís-F. Hurtado, Fernando García & Emilio Sanchis

Corresponding author

Correspondence to Fernando García.

Editor information

Editors and Affiliations

Univ. Católica del Uruguay, Montevideo, Uruguay
Alvaro Pardo
University of Surrey, Guildford, United Kingdom
Josef Kittler

About this paper

Cite this paper

Calvo, M., Hurtado, LF., García, F., Sanchis, E. (2015). Combining Several ASR Outputs in a Graph-Based SLU System. In: Pardo, A., Kittler, J. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2015. Lecture Notes in Computer Science, vol 9423. Springer, Cham. https://doi.org/10.1007/978-3-319-25751-8_66

DOI: https://doi.org/10.1007/978-3-319-25751-8_66
Published: 25 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25750-1
Online ISBN: 978-3-319-25751-8
eBook Packages: Computer Science, Computer Science (R0)
